Reproducibility in the building science domain has long been hampered by a lack of general access to large, descriptive real-world datasets and open implementations of published analyses. Published papers rarely include code or data. The code that is made available mostly consists of brittle, hard-coded implementations that are specific to the simulation or building being analyzed.
We are in the process of developing Mortar, a platform for conducting reproducible building science research. Mortar consists of a large live dataset of over 100 buildings spanning over 26,000 data streams and over 10 billion data points, a descriptive ontology-based data model for each building (called "Brick"), and an API for accessing and manipulating the available data using queries against the data model. A goal of the project is to foster a self-sustaining research community that uses and contributes to the Mortar platform.
This tutorial will walk you through the process of building a portable building analytics application -- Rogue Zone Airflow Identification -- and executing it on several real buildings using the Mortar platform. This document is not a comprehensive introduction to Mortar and its capabilities, but the tutorial will link to external documentation and resources where you can learn more.
This tutorial requires a free Mortar account, which you will create in the first step below.
The tutorial also depends on the mybinder.org service for hosting remote Jupyter notebooks. The "Start Notebook" button below tracks the status of the connection to the Jupyter notebook backend hosted on mybinder.org; if you lose your connection or the notebook seems unresponsive, the easiest fix is to reload the page and start again. Clicking the "Start Notebook" button again may also resolve the issue.
Click the button below to start!
First, make a free account on https://mortardata.org. This will give you access to the Mortar API.
Fill out the form and click "Sign Up". Completing account creation requires entering a code that is sent to the email you submitted on the form. After entering the code, you should see a message that the sign up was successful.
Please remember the username and password you choose! Enter them into the text box below and then execute the cell.
%matplotlib inline
import pymortar
import pandas as pd

# removes extraneous output
import logging
logging.getLogger().setLevel("CRITICAL")
logging.getLogger('matplotlib').setLevel("CRITICAL")

client = pymortar.Client({
    'mortar_address': 'api.mortardata.org',  # do not change
    'username': "YOUR USERNAME HERE",        # <------------- CHANGE THIS
    'password': "YOUR PASSWORD HERE",        # <------------- CHANGE THIS
})
print("Success! Got client")
Here is a short dictionary of terms we will use in the tutorial:

Some common building terms and types of building equipment:
- VAV: a Variable Air Volume box; a piece of HVAC equipment that regulates the volume of conditioned air supplied to a zone
- Thermostat: a device that measures zone temperature and controls HVAC equipment to drive that temperature toward a setpoint
- Setpoint: the target value (e.g. a temperature or an airflow) that a control loop tries to maintain
- HVAC Zone: the area of a building (one or more rooms) conditioned by a particular piece of HVAC equipment
- Damper: an adjustable plate in a duct that regulates airflow
Recall that a Brick model describes the elements and structure of a building and its subsystems and shows the relationships between them. Here you will use the Interactive Brick Query Builder to explore relationships between common building assets like thermostats and rooms.
Nodes represent Brick classes. Edges represent Brick relationships.
Click on nodes to expand them to reveal their relationships to other nodes. The query builder automatically constructs a SPARQL query from the nodes you have selected (indicated by a green highlight). Later, you will copy queries from the interface into a Jupyter notebook.
Navigate to the Interactive Brick Query Builder page and take a few minutes to explore.
Try to find connections between pairs of Brick classes; this should mirror how these types of things are connected in a real building.
Pick one class to start with (e.g. Floor) and, by clicking and exploring, find a path to the other class (e.g. Room).
The edge labels will tell you how each of these classes is related.
- Floor to Room
- Thermostat to Supply Air Temperature Heating Setpoint
- VAV to Damper Position
- VAV to Room
The Brick queries you have been generating use a query language called SPARQL. SPARQL queries have two parts -- SELECT and WHERE -- which consist of variables and terms.
The definitions below will refer to this SPARQL query.

Variables: identifiers beginning with a ? prefix (e.g. ?sensor) represent nodes and edges that will be "filled in" by the query processor. The variables in our running example are ?tstat and ?sensor.

Terms: a term consists of three parts -- a subject, a predicate, and an object. Each part can be filled with a constant (e.g. brick:Room or bf:feeds or mybuilding:Thermostat_123) or with a variable.
Terms can be thought of as "patterns" matching parts of the graph representing the Brick model.
The following is one of three terms from our running example. It contains one variable.
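As a sketch of what such a term can look like (the class name brick:Thermostat here is an illustrative assumption, not necessarily the exact term from the running example):

```python
# A SPARQL term (a "triple pattern") has three parts: subject, predicate,
# and object. Here ?tstat is a variable, while rdf:type and
# brick:Thermostat are constants that restrict what ?tstat can match.
term = "?tstat rdf:type brick:Thermostat ."
print(term)
```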
The SELECT clause of a SPARQL query consists of a list of variables. This defines which variables will be returned by the evaluation of the query. Specifically, each row in the result contains a value for each variable in the SELECT clause. The SELECT clause of our running example specifies two variables: ?tstat and ?sensor.
The WHERE clause of a SPARQL query contains all of the terms that restrict the output of the evaluated query. Each query result (a "row") is an assignment of entities in a Brick graph to the variables in the query that matches the pattern defined by the terms in the WHERE clause.
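Putting these pieces together, a complete query in the shape of our running example -- two selected variables and three terms -- might look like the following sketch (the specific classes and relationships are assumptions for illustration):

```python
# Sketch of a full SPARQL query: a SELECT clause listing the variables to
# return, and a WHERE clause containing three terms. The class and
# relationship names (brick:Thermostat, bf:hasPoint,
# brick:Temperature_Sensor) are illustrative assumptions.
example_query = """
SELECT ?tstat ?sensor WHERE {
    ?tstat  rdf:type    brick:Thermostat .
    ?tstat  bf:hasPoint ?sensor .
    ?sensor rdf:type    brick:Temperature_Sensor .
};
"""
print(example_query)
```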
In order for applications to be portable, they need to be able to discover the configuration of a building and what data is available. This allows an application to configure itself to execute against many different buildings. Mortar applications use Brick queries to achieve this.
It is often useful to see how many sites will return results for a given Brick query; this can also be used to estimate how portable an application is.
Mortar provides a Qualify API method that returns the list of sites that return results for each provided Brick query. This is useful for estimating how portable an application is before fetching any data.
Use a query you've already generated (or create a new query), paste it into the cell below, and execute the cell. Mortar will tell you which sites are qualified to run that query.
resp = client.qualify(["""
PASTE QUERY HERE
"""])
print("{0} sites qualified".format(len(resp.sites)))
Mortar applications are usually organized into four phases: Qualify (determine which sites can run the application), Fetch (retrieve the relevant data), Clean (remove missing or invalid data), and Analyze (run the analysis and report results).
"Rogue Zone Detection" is a common, simple fault-detection application that involves finding zones in a building whose temperature is consistently above or below the setpoint, or whose airflow is consistently below the setpoint (underventilated). This application also has the distinction of being the first application we will build together using Mortar!
We will construct the application according to the four phases above.
We will start with qualifying which sites are able to run our application. This requires us to author one or more Brick queries that capture the assets our application needs and the relationships between them. Using the Interactive Brick Query Builder, create a Brick query that relates a VAV to a Supply Air Flow Sensor and a Supply Air Flow Setpoint.
When you want to test if your query is correct, paste the query into the cell below and execute to see how many sites qualify (you should see 3 or 4).
When you are done, augment the query with relationships to the HVAC Zone.
rogue_zone_query = """
PASTE QUERY HERE
"""
qualify_resp = client.qualify([rogue_zone_query])
print("{0} sites qualified for rogue zone".format(len(qualify_resp.sites)))
We will now use the generated Brick query to tell Mortar what timeseries data and metadata we want to download for the application. The Mortar API uses two structures for this purpose: Views and DataFrames.
For more detailed documentation, see the PyMortar documentation.
A View is a table of metadata defined by the execution of a Brick query against a set of sites. The columns of the table consist of the variables in the SELECT clause of the Brick query that defines the table. Mortar adds the name of the site as an additional column to the table.
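As a concrete illustration, a View for a query that selects ?tstat and ?sensor would, once executed, contain one column per variable plus the site column. Here is a local sketch using made-up entity names (not real Mortar data):

```python
import pandas as pd

# Hypothetical contents of a View for a query selecting ?tstat and
# ?sensor: one column per SELECT variable, plus the site column that
# Mortar appends automatically.
view_table = pd.DataFrame({
    "tstat":  ["bldg1_tstat_1", "bldg1_tstat_2"],
    "sensor": ["bldg1_temp_sensor_1", "bldg1_temp_sensor_2"],
    "site":   ["bldg1", "bldg1"],
})
print(view_table)
```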
Execute the definition of the View for our application using the Brick query we defined in the Qualify step.
The View itself has not been executed yet -- we are defining it so we can use it later in the Fetch Request.
rogue_view = pymortar.View(
    name="airflow",
    definition=rogue_zone_query,
)
print("Defined View")
A DataFrame is a table of timeseries data. The temporal extent of a DataFrame is defined by a start and end time, an aggregation function, and a bucket size for the aggregation (e.g. "15 minute mean"). A DataFrame contains 1 or more streams of timeseries data identified by column; the name of the column is a unique identifier for the stream.
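The bucketed aggregation behaves much like a pandas resample. Here is a purely local sketch (made-up data, not a Mortar API call) of what a "15 minute mean" does to a raw stream sampled every 5 minutes:

```python
import pandas as pd

# Made-up raw stream sampled every 5 minutes.
idx = pd.date_range("2018-05-01", periods=6, freq="5min")
raw = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

# A "15 minute mean" DataFrame buckets the stream like this resample:
# each bucket holds the mean of the three 5-minute samples inside it.
bucketed = raw.resample("15min").mean()
print(bucketed)  # two buckets: 2.0 (mean of 1,2,3) and 5.0 (mean of 4,5,6)
```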
Below, we define two DataFrames for each of the collections of timeseries data the application needs. We define two different DataFrames because we want to aggregate setpoints differently than sensors: for setpoints we use the maximum setpoint in the aggregation window in order to avoid interpolating, but for sensors we use the average sensor reading in the aggregation window.
Each DataFrame has a name, an aggregation function (one of mean, min, max, count, sum, or "raw" for no aggregation), a window size (the argument to the aggregation function), and a list of timeseries. The list of timeseries tells Mortar to pull data for each stream identified by the values of each data var in the specified View. In this example, the ?Supply_Air_Flow_Sensor variable from our Brick query corresponds to a list of names of air flow sensors.
Execute the cell below to define the DataFrames we will use for our query.
sensor_df = pymortar.DataFrame(
    name="sensors",
    aggregation=pymortar.MEAN,
    window="30m",
    timeseries=[
        pymortar.Timeseries(
            view="airflow",
            dataVars=["?Supply_Air_Flow_Sensor"],
        )
    ]
)
setpoint_df = pymortar.DataFrame(
    name="setpoints",
    aggregation=pymortar.MAX,
    window="30m",
    timeseries=[
        pymortar.Timeseries(
            view="airflow",
            dataVars=["?Supply_Air_Flow_Setpoint"],
        )
    ]
)
print("Defined DataFrames")
Now, let's combine the View and DataFrame definitions into a complete Fetch request. The full Fetch request ties together the sites we retrieved from our Qualify call, the View we defined, the DataFrames we defined, and the range of time we want to retrieve data for.
Execute the cell below to fetch data from the Mortar API (Note: this may take up to a minute)
request = pymortar.FetchRequest(
    sites=qualify_resp.sites,
    views=[rogue_view],
    dataFrames=[
        sensor_df,
        setpoint_df,
    ],
    time=pymortar.TimeParams(
        start="2018-05-01T00:00:00Z",
        end="2018-09-01T00:00:00Z",
    )
)
print("Fetching data...")
resp = client.fetch(request)
print("Retrieved data:", resp)
If all went well, we should see that the response contains 1 View, 2 DataFrames, and almost 2 million timeseries values!
We can access the contents of the Views and DataFrames we defined in the request through the Fetch response object (resp).
To get the list of DataFrames, use resp.dataFrames
resp.dataFrames
To access the contents of a DataFrame, use Python dict-style accesses, e.g. resp['data frame name']. The returned object will be a Pandas DataFrame with a DateTime index.
resp['sensors'].describe()
To get the list of Views, use resp.views
resp.views
To access the contents of a View as a Pandas DataFrame, use resp.view('view name')
resp.view('airflow').head()
You can also access the contents of a View as a SQL table. You can execute SQL queries against the tables using resp.query("SELECT * FROM airflow"); the columns of the tables are given by resp.view_columns("view name").
print("Columns: ", resp.view_columns("airflow"))
print("Sites: ", list(resp.query("SELECT distinct site from airflow")))
Because the Mortar testbed consists of data from real buildings, we need to perform some data cleaning to make sure our analysis behaves how we would expect. For our simple application, we will just remove sensor and setpoint values from our DataFrame that are either missing or 0.
Execute the following cell to create two new DataFrames containing our cleaned data:
# find airflow sensors that aren't just all zeros:
# (resp['sensors'] > 0).any() is True for columns with at least one positive
# reading; .where(...).dropna().index keeps only those column names
valid_sensor_cols = (resp['sensors'] > 0).any().where(lambda x: x).dropna().index
sensor_df = resp['sensors'][valid_sensor_cols]
setpoint_df = resp['setpoints']
print("Finished cleaning")
For fun, we can plot some of the sensor/setpoint pairs of a few VAVs. The Python code below uses a Pandas groupby to grab the columns of data for each pair of air flow sensor/setpoint for each VAV, creates a new DataFrame, and plots it.
# CHANGE THIS to see more VAVs
NUM_TO_PLOT = 5

# use a groupby to grab the UUID values for each VAV and iterate through the resulting groups
for r in resp.view('airflow').groupby('VAV').agg({'Supply_Air_Flow_Setpoint_uuid': list, 'Supply_Air_Flow_Sensor_uuid': list})[:NUM_TO_PLOT].iterrows():
    sensor_uuid = r[1]['Supply_Air_Flow_Sensor_uuid']
    setpoint_uuid = r[1]['Supply_Air_Flow_Setpoint_uuid']
    if sensor_uuid[0] not in sensor_df or setpoint_uuid[0] not in setpoint_df:
        continue
    # get the pandas Series for the sensor values
    sensor_values = sensor_df[sensor_uuid]
    # get the pandas Series for the setpoint values
    setpoint_values = setpoint_df[setpoint_uuid]
    # create a new DataFrame with these data columns
    df = pd.concat([sensor_values, setpoint_values], axis=1).dropna(axis=0)
    # change the column names from UUIDs to something human readable
    df.columns = ['air flow sensor', 'air flow setpoint']
    # plot the data!
    fig = df.plot(figsize=(15,8))
    fig.set_ylabel('cfm')
    fig.set_title(r[0])
Below is the Python code performing the Rogue Zone analysis. The analysis proceeds as follows: for each VAV, look up the UUIDs of its airflow sensor and setpoint, align the two streams into a single DataFrame, flag intervals where the measured airflow is more than 10 cfm below the setpoint, and report any such interval that lasts at least 12 hours.
Run the cell to see the result of running the application.
# get all the equipment we will run the analysis for. Equipment relates sensors and setpoints
equipment = [r[0] for r in resp.query("select distinct VAV from airflow")]
print("Processing {0} VAVs...".format(len(equipment)))
for equip in equipment:
    # for each equipment, pull the UUID for the sensor and setpoint
    q = """
    SELECT Supply_Air_Flow_Sensor_uuid as sensor_uuid,
           Supply_Air_Flow_Setpoint_uuid as setpoint_uuid,
           VAV as equip, site
    FROM airflow
    WHERE VAV = "{0}";
    """.format(equip)
    res = resp.query(q)
    if len(res) == 0:
        continue
    sensor_col = res[0][0]
    setpoint_col = res[0][1]
    if sensor_col is None or setpoint_col is None:
        continue
    if sensor_col not in sensor_df or setpoint_col not in setpoint_df:
        continue
    # create the dataframe for this pair of sensor and setpoint
    df = pd.DataFrame([sensor_df[sensor_col], setpoint_df[setpoint_col]]).T
    df.columns = ['airflow', 'setpoint']
    bad = (df.airflow + 10) < df.setpoint  # airflow below the setpoint by 10 cfm
    if len(df[bad]) == 0:
        continue
    df['same'] = bad.astype(int).diff(1).cumsum()
    # this increments every time we get a new run of sensor being below the setpoint
    # use this to group up those ranges
    df['same2'] = bad.astype(int).diff().ne(0).cumsum()
    # grouped by ranges that meet the predicate (df.airflow + 10 < df.setpoint)
    lal = df[bad].groupby('same2')['same']
    for g in lal.groups:
        idx = list(lal.groups[g])
        if len(idx) < 2:
            continue
        data = df[idx[0]:idx[-1]]
        if len(data) >= 24:  # 12 hours of 30-minute buckets
            fmt = {
                'site': res[0][3],
                'equip': equip,
                'hours': len(data) / 2,
                'start': idx[0],
                'end': idx[-1],
                'diff': (data['setpoint'] - data['airflow']).mean(),
            }
            print("Low Airflow for {hours} hours From {start} to {end}, avg diff {diff:.2f} CFM".format(**fmt))
print("Finished!")
You have now successfully built and executed a rogue zone application for airflow. You will now construct a similar application that analyzes temperature sensors and setpoints. Using the Interactive Brick Query Builder, construct a Brick query that retrieves HVAC zones, their related thermostat, and the thermostat's temperature sensors and setpoints. Paste this query below and execute the cell.
rogue_zone_temperature_query = """
PASTE QUERY HERE
"""
qualify_resp = client.qualify([rogue_zone_temperature_query])
print("{0} sites qualified".format(len(qualify_resp.sites)))
Using the Brick query you just generated, complete the following definitions for the View and DataFrames for our new application.
rogue_temp_view = pymortar.View(
    name=,        # FILL IN name of your choice
    definition=,  # FILL IN Brick query you created above
)
print("Defined View")

sensor_df = pymortar.DataFrame(
    name="sensors",
    aggregation=pymortar.MEAN,
    window="1h",
    timeseries=[
        pymortar.Timeseries(
            view=,  # FILL IN name of view
            dataVars=["FILL IN variable related to sensors (remember '?' prefix)"],
        )
    ]
)
setpoint_df = pymortar.DataFrame(
    name="setpoints",
    aggregation=pymortar.MAX,
    window="1h",
    timeseries=[
        pymortar.Timeseries(
            view=,  # FILL IN name of view
            dataVars=["FILL IN variables related to setpoints (remember '?' prefix)"],
        )
    ]
)
print("Defined DataFrames")
Try it out! Try running the Fetch request using the View and DataFrames you just defined. (Note: this may take up to a minute)
request = pymortar.FetchRequest(
    sites=qualify_resp.sites,
    views=[rogue_temp_view],
    dataFrames=[
        sensor_df,
        setpoint_df,
    ],
    time=pymortar.TimeParams(
        start="2018-07-01T00:00:00Z",
        end="2018-09-01T00:00:00Z",
    )
)
print("Fetching data...")
resp = client.fetch(request)
print("Retrieved data:", resp)
If all went well, then you should be able to run the next cell to get a nice plot of the temperatures and setpoints for each thermostat.
NUM_TO_PLOT = 5
df = resp.view(resp.views[0])
for thermostat in df['Tstat'].unique()[:NUM_TO_PLOT]:
    d = df[df.Tstat == thermostat][['Supply_Air_Temperature_Heating_Setpoint_uuid',
                                    'Supply_Air_Temperature_Cooling_Setpoint_uuid',
                                    'Temperature_Sensor_uuid']]
    hsp = d.values[0][0]
    csp = d.values[0][1]
    temp = d.values[0][2]
    if hsp not in resp['setpoints'] or csp not in resp['setpoints'] or temp not in resp['sensors']:
        continue
    data = pd.concat([resp['setpoints'][hsp], resp['setpoints'][csp], resp['sensors'][temp]], axis=1)
    data.columns = ['hsp', 'csp', 'temp']
    fig = data.plot(figsize=(15,8))
    fig.set_ylabel('Celsius')
    fig.set_title(thermostat + ' Zone')
Thanks for completing the Mortar tutorial!