Mortar tutorial

Tutorial Introduction

Reproducibility in the building science domain has long been hampered by a lack of general access to large, descriptive real-world datasets and open implementations of published analyses. Published papers rarely include code or data. The code that is made available mostly consists of brittle, hard-coded implementations that are specific to the simulation or building being analyzed.

We are in the process of developing Mortar, a platform for conducting reproducible building science research. Mortar consists of a large live dataset of over 100 buildings spanning over 26,000 data streams and over 10 billion data points, a descriptive ontology-based data model for each building (called "Brick"), and an API for accessing and manipulating the available data using queries against the data model. A goal of the project is to foster a self-sustaining research community that uses and contributes to the Mortar platform.

This tutorial will walk you through the process of building a portable building analytics application -- Rogue Zone Airflow Identification -- and executing it on several real buildings using the Mortar platform. This document is not a comprehensive introduction to Mortar and its capabilities, but the tutorial will link to external documentation and resources where you can learn more.

Technical Requirements

This tutorial requires:

  • A modern browser (the tutorial has been tested on Firefox and Chrome) with JavaScript enabled
  • A working internet connection (no unusual ports -- just 443 and 80)
  • Basic knowledge of Python

The tutorial also depends on the mybinder.org service for hosting remote Jupyter notebooks. The "Start Notebook" button below tracks the status of the connection to the Jupyter notebook backend hosted on mybinder.org; if you lose your connection or the notebook seems unresponsive, the easiest way to recover is to reload the page and start again. Clicking the "Start Notebook" button again may also resolve the issue.

Click the button below to start!

Make a Mortar Account

First, make a free account on https://mortardata.org. This will give you access to the Mortar API.

Navigate to mortardata.org and click 'Sign Up'

Fill out the form and click "Sign Up". Completing account creation requires entering a code that is sent to the email address you submitted on the form. After entering the code, you should see a message that the sign up was successful.

Fill out the form

Please remember the username and password you choose! Enter them into the text box below and then execute the cell.

%matplotlib inline
import pymortar
import pandas as pd

# removes extraneous output
import logging
logging.getLogger().setLevel("CRITICAL")
logging.getLogger('matplotlib').setLevel("CRITICAL")

client = pymortar.Client({
    'mortar_address': 'api.mortardata.org', # do not change
    'username': "YOUR USERNAME HERE",  # <------------- CHANGE THIS
    'password': "YOUR PASSWORD HERE",  # <------------- CHANGE THIS
})
print("Success! Got client")
            

Aside: Useful Terms

Here is a short dictionary of terms we will use in the tutorial:

Some common building terms and types of building equipment

Exploring Brick Models Interactively

Recall that a Brick model describes the elements and structure of a building and its subsystems and shows the relationships between them. Here you will use the Interactive Brick Query Builder to explore relationships between common building assets like thermostats and rooms.

Nodes represent Brick classes. Edges represent Brick relationships.

Click on nodes to expand them to reveal their relationships to other nodes. The query builder automatically constructs a SPARQL query from the nodes you have selected (indicated by a green highlight). Later, you will copy queries from the interface into a Jupyter notebook.

Navigate to the Interactive Brick Query Builder page and take a few minutes to explore.

Try to find connections between pairs of Brick classes; this should mirror how these types of things are connected in a real building. Pick one class to start with (e.g. Floor) and, by clicking and exploring, find a path to the other class (e.g. Room). The edge labels will tell you how each of these classes is related.

Find a path from Floor to Room
Find a path from Thermostat to Supply Air Temperature Heating Setpoint
Find a path from VAV to Damper Position
Find a path from VAV to Room
HodDB Query Builder
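For instance, a query connecting Floor to Room, as produced by the builder, might look like the following sketch (the exact class and relationship names are assumptions -- use whatever the builder generates for you):

```sparql
SELECT ?floor ?room
WHERE {
    ?floor rdf:type brick:Floor .
    ?room  rdf:type brick:Room .
    ?floor bf:hasPart ?room .
}
```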

Check In

Aside: Understanding SPARQL Queries

The Brick queries you have been generating use a query language called SPARQL. SPARQL queries have two parts -- SELECT and WHERE -- which consist of variables and terms.

Full example query

The definitions below will refer to this SPARQL query

Variables, identified by their ? prefix (e.g. ?sensor), represent nodes and edges that will be "filled in" by the query processor. The variables in our running example are ?tstat and ?sensor.
A term is a triple: two nodes and the directed edge between them. A term can refer to nodes and edges by name (e.g. brick:Room, bf:feeds, or mybuilding:Thermostat_123) or with a variable. Terms can be thought of as "patterns" that match parts of the graph representing the Brick model. The following is one of the three terms in our running example; it contains one variable.
Term 1
The SELECT clause of a SPARQL query consists of a list of variables. This defines which variables will be returned by the evaluation of the query. Specifically, each row in the result contains a value for each variable in the SELECT clause. The SELECT clause below specifies two variables: ?tstat and ?sensor
Select clause
The WHERE clause of a SPARQL query contains all of the terms that restrict the output of the evaluated query. Each query result (a "row") is an assignment of entities in a Brick graph to variables in the query that matches the pattern defined by the terms in the WHERE clause.
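Putting the pieces together, the running example corresponds to a query along these lines (a sketch; the exact class and relationship names shown in the figures are assumptions):

```sparql
SELECT ?tstat ?sensor
WHERE {
    ?tstat  rdf:type    brick:Thermostat .
    ?sensor rdf:type    brick:Temperature_Sensor .
    ?tstat  bf:hasPoint ?sensor .
}
```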

Retrieving Qualified Sites

In order for applications to be portable, they need to be able to discover the configuration of a building and what data is available. This allows an application to configure itself to execute against many different buildings. Mortar applications use Brick queries to achieve this.

It is often useful to see how many sites will return results for a given Brick query; this can also be used to estimate how portable an application is. Mortar provides a Qualify API method that returns the list of sites that return results for each provided Brick query.

Use a query you've already generated (or create a new query), paste it into the cell below, and execute the cell. Mortar will tell you which sites are qualified to run that query.

resp = client.qualify(["""
    PASTE QUERY HERE
"""])
print("{0} sites qualified".format(len(resp.sites)))
            

Anatomy of a Mortar Application

Mortar applications are usually organized into four phases:

  • Qualify: the application retrieves the list of sites against which it will execute
  • Fetch: the application retrieves the timeseries data and metadata it needs to execute
  • Clean: the application performs any necessary data cleaning and filtering
  • Analyze: the application executes its analysis against the cleaned dataset
Application anatomy

Rogue Zone "Airflow" Application: Qualify

"Rogue Zone Detection" is a common, simple fault-detection application that involves finding zones in a building whose temperature is consistently above or below the setpoint, or whose airflow is consistently below the setpoint (underventilated). This application also has the distinction of being the first application we will build together using Mortar!

We will construct the application according to the four phases above. We will start with qualifying which sites are able to run our application. This requires us to author one or more Brick queries that capture the assets our application needs and the relationships between them. Using the Interactive Brick Query Builder, create a Brick query that relates a VAV to a Supply Air Flow Sensor and Supply Air Flow Setpoint. When you want to test if your query is correct, paste the query into the cell below and execute to see how many sites qualify (you should see 3 or 4).

When you are done, augment the query with relationships to the HVAC Zone.

rogue_zone_query = """
    PASTE QUERY HERE
"""
qualify_resp = client.qualify([rogue_zone_query])
print("{0} sites qualified for rogue zone".format(len(qualify_resp.sites)))
            

Rogue Zone "Airflow" Application: Fetch

We will now use the generated Brick query to tell Mortar what timeseries data and metadata we want to download for the application. The Mortar API uses two structures for this purpose: Views and DataFrames.

For more detailed documentation, see the PyMortar documentation.

View

View definition + eval

A View is a table of metadata defined by the execution of a Brick query against a set of sites. The columns of the table consist of the variables in the SELECT clause of the Brick query that defines the table. Mortar adds the name of the site as an additional column to the table.

Execute the cell below to define the View for our application using the Brick query from the Qualify step. The View itself has not been evaluated yet -- we are only defining it here so we can use it later in the Fetch request.

rogue_view = pymortar.View(
    name="airflow",
    definition=rogue_zone_query,
)
print("Defined View")
            

DataFrame

DataFrame definition + eval

A DataFrame is a table of timeseries data. The temporal extent of a DataFrame is defined by a start and end time, an aggregation function, and a bucket size for the aggregation (e.g. "15 minute mean"). A DataFrame contains one or more streams of timeseries data, each identified by a column; the name of the column is a unique identifier for the stream.

Below, we define one DataFrame for each of the two collections of timeseries data the application needs. We use two different DataFrames because we want to aggregate setpoints differently from sensors: for setpoints we take the maximum setpoint in each aggregation window in order to avoid interpolating, but for sensors we take the average reading in each window.

Each DataFrame has a name, an aggregation function (one of mean, min, max, count, sum, or "raw" for no aggregation), a window size (the argument to the aggregation function), and a list of timeseries. The list of timeseries tells Mortar to pull data for each stream identified by the values of each data variable in the specified View. In this example, the ?Supply_Air_Flow_Sensor variable from our Brick query corresponds to a list of names of air flow sensors.
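To see what the two aggregations do to the data, here is a plain-pandas sketch (an illustration of bucketed aggregation, not the Mortar API): a 15-minute series rolled up into 30-minute buckets with mean and with max.

```python
import pandas as pd

# Four 15-minute readings spanning two 30-minute buckets
idx = pd.date_range("2018-05-01", periods=4, freq="15min")
s = pd.Series([10.0, 20.0, 30.0, 40.0], index=idx)

# "30 minute mean" (how we aggregate sensors)
print(s.resample("30min").mean().tolist())  # [15.0, 35.0]

# "30 minute max" (how we aggregate setpoints)
print(s.resample("30min").max().tolist())   # [20.0, 40.0]
```

The max aggregation preserves values the setpoint actually held, whereas a mean over a window containing a setpoint change would produce a value the setpoint never took.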

Execute the cell below to define the DataFrames we will use for our query.

sensor_df = pymortar.DataFrame(
    name="sensors",
    aggregation=pymortar.MEAN,
    window="30m",
    timeseries=[
        pymortar.Timeseries(
            view="airflow",
            dataVars=["?Supply_Air_Flow_Sensor"],
        )
    ]
)

setpoint_df = pymortar.DataFrame(
    name="setpoints",
    aggregation=pymortar.MAX,
    window="30m",
    timeseries=[
        pymortar.Timeseries(
            view="airflow",
            dataVars=["?Supply_Air_Flow_Setpoint"],
        )
    ]
)
print("Defined DataFrames")
            

Fetch Request

Now, let's combine the View and DataFrame definitions into a complete Fetch request. The full Fetch request ties together the sites we retrieved from our Qualify call, the View we defined, the DataFrames we defined, and the range of time we want to retrieve data for. Execute the cell below to fetch data from the Mortar API. (Note: this may take up to a minute.)

request = pymortar.FetchRequest(
    sites=qualify_resp.sites,
    views=[rogue_view],
    dataFrames=[
        sensor_df,
        setpoint_df,
    ],
    time=pymortar.TimeParams(
        start="2018-05-01T00:00:00Z",
        end="2018-09-01T00:00:00Z",
    )
)

print("Fetching data...")
resp = client.fetch(request)
print("Retrieved data:", resp)
            

If all went well, we should see that the response contains 1 View, 2 DataFrames, and almost 2 million timeseries values!

Aside: Exploring a Fetch Response

We can access the contents of the Views and DataFrames we defined in the request through the Fetch response object (resp).

To get the list of DataFrames, use resp.dataFrames

resp.dataFrames
            

To access the contents of a DataFrame, use Python dict-style access, e.g. resp['data frame name']. The returned object will be a Pandas DataFrame with a DateTime index.

resp['sensors'].describe()
            

To get the list of Views, use resp.views

resp.views
            

To access the contents of a View as a Pandas DataFrame, use resp.view('view name')

resp.view('airflow').head()
            

You can also access the contents of a View as a SQL table. You can execute SQL queries against these tables using resp.query("SELECT * FROM airflow"); the columns of a table are given by resp.view_columns("view name").

print("Columns: ", resp.view_columns("airflow"))
print("Sites: ", list(resp.query("SELECT distinct site from airflow")))
            

Check In

Rogue Zone "Airflow" Application: Clean

Because the Mortar testbed consists of data from real buildings, we need to perform some data cleaning to make sure our analysis behaves as we would expect. For our simple application, we will simply drop airflow sensor streams whose readings are missing or all zero.

Execute the following cell to create two new DataFrames containing our cleaned data

# find airflow sensors that aren't just all zeros
valid_sensor_cols = (resp['sensors'] > 0).any().where(lambda x: x).dropna().index
sensor_df = resp['sensors'][valid_sensor_cols]
setpoint_df = resp['setpoints']
print("Finished cleaning")
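The column-filtering idiom in the cleaning cell is terse; here is a toy standalone illustration of what it does (plain pandas, not Mortar-specific):

```python
import pandas as pd

# Toy version of the cleaning step: one all-zero column (a dead sensor)
# and one column with real readings
df = pd.DataFrame({"dead": [0, 0, 0], "live": [0, 120, 130]})

# Keep only the columns that contain at least one nonzero value
valid_cols = (df > 0).any().where(lambda x: x).dropna().index
cleaned = df[valid_cols]
print(list(cleaned.columns))  # ['live']
```

An equivalent, arguably more direct spelling is df.loc[:, (df > 0).any()].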
            

For fun, we can plot some of the sensor/setpoint pairs of a few VAVs. The Python code below uses a Pandas groupby function to grab the columns of data for each pair of air flow sensor/setpoint for each VAV, creates a new DataFrame, and plots it.

# CHANGE THIS to see more VAVs
NUM_TO_PLOT=5

# use a groupby to grab the UUID values for each VAV and iterate through the resulting groups
for r in resp.view('airflow').groupby('VAV').agg({'Supply_Air_Flow_Setpoint_uuid': list, 'Supply_Air_Flow_Sensor_uuid': list})[:NUM_TO_PLOT].iterrows():
  sensor_uuid = r[1]['Supply_Air_Flow_Sensor_uuid']
  setpoint_uuid = r[1]['Supply_Air_Flow_Setpoint_uuid']
  if sensor_uuid[0] not in sensor_df or setpoint_uuid[0] not in setpoint_df: continue
  # get the pandas Series for the sensor values
  sensor_values = sensor_df[sensor_uuid]
  # get the pandas Series for the setpoint values
  setpoint_values = setpoint_df[setpoint_uuid]
  # create a new DataFrame with these data columns
  df = pd.concat([sensor_values, setpoint_values], axis=1).dropna(axis=0)
  # change the column names from UUIDs to something human readable
  df.columns = ['air flow sensor','air flow setpoint']
  # plot the data!
  fig = df.plot(figsize=(15,8))
  fig.set_ylabel('cfm')
  fig.set_title(r[0])
            

Rogue Zone "Airflow" Application: Analyze

Below is the Python code performing the Rogue Zone analysis. The analysis proceeds as follows:

Get the list of unique equipment by running a SQL query on the View
Run a SQL query to get the sensor and setpoint for each piece of equipment
Find ranges of time, at least 12 hours long, where the sensor read at least 10 cfm below the setpoint

Run the cell to see the result of running the application.

# get all the equipment we will run the analysis for. Equipment relates sensors and setpoints
equipment = [r[0] for r in resp.query("select distinct VAV from airflow")]
print("Processing {0} VAVs...".format(len(equipment)))

for idx, equip in enumerate(equipment):
    # for each equipment, pull the UUID for the sensor and setpoint
    q = """
    SELECT Supply_Air_Flow_Sensor_uuid as sensor_uuid, Supply_Air_Flow_Setpoint_uuid as setpoint_uuid, VAV as equip, site
    FROM airflow
    WHERE VAV = "{0}";
    """.format(equip)
    res = resp.query(q)
    if len(res) == 0: continue

    sensor_col = res[0][0]
    setpoint_col = res[0][1]
    if sensor_col is None or setpoint_col is None: continue
    if sensor_col not in sensor_df or setpoint_col not in setpoint_df: continue

    # create the dataframe for this pair of sensor and setpoint
    df = pd.DataFrame([sensor_df[sensor_col], setpoint_df[setpoint_col]]).T
    df.columns = ['airflow','setpoint']
    bad = (df.airflow + 10) < df.setpoint # by 10 cfm
    if len(df[bad]) == 0: continue
    df['same'] = bad.astype(int).diff(1).cumsum()
    # this increments every time we get a new run of sensor being below the setpoint
    # use this to group up those ranges
    df['same2'] = bad.astype(int).diff().ne(0).cumsum()

    runs = df[bad].groupby('same2')['same']
    # grouped into ranges that meet the predicate (df.airflow + 10 < df.setpoint)
    for g in runs.groups:
        idx = list(runs.groups[g])
        if len(idx) < 2: continue
        data = df[idx[0]:idx[-1]]
        if len(data) >= 24: # 12 hours
            fmt = {
                'site': res[0][3],
                'equip': equip,
                'hours': len(data) / 2,
                'start': idx[0],
                'end': idx[-1],
                'diff': (data['setpoint'] - data['airflow']).mean(),
            }
            print("Low Airflow for {hours} hours From {start} to {end}, avg diff {diff:.2f} CFM".format(**fmt))
print("Finished!")
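The run-grouping trick in the analysis (diff().ne(0).cumsum()) is compact; here is a toy illustration of how it labels consecutive runs of a boolean Series (standalone pandas, separate from the application):

```python
import pandas as pd

# A boolean Series with two "True" runs: one of length 2, one of length 3
bad = pd.Series([False, True, True, False, True, True, True])

# diff() is nonzero exactly where the value changes, so the cumulative sum
# assigns a distinct id to each consecutive run
run_id = bad.astype(int).diff().ne(0).cumsum()

# Group only the True entries by run id and measure each run's length
run_lengths = [len(g) for _, g in bad[bad].groupby(run_id[bad])]
print(run_lengths)  # [2, 3]
```

In the application, each of these runs corresponds to a contiguous stretch of time where the airflow sensor read below the setpoint; runs of at least 24 half-hour windows (12 hours) are reported.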
            

Developing Rogue Zone "Temperature" Application: Qualify

You have now successfully built and executed a rogue zone application for airflow. Next, you will construct a similar application that analyzes temperature sensors and setpoints. Using the Interactive Brick Query Builder, construct a Brick query that retrieves HVAC zones, their associated thermostats, and each thermostat's temperature sensors and setpoints. Paste this query below and execute the cell.

rogue_zone_temperature_query = """
    PASTE QUERY HERE
"""
qualify_resp = client.qualify([rogue_zone_temperature_query])
print("{0} sites qualified".format(len(qualify_resp.sites)))
            

Developing Rogue Zone "Temperature" Application: Fetch

Using the Brick query you just generated, complete the following definitions for the View and DataFrames for our new application.

rogue_temp_view = pymortar.View(
    name= # FILL IN name of your choice,
    definition= # FILL IN Brick query you created above
)
print("Defined View")

sensor_df = pymortar.DataFrame(
    name="sensors",
    aggregation=pymortar.MEAN,
    window="1h",
    timeseries=[
        pymortar.Timeseries(
            view= # FILL IN name of view,
            dataVars=[ "FILL IN variable related to sensors (remember '?' prefix)" ],
        )
    ]
)

setpoint_df = pymortar.DataFrame(
    name="setpoints",
    aggregation=pymortar.MAX,
    window="1h",
    timeseries=[
        pymortar.Timeseries(
            view= # FILL IN name of view,
            dataVars=[ "FILL IN variables related to setpoints (remember '?' prefix)" ],
        )
    ]
)
print("Defined DataFrames")
            

Try it out! Run the Fetch request using the View and DataFrames you just defined. (Note: this may take up to a minute.)

request = pymortar.FetchRequest(
    sites=qualify_resp.sites,
    views=[rogue_temp_view],
    dataFrames=[
        sensor_df,
        setpoint_df,
    ],
    time=pymortar.TimeParams(
        start="2018-07-01T00:00:00Z",
        end="2018-09-01T00:00:00Z",
    )
)

print("Fetching data...")
resp = client.fetch(request)
print("Retrieved data:", resp)
            

If all went well, you should be able to run the next cell to get a plot of the temperatures and setpoints for each thermostat.

NUM_TO_PLOT=5
df = resp.view(resp.views[0])
for thermostat in df['Thermostat'].unique()[:NUM_TO_PLOT]:
  d = df[df.Thermostat == thermostat][['Supply_Air_Temperature_Heating_Setpoint_uuid','Supply_Air_Temperature_Cooling_Setpoint_uuid','Temperature_Sensor_uuid']]
  hsp = d.values[0][0]
  csp = d.values[0][1]
  temp = d.values[0][2]
  if hsp not in resp['setpoints'] or csp not in resp['setpoints'] or temp not in resp['sensors']: continue
  data = pd.concat([resp['setpoints'][hsp], resp['setpoints'][csp], resp['sensors'][temp]], axis=1)
  data.columns = ['hsp','csp','temp']
  fig = data.plot(figsize=(15,8))
  fig.set_ylabel('Celsius')
  fig.set_title(thermostat + ' Zone')
            

Check In

Thanks for completing the Mortar tutorial!
