Overview
We believe in the vision of smart buildings – reactive homes and workplaces that prioritize their
occupants’ comfort and wellbeing, creating safe and healthy environments that not only boost happiness
and productivity, but are also more maintainable and environmentally sustainable.
To achieve this vision, buildings will need to augment their conventional suite of sensors
(smoke detectors, thermostats, motion sensors, badge access points, elevator load, etc.) with more
advanced capabilities, able to detect fine-grained building states and occupant activities.
This raises a significant sensing paradox and research challenge: how does one create a building
that knows its fine-grained state and what its occupants are doing so as to be smart and responsive,
while at the same time protecting occupant privacy?
To investigate this important and timely topic, we have created a unique research system at
Carnegie Mellon University called Mites. It is an end-to-end hardware-software system for
supporting and managing distributed, general-purpose sensing in buildings, built from the ground up
with fundamental primitives for privacy and security, scalable data management, and
machine learning.
System Design
We built an end-to-end full-stack system with the goal of providing high-fidelity
sensing of various ambient environmental facets in physical spaces in a building,
ultimately enabling a diverse set of IoT applications.
Our Mites sensor package, comprising nine sensors that capture twelve sensing modalities
(e.g., temperature, humidity, light, movement, and audio), is installed throughout the building,
mounted in the ceiling or wall or plugged into a powered wall socket.
Our Mites devices acquire data from the onboard sensors, perform signal processing, and extract
features, all on the device itself, both to protect privacy and to reduce data dimensionality. The
featurized data from each Mites device are sent over WiFi through an end-to-end encrypted connection
to our custom Mites software backend, hosted on secure on-campus servers. The backend provides
scalable data collection and management, security and privacy primitives, and APIs for access control
and management. We have also implemented several UIs and a cross-platform web application to
support various building stakeholders and use cases.
We designed and implemented a three-tier architecture comprising a Gateway Layer (GL),
Request Management Layer (RML), and Device Management Layer (DML) to provide scalability, extensibility,
and reliability of our entire system (as shown in the above figure).
The GL and RML manage all connected Mites devices, routing and load balancing the various data streams to
different nodes of the DML depending on the available compute resources.
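The load-balancing policy itself is not described here. As an illustration only, the sketch below assumes a simple greedy policy: each device stream, characterized by its frame rate, is placed on the DML worker with the most spare capacity. The `Worker` and `assign_streams` names and the per-worker capacity budget in frames per second are hypothetical, not part of the Mites codebase.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical DML worker node with a budget of frames per second it can ingest."""
    name: str
    capacity_hz: float
    load_hz: float = 0.0
    devices: list[str] = field(default_factory=list)

    @property
    def headroom(self) -> float:
        # Spare ingest capacity remaining on this worker.
        return self.capacity_hz - self.load_hz


def assign_streams(streams: dict[str, float], workers: list[Worker]) -> dict[str, str]:
    """Greedily place each device stream (device_id -> frame rate in Hz)
    on the worker with the most spare capacity."""
    placement: dict[str, str] = {}
    # Placing the fastest streams first tends to give a more even spread.
    for device_id, rate in sorted(streams.items(), key=lambda kv: kv[1], reverse=True):
        worker = max(workers, key=lambda w: w.headroom)
        if worker.headroom < rate:
            raise RuntimeError(f"no DML capacity left for {device_id}")
        worker.load_hz += rate
        worker.devices.append(device_id)
        placement[device_id] = worker.name
    return placement


workers = [Worker("dml-1", capacity_hz=30), Worker("dml-2", capacity_hz=30)]
streams = {"mite-a": 10, "mite-b": 10, "mite-c": 5, "mite-d": 1}
print(assign_streams(streams, workers))
# {'mite-a': 'dml-1', 'mite-b': 'dml-2', 'mite-c': 'dml-1', 'mite-d': 'dml-2'}
```

Greedy placement is easy to reason about; a production router would also need to rebalance streams when workers join, fail, or become overloaded.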
In addition, the GL and the RML store information about their operation in an existing open-source
distributed data store called BuildingDepot, which we have extended to support the Mites infrastructure.
Specifically, our extensions to BuildingDepot include new functionalities for scalability, privacy
(obfuscation of metadata such as location, DeviceID, etc.), and extensibility, which are essential
for a large-scale sensing infrastructure for buildings. Each DML worker node handles streams of
featurized data from a set of Mites devices at configurable rates from 1 to 10 Hz.
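How the BuildingDepot extensions obfuscate metadata such as DeviceIDs is not specified here. One common approach, shown as a minimal sketch under that assumption, is to replace each raw DeviceID with a keyed hash (HMAC-SHA256): the pseudonym stays stable, so records can still be joined, but it cannot be reversed by hashing candidate IDs without the secret key. The `pseudonymize` helper and its 16-character truncation are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(device_id: str, secret_key: bytes, length: int = 16) -> str:
    """Replace a raw DeviceID with a stable, keyed pseudonym (hex string).

    Without secret_key, an observer cannot link the pseudonym back to the
    device by simply hashing a list of candidate IDs."""
    digest = hmac.new(secret_key, device_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


# Hypothetical key; in practice it would live only on the backend.
print(pseudonymize("mite-0042", b"backend-only-secret"))
```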
Finally, to enable scalable machine learning, we integrate the Mites system with MLIoT, an ML platform
designed specifically for IoT use cases. The DML and RML interact with MLIoT to visualize data from
Mites devices and to train and serve ML models efficiently at scale.
Hardware
We designed a custom, highly integrated Mites device with nine distinct physical sensors, and we
deliberately chose not to include a camera. Our integrated “single device” design is an exemplary
embodiment of a board that combines many low-level sensor modalities, providing an exciting
vehicle for IoT investigation.
We strategically placed sensors on the PCB to ensure optimal performance (e.g., the ambient light
sensor faces outwards), and we spatially separated analog and digital components to keep unintended
electrical noise from degrading neighboring components. For connectivity, we considered industry
standards such as Ethernet and ZigBee but ultimately chose a combination of WiFi and Bluetooth for
their ubiquity, ease of setup, range, and high bandwidth.
Firmware
Our firmware featurizes data on-board the Mites device. Not only does this reduce network overhead,
but it also denatures the data, better protecting privacy while still preserving the essence of the
signal. In particular, we selected features that do not permit
reconstruction of the original signal.
Data from our high-sample-rate sensors are transformed into a spectral featurized representation by
performing an FFT on a small subset of the data. The resulting features are sent at a user-configurable
rate (one to ten FFTs per second), and phase information is discarded. Our raw 8x8 GridEye matrix is
flattened into row and column means (16 features). For our other, low-sample-rate sensors, we compute
seven statistical features (min, max, range, mean, sum, standard deviation, and centroid) on a rolling
one-second buffer (at 10 Hz). The featurized data for every sensor are concatenated and sent to our
secure on-campus server as a single data frame, encrypted with 128-bit AES.
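To make the spectral and GridEye featurization concrete, here is a minimal pure-Python sketch. It assumes a power-of-two window size (the FFT length is not stated) and keeps only the magnitudes of the non-redundant half of the spectrum, so phase is dropped as described above. The function names are hypothetical, and real firmware would use an optimized FFT rather than this recursive one.

```python
import cmath
import math


def fft(x: list[complex]) -> list[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def spectral_features(window: list[float]) -> list[float]:
    """Magnitude spectrum of one window of samples; phase is discarded."""
    n = len(window)
    if n == 0 or n & (n - 1):
        raise ValueError("window length must be a power of two")
    spectrum = fft([complex(s) for s in window])
    # A real signal's spectrum is symmetric, so bins 0..n/2 carry all magnitudes.
    return [abs(c) for c in spectrum[: n // 2 + 1]]


def grideye_features(frame: list[list[float]]) -> list[float]:
    """Flatten an 8x8 thermal frame into 8 row means followed by 8 column means."""
    rows = [sum(r) / len(r) for r in frame]
    cols = [sum(c) / len(c) for c in zip(*frame)]
    return rows + cols
```

Because only magnitudes are kept, a sine and a cosine at the same frequency yield identical features, which shows how discarding phase removes information needed to rebuild the original waveform.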
We tune our raw sensor sampling rates over the course of the deployment, collecting data
at the rate needed to capture environmental events, but with no unnecessary
fidelity. Specifically, we sample the temperature, humidity, pressure, light color,
light intensity, magnetometer, WiFi RSSI, GridEye, and PIR motion sensors at 10 Hz.
The high-sample-rate sensors are sampled much faster (e.g., accelerometer at 4 kHz,
EMI at 500 kHz, and microphone at 16 kHz), but the spectral featurization described above
reduces their data to at most 10 FFTs per second.
Note that when accelerometers are sampled at high
speed, they can detect minute oscillatory vibrations propagating through structural
elements in an environment (e.g., dry-wall, studs, joists), very much like a geophone.
Privacy and Security
Enterprise-grade End-to-End Security
Our Mites devices are pre-programmed with asymmetric cryptographic keys, which they use to mutually
authenticate with our backend and then to derive symmetric cryptographic keys using industry-standard
protocols for secure end-to-end communication. Moreover, the devices are pre-programmed to communicate
only with our backend server, and with no other host, using our custom, proprietary packet format.
For additional security, the Mites also sit on an encrypted, non-internet-routable campus WiFi network
that is unreachable from outside the campus. Our backend server is hardened with standard
enterprise-grade security measures, including firewalls. Finally, we have implemented tamper-detection
mechanisms for the Mites devices.
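The key-derivation protocol is not named above. As an assumption, the sketch below uses HKDF with SHA-256 (RFC 5869), a widely used standard for turning a shared secret from an authenticated key exchange into symmetric keys, and requests 16 bytes to match the 128-bit AES key mentioned in the firmware description. The key exchange and mutual authentication are omitted, and the all-zero secret and `info` label are hypothetical placeholders.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a pseudorandom key, then expand it."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested key material too long")
    # Extract: a missing salt is replaced by HashLen zero bytes.
    prk = hmac.new(salt or bytes(hash_len), ikm, hashlib.sha256).digest()
    # Expand: T(i) = HMAC(PRK, T(i-1) | info | i), concatenated until long enough.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


# Hypothetical: shared_secret would come from the authenticated key exchange.
shared_secret = bytes(32)
session_key = hkdf_sha256(shared_secret, salt=b"", info=b"mites-session-v1", length=16)
```

Binding a context label into `info` lets the same shared secret yield independent keys for different purposes (for example, separate keys per direction) without another key exchange.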
Firmware Featurization
The data from the sensors on each Mites device are processed in a series of steps that convert them
into a non-reconstructable featurized representation, consisting of basic statistical features (min,
max, range, average, sum, standard deviation, and centroid) and aggregated frequency-domain values
computed with a Fast Fourier Transform (FFT). This featurization and denaturing is designed specifically
to mitigate privacy concerns: the essence of the signals is extracted while reconstruction of the
original signals is prevented. Notably, all of this processing happens on the Mites device itself, in
its secure firmware; the raw sensor data never leaves the Mites device.
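The seven statistical features listed above can be sketched for one low-sample-rate sensor as follows, assuming the 10 Hz, one-second rolling buffer described in the Firmware section. The `RollingStats` class is hypothetical, and because "centroid" is not defined in the text, the sketch assumes it means the amplitude-weighted mean sample index within the buffer.

```python
import math
from collections import deque


class RollingStats:
    """Hypothetical featurizer for one low-sample-rate sensor (default: 10 Hz, 1 s buffer)."""

    def __init__(self, rate_hz: int = 10, seconds: float = 1.0) -> None:
        # Oldest samples fall out automatically once the buffer is full.
        self.buffer: deque[float] = deque(maxlen=int(rate_hz * seconds))

    def push(self, sample: float) -> None:
        self.buffer.append(sample)

    def features(self) -> dict[str, float]:
        xs = list(self.buffer)
        if not xs:
            raise ValueError("buffer is empty")
        total = sum(xs)
        mean = total / len(xs)
        std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))  # population std
        # Assumption: centroid = amplitude-weighted mean index of the samples.
        centroid = sum(i * x for i, x in enumerate(xs)) / total if total else 0.0
        lo, hi = min(xs), max(xs)
        return {"min": lo, "max": hi, "range": hi - lo, "mean": mean,
                "sum": total, "std": std, "centroid": centroid}
```

Each one-second window of ten samples is summarized by seven numbers, so in general the individual samples cannot be recovered from the features alone.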
Location obfuscation
We also designed and implemented a novel privacy-aware data collection method that reduces the risk
of indirectly associating sensor data from office spaces with the behavior of one or more of their
occupants. For offices whose occupants have not yet consented, we obfuscate the location by grouping a
set of offices together (e.g., all offices in the N/W corner of the 3rd floor). These obfuscated
locations still support applications that need aggregate data (e.g., average humidity and temperature
in the 3rd-floor N/W corner) while preventing the sensor data from being indirectly associated with the
office's occupant(s).
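A minimal sketch of this grouping rule, assuming each office is tagged with a floor and a corner zone: offices whose occupants have consented report their room, while all others report only their zone, so aggregates such as average temperature are computed per zone. The `Office` type, room numbers, and label format are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Office:
    room: str    # hypothetical room number, e.g. "3101"
    floor: int
    corner: str  # coarse zone, e.g. "N/W"


def reported_location(office: Office, consented: set[str]) -> str:
    """Fine-grained room only with consent; otherwise the coarse zone label."""
    if office.room in consented:
        return f"floor {office.floor} / room {office.room}"
    return f"floor {office.floor} / {office.corner} corner"


def zone_averages(readings: list[tuple[Office, float]], consented: set[str]) -> dict[str, float]:
    """Average a sensor value over every office that shares a reported location."""
    groups: dict[str, list[float]] = {}
    for office, value in readings:
        groups.setdefault(reported_location(office, consented), []).append(value)
    return {loc: sum(vals) / len(vals) for loc, vals in groups.items()}
```

Grouping only hides an office when its zone contains several non-consented offices; a zone holding a single office would still identify it.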
Data Model Views of Sensor Data to enable Privacy
Applications may need data from one or more sensors on a Mites device (e.g., occupancy detection may
need PIR and thermal GridEye data). Similarly, once occupants have given their consent, they may want to
share data from only a subset of their sensors with other users or applications. The Mites system
therefore provides fine-grained mechanisms to enable or disable access to specific sensors on a Mites
device, and to specify the level of access (Read, Write), which is necessary to prevent over-privileged
apps. Our goal with these primitives is to give occupants transparency into, and control over, who has
access to the data from their personal spaces, and for which applications and purposes.
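As an illustration of per-sensor grants with Read and Write levels, the sketch below keeps an access-control list keyed by (principal, sensor) and also lets an occupant disable a sensor outright. The `SensorACL` class, principal names, and sensor identifiers are hypothetical and are not the Mites API.

```python
from enum import Flag, auto


class Access(Flag):
    NONE = 0
    READ = auto()
    WRITE = auto()


class SensorACL:
    """Hypothetical per-device ACL: (principal, sensor) -> granted access levels."""

    def __init__(self) -> None:
        self._grants: dict[tuple[str, str], Access] = {}
        self._disabled: set[str] = set()

    def grant(self, principal: str, sensor: str, access: Access) -> None:
        key = (principal, sensor)
        self._grants[key] = self._grants.get(key, Access.NONE) | access

    def revoke(self, principal: str, sensor: str) -> None:
        self._grants.pop((principal, sensor), None)

    def disable_sensor(self, sensor: str) -> None:
        # Occupant-level switch: overrides every grant on this sensor.
        self._disabled.add(sensor)

    def allowed(self, principal: str, sensor: str, access: Access) -> bool:
        if sensor in self._disabled:
            return False
        granted = self._grants.get((principal, sensor), Access.NONE)
        return access in granted


acl = SensorACL()
acl.grant("occupancy-app", "pir", Access.READ)
acl.grant("occupancy-app", "grideye", Access.READ)
print(acl.allowed("occupancy-app", "pir", Access.WRITE))  # False: only READ was granted
```

Modeling access levels as flags lets a single grant combine levels (e.g., `Access.READ | Access.WRITE`) while each check asks for one specific level, so an app receives only what was explicitly granted.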
Fine-grained Access Controls for Users
We provide extensive privacy controls that let authenticated occupants of these offices disable any
(or even all) of the sensors using the Mites Mobile App. Occupants who don’t use the app can simply
email us to disable any or all of the sensors on the Mites device in their office, or to have it
powered off completely.