We believe in the vision of smart buildings – reactive homes and workplaces that prioritize their occupants’ comfort and wellbeing, creating safe and healthy environments that not only boost happiness and productivity, but are also more maintainable and environmentally sustainable.

To achieve this vision, buildings will need to augment their conventional suite of sensors (smoke detectors, thermostats, motion sensors, badge access points, elevator load, etc.) with more advanced capabilities, able to detect fine-grained building states and occupant activities. This raises a significant sensing paradox and research challenge: how does one create a building that knows its fine-grained state and what its occupants are doing so as to be smart and responsive, while at the same time protecting occupant privacy?

To investigate this important and timely topic, we have created a unique research system at Carnegie Mellon University called Mites. This is an end-to-end, hardware-software system for supporting and managing distributed general-purpose sensing in buildings with ground-up fundamental primitives for privacy and security, scalable data management, and machine learning.

System Design

We built an end-to-end full-stack system with the goal of providing high-fidelity sensing of various ambient environmental facets in physical spaces in a building, ultimately enabling a diverse set of IoT applications.

Our Mites sensor package, comprising nine sensors that capture twelve sensing modalities (e.g., temperature, humidity, light, movement, and audio), is installed throughout the building, either in the ceiling, in the wall, or plugged into a powered wall socket. Mites devices acquire data from the onboard sensors, perform signal processing, and extract features, all on the device itself, both for privacy and to reduce data dimensionality. The featurized data from each Mites device are sent over WiFi using an end-to-end encrypted connection to our custom Mites software backend, hosted on secure on-campus servers. Our backend provides scalable data collection, management features, security and privacy primitives, and APIs for access control and management. We have implemented several UIs and a cross-platform web application to support various building stakeholders and use cases.

We designed and implemented a three-tier architecture comprising a Gateway Layer (GL), Request Management Layer (RML), and Device Management Layer (DML) to provide scalability, extensibility, and reliability of our entire system (as shown in the above figure). The GL and RML manage all connected Mites devices, routing and load balancing the various data streams to different nodes of the DML depending on the available compute resources.
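As a rough illustration of the routing described above, the following minimal Python sketch assigns each device stream to whichever worker node currently has the most spare capacity. The class names (`DmlNode`, `Router`), the capacity-based notion of "available compute resources", and the sticky assignment policy are all assumptions made for illustration; the actual Mites routing logic is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class DmlNode:
    """A hypothetical DML worker with a fixed stream capacity."""
    name: str
    capacity: int                       # max device streams this node can take
    devices: set = field(default_factory=set)

    @property
    def free_slots(self) -> int:
        return self.capacity - len(self.devices)


class Router:
    """Assigns each device stream to the node with the most free capacity."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.assignment = {}            # device_id -> DmlNode

    def route(self, device_id: str) -> DmlNode:
        # A device that is already assigned keeps its node (sticky routing).
        if device_id in self.assignment:
            return self.assignment[device_id]
        node = max(self.nodes, key=lambda n: n.free_slots)
        if node.free_slots <= 0:
            raise RuntimeError("no DML node has spare capacity")
        node.devices.add(device_id)
        self.assignment[device_id] = node
        return node


router = Router([DmlNode("dml-a", capacity=2), DmlNode("dml-b", capacity=3)])
placed = {d: router.route(d).name for d in ["mite-1", "mite-2", "mite-3", "mite-4"]}
```

Keeping a device on the same node once assigned is one simple way to avoid reshuffling streams; a real deployment would also need to handle node failure and rebalancing.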

In addition, the GL and the RML store information about their operation in an existing open-source distributed data store called BuildingDepot, which we have extended to support the Mites infrastructure. Specifically, our extensions to BuildingDepot add new functionality for scalability, privacy (obfuscation of metadata such as location, DeviceID, etc.), and extensibility, all of which are essential for a large-scale sensing infrastructure for buildings. Each DML worker node handles streams of featurized data from a set of Mites devices at configurable rates from 1 to 10 Hz.

Finally, to enable scalable machine learning, we integrate our Mites system with an ML platform designed specifically for IoT use cases, called MLIoT. The DML and RML interact with MLIoT to allow visualization of data from Mites devices and to train and serve ML models efficiently at scale.


We designed a custom, highly integrated Mites device with nine distinct physical sensors, and we specifically decided not to include a camera. Our integrated “single device” design serves as an exemplary embodiment of board design using many low-level sensor modalities, providing an exciting vehicle for IoT investigation.

We strategically placed sensors on the PCB to ensure optimal performance (e.g., the ambient light sensor faces outwards), and we spatially separated analog and digital components to keep unintended electrical noise from affecting the performance of neighboring components. For connectivity, we considered industry standards such as Ethernet and ZigBee but ultimately chose a combination of WiFi and Bluetooth for their ubiquity, ease of setup, range, and high bandwidth.


Our firmware featurizes data on-board the Mites device. Not only does this reduce network overhead, but it also denatures the data, better protecting privacy while still preserving the essence of the signal. In particular, we selected features that do not permit reconstruction of the original signal.

Data from our high-sample-rate sensors are transformed into a spectral featurized representation by performing an FFT on a small subset of the data and discarding the phase information. The resulting featurized data are sent at a user-configurable rate (one to ten FFTs per second). Our raw 8x8 GridEye matrix is flattened into row and column means (16 features). For our other low-sample-rate sensors, we compute seven statistical features (min, max, range, mean, sum, standard deviation, and centroid) on a rolling one-second buffer (at 10 Hz). The featurized data for every sensor are concatenated and sent to our secure server (located on campus) as a single data frame, encrypted with 128-bit AES.
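The three kinds of features described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the device firmware: the function names are hypothetical, a plain DFT stands in for the FFT, the standard deviation is taken as the population form, and "centroid" is assumed to mean the value-weighted mean sample index, since the exact definitions are not given here.

```python
import cmath
import math
import statistics


def spectral_magnitudes(window):
    """Magnitude spectrum of a real-valued window; phase is discarded.

    Uses a plain O(n^2) DFT for clarity. An FFT produces the same
    values faster.
    """
    n = len(window)
    bins = []
    for k in range(n // 2 + 1):         # non-redundant bins for real input
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(window))
        bins.append(abs(acc))           # keep magnitude only
    return bins


def grid_features(grid):
    """Reduce an 8x8 thermal grid to 8 row means plus 8 column means."""
    row_means = [sum(row) / len(row) for row in grid]
    col_means = [sum(col) / len(col) for col in zip(*grid)]
    return row_means + col_means


def stat_features(buffer):
    """Seven statistics over one buffer of low-rate samples."""
    total = sum(buffer)
    # Assumption: "centroid" is the value-weighted mean sample index.
    centroid = (sum(i * x for i, x in enumerate(buffer)) / total) if total else 0.0
    return {
        "min": min(buffer),
        "max": max(buffer),
        "range": max(buffer) - min(buffer),
        "mean": total / len(buffer),
        "sum": total,
        "std": statistics.pstdev(buffer),   # assumption: population std
        "centroid": centroid,
    }


# Build one concatenated frame from synthetic inputs.
tone = [math.sin(2 * math.pi * 2 * i / 16) for i in range(16)]  # 2 cycles
mags = spectral_magnitudes(tone)
grid = [[float(r)] * 8 for r in range(8)]
stats = stat_features([1.0, 2.0, 3.0, 4.0])
frame = mags + grid_features(grid) + list(stats.values())
```

Because only magnitudes, means, and summary statistics are kept, the frame is much smaller than the raw buffers it was computed from, and the time-domain signal cannot be recovered from it directly.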

We tune our raw sensor sampling rates over the course of deployment, collecting data at the speed needed to capture environmental events, but with no unnecessary fidelity. Specifically, we sample the temperature, humidity, pressure, light color, light intensity, magnetometer, WiFi RSSI, GridEye, and PIR motion sensors at 10 Hz. Although the high-sample-rate sensors are sampled much faster (e.g., accelerometer at 4 kHz, EMI at 500 kHz, microphone at 16 kHz), the spectral featurization described above reduces their output to a maximum of 10 FFTs sent per second. Note that when accelerometers are sampled at high speed, they can detect minute oscillatory vibrations propagating through structural elements in an environment (e.g., drywall, studs, joists), very much like a geophone.
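To make the data reduction concrete, the short sketch below works out how many raw samples arrive between two consecutive transmitted FFTs for each high-sample-rate sensor. The sample rates and the 1 to 10 FFTs-per-second range come from the text; the FFT window length (`FFT_WINDOW`) is a hypothetical value chosen only for illustration, since no window size is stated here.

```python
SAMPLE_RATES_HZ = {"accelerometer": 4_000, "EMI": 500_000, "microphone": 16_000}


def samples_per_frame(sample_rate_hz: int, ffts_per_second: int) -> int:
    """Raw samples that arrive between two consecutive transmitted FFTs."""
    if not 1 <= ffts_per_second <= 10:
        raise ValueError("the configurable range is 1 to 10 FFTs per second")
    return sample_rate_hz // ffts_per_second


def fraction_used(sample_rate_hz: int, ffts_per_second: int, fft_window: int) -> float:
    """Share of raw samples that feed an FFT, capped at 1.0."""
    return min(1.0, fft_window / samples_per_frame(sample_rate_hz, ffts_per_second))


FFT_WINDOW = 256  # hypothetical window length; the source does not state one

per_frame = {name: samples_per_frame(rate, 10) for name, rate in SAMPLE_RATES_HZ.items()}
used = {name: fraction_used(rate, 10, FFT_WINDOW) for name, rate in SAMPLE_RATES_HZ.items()}
```

At 10 FFTs per second, the microphone produces 1,600 raw samples per transmitted frame and the EMI sensor 50,000, which is why transforming only a small subset of each interval cuts the transmitted data so sharply.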

Privacy and Security

Enterprise-grade End-to-End Security

Each Mites device's firmware is pre-programmed with asymmetric cryptographic keys, which the device and our backend use to mutually authenticate each other and then to derive symmetric cryptographic keys, using industry-standard protocols, for secure end-to-end communication. Moreover, the Mites devices are pre-programmed to communicate only with our backend server, and with no other host, using our custom and proprietary packet format. For additional security, the Mites are also on an encrypted campus WiFi network that is not internet-routable and is unreachable from outside the campus. Our backend server is hardened with standard enterprise-grade security measures, including firewalls. Finally, we have implemented tamper detection mechanisms for the Mites devices.
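The key-derivation step can be illustrated with a small sketch. The specific protocols used by Mites are not named here, so this example substitutes HKDF (RFC 5869) with HMAC-SHA256 as one widely used way to turn a shared secret from an authenticated handshake into 128-bit symmetric keys. The handshake itself is omitted, and `handshake_secret`, the salt, and the `info` labels are placeholders.

```python
import hashlib
import hmac


def hkdf_sha256(shared_secret: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with HMAC-SHA256: extract, then expand."""
    if length > 255 * 32:
        raise ValueError("requested output is too long for HKDF-SHA256")
    # Extract: condense the input secret into a fixed-size pseudorandom key.
    prk = hmac.new(salt or b"\x00" * 32, shared_secret, hashlib.sha256).digest()
    # Expand: stretch the pseudorandom key into `length` bytes bound to `info`.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


# Placeholder for the secret a mutually authenticated key exchange would yield.
handshake_secret = bytes(range(32))
uplink_key = hkdf_sha256(handshake_secret, b"session-salt", b"mites uplink", 16)
downlink_key = hkdf_sha256(handshake_secret, b"session-salt", b"mites downlink", 16)
```

Using a different `info` label per direction yields independent 16-byte keys from the same handshake, so compromising one key does not reveal the other.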

Firmware Featurization

Our firmware featurizes data on-board the Mites device. Not only does this reduce network overhead, but it also denatures the data, better protecting privacy while still preserving the essence of the signal. The data from the sensors on each Mites device is processed in a series of steps that essentially convert it into a non-reconstructable featurized representation that consists of basic statistical features (min, max, range, average, sum, standard deviation, and centroid) and aggregated frequency representation values (using a Fast Fourier Transform (FFT)). This featurization and denaturing of data is done specifically to mitigate any privacy concerns such that the essence of the signals can be extracted while preventing the reconstruction of the original signals. Notably, all this processing and denaturing happens on the Mites device itself in its secure firmware; thus, the raw sensor data never leaves the Mites device.

Location obfuscation

We also designed and implemented a novel privacy-aware data collection method that reduces the risk of sensor data from an office being indirectly associated with the behavior of one or more of its occupants. For offices whose occupants have not yet consented, we obfuscate the location so that a set of offices is grouped together (e.g., all offices in the N/W corner of the 3rd floor). These obfuscated locations still support applications that need aggregate data (e.g., average humidity and temperature in the 3rd floor N/W corner) while preventing indirect association of the sensor data with the office occupant(s).
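A minimal sketch of this grouping, under stated assumptions: the office numbers, zone labels, and the `CONSENTED` set below are invented for illustration, and the real system's obfuscation scheme may differ. Readings from non-consented offices are reported only under their coarse zone, so aggregates remain available while the exact office is hidden.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical mapping from an office to its coarse zone.
OFFICE_ZONE = {"3101": "3rd floor N/W", "3102": "3rd floor N/W", "3201": "3rd floor S/E"}
CONSENTED = {"3201"}  # offices whose occupants have opted in


def reported_location(office: str) -> str:
    """Exact office if consent was given, otherwise only the coarse zone."""
    return office if office in CONSENTED else OFFICE_ZONE[office]


def zone_average(readings):
    """Average readings per reported location."""
    grouped = defaultdict(list)
    for office, value in readings:
        grouped[reported_location(office)].append(value)
    return {loc: mean(values) for loc, values in grouped.items()}


# Temperature readings as (office, degrees Celsius) pairs.
averages = zone_average([("3101", 21.0), ("3102", 23.0), ("3201", 22.5)])
```

Here the two non-consented offices collapse into a single "3rd floor N/W" entry, while the consented office keeps its own identifier.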

Data Model Views of Sensor Data to enable Privacy

Applications may need data from one or more sensors on a Mites device (e.g., occupancy detection may need PIR and thermal GridEye data). Similarly, once occupants have given their consent, they may want to share data from a subset of their sensors with other users or applications. The Mites system provides fine-grained mechanisms to enable or disable access to specific sensor(s) on a Mites device, as well as to specify the level of access (Read, Write), which is necessary to prevent over-privileged apps. Our goal with these primitives is to give occupants transparency and control over who has access to the data from their personal spaces, and for which applications and purposes.
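One way to picture these per-sensor permissions is the sketch below. It is a hypothetical model, not the Mites API: the `Access` flags mirror the Read and Write levels mentioned above, while `SensorAcl`, the application name, and the device and sensor identifiers are made up for illustration.

```python
from enum import Flag


class Access(Flag):
    NONE = 0
    READ = 1
    WRITE = 2


class SensorAcl:
    """Per-device, per-sensor grants keyed by application name."""

    def __init__(self):
        self._grants = {}  # (app, device_id, sensor) -> Access

    def grant(self, app, device_id, sensor, access):
        self._grants[(app, device_id, sensor)] = access

    def revoke(self, app, device_id, sensor):
        self._grants.pop((app, device_id, sensor), None)

    def allowed(self, app, device_id, sensor, needed):
        # Anything not explicitly granted is denied.
        held = self._grants.get((app, device_id, sensor), Access.NONE)
        return (held & needed) == needed


acl = SensorAcl()
# Occupancy detection needs PIR and thermal grid data, read-only.
acl.grant("occupancy", "mite-42", "pir", Access.READ)
acl.grant("occupancy", "mite-42", "grideye", Access.READ)

can_read_pir = acl.allowed("occupancy", "mite-42", "pir", Access.READ)
can_read_mic = acl.allowed("occupancy", "mite-42", "microphone", Access.READ)
can_write_pir = acl.allowed("occupancy", "mite-42", "pir", Access.WRITE)
```

Denying by default means an application only ever sees the sensors, and the access level, that were explicitly granted to it, which is the property that guards against over-privileged apps.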

Fine-grained Access Controls for Users

We provide extensive privacy controls that let authenticated occupants of these offices disable any (or even all) of the sensors using the Mites Mobile App. Users who don’t use the MitesApp can simply send us an email to disable any or all of the sensors on the Mites in their office, or to request that the devices be powered off completely.

Demonstration Video