Room-level occupant counts and environmental quality from heterogeneous sensing modalities in a smart building

Schwee, Jens Hjort; Johansen, Aslak; Jørgensen, Bo Nørregaard; Kjærgaard, Mikkel Baun; Mattera, Claudio Giovanni; Sangogboye, Fisayo Caleb; Veje, Christian

doi:10.1038/s41597-019-0274-4

Data Descriptor
Open access
Published: 26 November 2019

Scientific Data volume 6, Article number: 287 (2019) Cite this article

A Publisher Correction to this article was published on 28 February 2020

This article has been updated

Abstract

The research areas of occupant sensing and occupant behavior modeling lack comprehensive public datasets for providing baseline results and fostering data-driven approaches. This data descriptor covers a sensor-collected dataset of room-level occupant counts together with related data on indoor environmental quality. The dataset comprises 44 full days, collected in the period March 2018 to April 2019 in a public building in Northern Europe. Sensor readings cover three rooms: one lecture room and two study zones. The data release contains two versions of the dataset, one with the raw readings and one that has been upsampled to a one-minute resolution. The dataset can be used for developing and evaluating data-driven applications, occupant sensing, and building analytics. It can also serve researchers and designers as a basis for experiments and pilot studies, and hence for benchmarking.

Measurement(s)	carbon dioxide • humidity • visible light energy • occupant count • temperature of air • indoor airflow
Technology Type(s)	sensor • digital camera • damper position
Sample Characteristic - Organism	Homo sapiens
Sample Characteristic - Environment	office building
Sample Characteristic - Location	Kingdom of Denmark

Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.9971549

Background & Summary

Accurate estimates of occupant counts in a building can be used in various application areas, including smart spaces, safety and evacuation, facility management, and building operations¹,². In the building operation area, occupant counts can enable applications such as adaptive ventilation in rooms, occupant-based energy benchmarking, and model-predictive control of room setpoints. In these applications, the more accurately the number of occupants can be sensed, the more energy-efficiently a building can be operated³. For studies of adaptive ventilation and model-predictive control, it is also necessary that occupant presence can be linked to indoor environmental quality and ventilation rates.

Such research needs datasets that capture the conditions in non-laboratory settings, because the developed systems and algorithms have to handle omissions and faults in the sensor data they process in order to be applicable beyond the lab. Therefore, the released dataset was collected in a living lab building, which is a standard building in normal use but where additional effort has been made to enable data collection. However, because standard components are used, uncertainties around the systems and their calibration are higher than in a laboratory setting.

The building considered for this data release is a teaching and office building located at the University of Southern Denmark, Odense campus. Denmark has a temperate climate representative of the northern part of Europe. The building has been designed to serve as a living lab for data-driven research on building operation and optimization⁴. The role as a living lab has been communicated at the university, and a screen at one of the entrances shows examples of the data collected. However, as the data collection is mostly invisible to occupants, the collection does not impact their behavior in any noticeable manner. The building is approximately 8500 m² split across three floors and a basement. It has 1000 occupants on a typical weekday and is used by both students and staff. Data is collected at the spatial granularity of rooms. The data have been collected in three rooms: two study zones and one lecture room. The rooms were selected to cover different usage patterns and differences in sunlight by facing either south-east or north-west. The study zones are used for a mix of student activities, such as project work and solving exercises. The lecture room is mostly used for scheduled activities, typically lasting two to four hours.

The rooms contain a unique collection of sensor modalities covering both occupant presence and indoor environmental quality factors: CO₂ concentration level, relative humidity, illuminance, occupant counts, counts of occupants entering and leaving the rooms, temperature, and in-room airflow. The airflow is estimated from the damper position, with which it is correlated; the supplied air is outdoor air heated using heat recovery. The placement of the sensors follows the standard practice of the building industry in Denmark.

Most existing sensor-based datasets for buildings consider residential homes (e.g., Barker et al.⁵). With this dataset, we instead consider commercial buildings. Previous datasets for commercial buildings include fewer sensor modalities and have a lower temporal resolution than the presented dataset. They include a dataset from the University of Southern Denmark with only three modalities and a small temporal range⁶, a dataset from Lawrence Berkeley National Lab with lower granularity and fewer modalities but more background variables⁷, and a dataset from the University of Texas, San Antonio with only one sensor modality⁸. The released dataset is thus unique in the number of sensor modalities available.

The sensor modalities in the dataset enable researchers to both study new technical solutions (e.g., CO₂-based occupant estimation algorithms⁹, adaptive ventilation, or model-predictive control) and establish knowledge on occupants and indoor environmental quality (e.g., quantify the correlation between occupants and air quality). The dataset can also be used to learn modeling parameters for occupants to more accurately parameterize building performance simulations³.

The counts of occupants entering and leaving the rooms have been collected using six state-of-the-art PC2 3D stereo vision cameras produced by Xovis, mounted over the entrances to the rooms. To estimate the number of occupants in the rooms, we have applied the PLCount algorithm¹⁰ to the raw readings from the cameras. The sensor and method have been evaluated in the building against a manually obtained ground truth based on video recordings. The study documented in¹⁰ reported a Root Mean Square Error (RMSE) of 0.075. The other building data are collected via standard-grade sensors connected to a building management system (BMS), which for this particular building is a Schneider Electric BMS. The data for the release are collected through application programming interfaces of the BMS. The CO₂ sensor data have not been cleaned. Therefore, users of the data should address known issues with this stream, including offsets and drifts⁹. An overview of the different sensor streams, including units and uncertainties, can be found in Table 1. See Table 2 for details of the sampling strategies of the various sensors and rooms. The physical placement of the sensors can be found in Table 3.
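
To make the relation between the entering/leaving streams and a room-level count concrete, the following is a minimal sketch of a naive baseline. It is not the PLCount algorithm¹⁰: it simply accumulates entries minus exits and clamps the result at zero, which shows why counting errors accumulate without a correction step. The function name and the input layout are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def naive_occupancy(events: Iterable[Tuple[int, int]]) -> List[int]:
    """Running occupant count from (entered, left) pairs per sampling interval.

    Naive baseline for illustration only, NOT the PLCount algorithm:
    miscounts are never corrected, so the total is merely clamped at zero.
    """
    counts: List[int] = []
    current = 0
    for entered, left in events:
        current = max(0, current + entered - left)
        counts.append(current)
    return counts


# Hypothetical readings: (people entering, people leaving) per interval.
sample = [(3, 0), (2, 1), (0, 5), (1, 0)]
print(naive_occupancy(sample))  # [3, 4, 0, 1]
```

In the third interval the cameras report more people leaving than the running total allows, so the clamp hides a counting error rather than resolving it.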

Table 1 The sensor streams in the released dataset, and the units which they are measured in. Uncertainty is reported as specified in the technical product sheets.

Table 2 The sampling strategy for each of the sensor streams at the individual rooms. CO₂ is sampled using dynamic sampling rate (DSR), where the sampling frequency is increased (up to one sample per minute) when higher levels of CO₂ are observed. Occupant counts are sampled using a static sampling rate (SSR). The relative humidity and temperature streams are sampled using a threshold-based strategy. The variable air volume (VAV) damper position is sampled when there is a change in the position.

Table 3 Overview of the locations of the sensors inside the room.

Methods

Selection methodology

The published dataset was collected in the period from March 1st, 2018, to April 30th, 2019. We would have liked to publish only periods of continuous readings, but since the source is a real building and BMS APIs were used for collecting the data, we have had to adjust expectations slightly. We only considered full days of data. Days on which the CO₂ stream had more than three missing readings in a row were not considered, which allows gaps of up to 15 minutes. Threshold-based sensors, which only collect a sample when the change is larger than the threshold, were not considered when eliminating days, since it is impossible to evaluate how many readings such streams should contain. Additionally, we chose not to include two consecutive days, as this would make the released data susceptible to privacy attacks. This decision is based on the results of a study¹¹, which showed that data could be deanonymized using CO₂ streams. Such an attack could be used to identify the weekday, which in turn could be used to reveal the identity of the room through a data linkage attack that combines the teaching room's scheduled activities with the released streams, as showcased in¹¹. To eliminate consecutive days from a sequence, we used the following procedure: for sequences with an even number of days, days were removed at random so as to comply with the rule; for sequences with an odd number of days, days were removed so as to maximize the number of days in the output. This left 44 full days of data covering the three test rooms and all the sensor modalities. These make up the days of the released dataset.
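
The rule that no two consecutive days may be released can be sketched as follows. This is one possible reading of the procedure described above, not the authors' released code: candidate days are grouped into runs of consecutive dates, odd-length runs keep every other day starting from the first (the only selection that keeps the maximum number of days), and even-length runs keep every other day from a randomly chosen offset. All function names are hypothetical.

```python
import random
from datetime import date, timedelta
from typing import List, Optional


def split_into_runs(days: List[date]) -> List[List[date]]:
    """Group dates into runs of consecutive calendar days."""
    runs: List[List[date]] = []
    for day in sorted(set(days)):
        if runs and day - runs[-1][-1] == timedelta(days=1):
            runs[-1].append(day)
        else:
            runs.append([day])
    return runs


def thin_run(run: List[date], rng: random.Random) -> List[date]:
    """Keep every other day of a run so that no two kept days are adjacent."""
    if len(run) % 2 == 1:
        start = 0  # odd length: days 0, 2, 4, ... is the unique maximum
    else:
        start = rng.choice([0, 1])  # even length: either offset keeps half
    return run[start::2]


def select_days(days: List[date], seed: Optional[int] = None) -> List[date]:
    """Return a subset of `days` that contains no two consecutive dates."""
    rng = random.Random(seed)
    kept: List[date] = []
    for run in split_into_runs(days):
        kept.extend(thin_run(run, rng))
    return kept


# Hypothetical candidate days: a run of five, a run of two, and a single day.
candidates = [date(2018, 3, d) for d in (1, 2, 3, 4, 5, 10, 11, 20)]
print(select_days(candidates, seed=1))
```

Because separate runs are by construction at least two days apart, thinning each run independently is enough to guarantee the rule for the whole selection.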

Data processing

In the released dataset, we provide the data in two forms: the original raw form and a pre-processed form with a stable sample rate, which makes the data easier to use. The pre-processing applies forward fill and then backward fill to the original streams. Forward fill fills the gaps in a stream at the desired sampling frequency with the last reported value until a new reported value is reached; backward fill does the same, but backward in time. The sampling rate for the fills has been set to one sample per minute for all of the streams. Other sampling rates can be computed from the original dataset. The sample rate has been selected to accommodate the identified use cases, e.g., occupancy estimation and model-predictive control⁹. The original dataset has only been changed for the event-based and threshold-based sensors, by adding a reading at 00:00:00 with the value of the last reading of the previous day.
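
The forward-then-backward fill can be illustrated with a small standard-library sketch. It is an approximation written for this description, not the released pre-processing code (which is linked under Code availability); the function name, the dictionary input, and the handling of readings that fall between minute boundaries are assumptions.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple


def upsample_day(
    readings: Dict[datetime, float], day_start: datetime
) -> List[Tuple[datetime, Optional[float]]]:
    """Resample one day to a one-minute grid: forward fill, then backward fill."""
    grid = [day_start + timedelta(minutes=i) for i in range(24 * 60)]
    ordered = sorted(readings.items())

    # Forward fill: each grid minute takes the latest reading at or before it.
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    idx = 0
    for minute in grid:
        while idx < len(ordered) and ordered[idx][0] <= minute:
            last = ordered[idx][1]
            idx += 1
        filled.append(last)

    # Backward fill: minutes before the first reading take that first value.
    following: Optional[float] = None
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = following
        else:
            following = filled[i]

    return list(zip(grid, filled))


# Hypothetical CO2 readings (ppm) on an arbitrary day.
start = datetime(2018, 3, 1)
raw = {
    start + timedelta(minutes=3): 400.0,
    start + timedelta(minutes=10, seconds=30): 450.0,
}
minute_values = upsample_day(raw, start)
print(len(minute_values), minute_values[0][1], minute_values[11][1])
```

The order of the two fills matters: after the forward fill only the minutes before the first reading are still empty, so the backward fill touches nothing else.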

Data suppression

The most sensitive part of the data release is the identity of the rooms, since, combined with the occupant counts and knowledge of room activities, it allows one to calculate a teacher performance index, as demonstrated in¹¹. We have therefore anonymized the room identity and replaced the dates by a DayId, which is a random number assigned in non-chronological order. To limit the effect on the usefulness of the dataset, we have introduced a year, a month, and a workday indicator. The time of day is untouched.
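
The date suppression can be sketched as below. This is a hedged illustration rather than the released script: the record keys, the numbering of day identifiers from 1, and the definition of a workday as Monday to Friday (ignoring public holidays) are all assumptions.

```python
import random
from datetime import date, datetime
from typing import Dict, List


def assign_day_ids(days: List[date], rng: random.Random) -> Dict[date, int]:
    """Map each date to a random identifier that carries no chronological order."""
    ids = list(range(1, len(days) + 1))
    rng.shuffle(ids)
    return dict(zip(sorted(days), ids))


def suppress(ts: datetime, day_ids: Dict[date, int]) -> dict:
    """Replace the calendar date of a timestamp by coarse, less identifying fields."""
    day = ts.date()
    return {
        "dayId": day_ids[day],
        "year": day.year,
        "month": day.month,
        "workday": day.weekday() < 5,  # assumption: Monday to Friday
        "time": ts.time().isoformat(),  # the time of day is left untouched
    }


# Hypothetical released days; the mapping itself must stay private.
days = [date(2018, 3, 1), date(2018, 3, 3), date(2018, 3, 10)]
mapping = assign_day_ids(days, random.Random(0))
print(suppress(datetime(2018, 3, 3, 14, 30), mapping))
```

The day of the month and the weekday are deliberately absent from the output record, since either would help an attacker link a stream to a room schedule.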

Data Records

The released dataset, hosted on figshare¹², contains the mentioned sensor modalities; the number of readings for each of the rooms can be seen in Table 4. The upsampled dataset contains 67680 readings per stream. The data coverage for the sensors using the dynamic sampling rate (DSR) and static sampling rate (SSR) strategies can be found in Table 5. The illuminance data stream has relatively low data coverage. We have included it because it still captures the tendency of the light level in the rooms through the days, although the coverage can affect the usefulness of the stream. Summary statistics for the upsampled dataset for Rooms 1, 2, and 3 can be found in Tables 6–8. The data for each combination of sensor modality and room can be found in separate comma-separated value (CSV) files. In addition to the sensor values, these files have columns for the metadata defined in the data suppression section, namely timestamps, year, month, and workday indicators, and dayId; these are also described in the readme file. The two versions of the data, original and upsampled, can be found in the folders original and filleddata, respectively. The metadata also contains room type, size, seating capacity, and volume, which can be found in Table 9 and in the roominfo file. Furthermore, we have provided a Brick representation of the sensor instrumentation¹³, found in the brick_graph file generated using the brick_generator script. The Brick model consists of the physical relations between the sensor streams, the building, and the rooms. It is modeled using Resource Description Framework (RDF) triples between the elements in the model. The models of the rooms are identical; an example can be found in Fig. 1. Finally, we have included a categorization of the sensors following the Mahdavi and Taheri¹⁴ ontology, which can be found in the occupant-behavior-ontology file or in Tables 10, 11, and 12 for the indoor conditions, inhabitants, and control systems, respectively.
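
As a rough illustration of working with one of the per-stream CSV files, the sketch below groups sensor values by day identifier. The column names ("dayId", "timestamp", "value") and the sample rows are hypothetical; the actual headers are defined in the readme file of the release and should be checked there.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def values_by_day(csv_text: str, value_column: str) -> Dict[str, List[float]]:
    """Group the values of one sensor stream by its day identifier.

    Assumes a header row with a 'dayId' column and a numeric value column;
    both names are placeholders, not the confirmed schema of the release.
    """
    by_day: Dict[str, List[float]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_day[row["dayId"]].append(float(row[value_column]))
    return dict(by_day)


# Made-up rows in the assumed layout, for demonstration only.
sample = """dayId,timestamp,value
9,00:00:00,412.5
9,00:01:00,413.0
12,00:00:00,405.8
"""
print(values_by_day(sample, "value"))
```

Grouping by dayId rather than by calendar date matches the release, in which dates have been replaced by non-chronological identifiers.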

Table 4 The sensor modalities in each of the rooms and the number of readings in the streams, in the original data stream.

Table 5 The data coverage, for the sensors with sampling strategy of DSR and SSR, in the original data stream.

Table 6 Summary statistics for the upsampled streams in Room 1.

Table 7 Summary statistics for the upsampled streams in Room 2.

Table 8 Summary statistics for the upsampled streams in Room 3.

Table 9 Metadata for each of the rooms.

Table 10 Sensor streams monitoring indoor conditions. For the upsampled version of the dataset. Short names used in the table: Spatial attribute (SA), Temporal attribute (TA), Topological reference (TR), Sampling Interval (SI), Data Source (DS), and Quantitative (Q).

Table 11 Sensor streams monitoring inhabitants. For the upsampled version of the dataset. Short names used in the table: Spatial attribute (SA), Temporal attribute (TA), Topological reference (TR), Sampling Interval (SI), Data Source (DS), and Quantitative (Q).

Table 12 Sensor streams monitoring control systems. For the upsampled version of the dataset. Short names used in the table: Spatial attribute (SA), Temporal attribute (TA), Topological reference (TR), Sampling Interval (SI), Data Source (DS), and Quantitative (Q).

Technical Reliability

To evaluate the technical reliability of the dataset, we have provided plots, showing the daily profiles of each sensor modality. Furthermore, we provide additional evidence for each of the streams based on statistical analysis.

In¹⁵, the authors showcase the relation between the VAV damper position, CO₂, and the number of occupants, which can be seen in Fig. 2. The Pearson product-moment correlation coefficients, which measure the linear association between two variables, between CO₂ and damper position in the dataset are 0.87700, 0.89287, and 0.80668 for Rooms 1, 2, and 3, respectively. The correlations between CO₂ and the number of occupants are 0.65863, 0.83680, and 0.81663 for the same rooms. Finally, the correlation coefficients between damper position and the number of occupants are 0.70158, 0.77950, and 0.69057, using the data from the day with the DayId of 9. These numbers highlight the expected relationships between these modalities¹⁶. As expected, there is a positive correlation among these data streams. This is because the damper position is regulated to keep the CO₂ concentration below a particular threshold. Likewise, the CO₂ concentration is mostly influenced by the number of occupants in a particular space. However, we do not expect the coefficients to indicate a perfect linear relationship. We have also compared the total number of people entering and exiting the three rooms. The results show that, according to the sensors, 0.17% more people enter Room 1 than leave it. In Room 2, 1.12% more people leave than enter. In Room 3, 0.29% more people leave than enter. For all rooms, we have used the total number of people entering and exiting over the monitored period for the comparison. These numbers indicate that the observed sensing error is very low.
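
The two checks used above can be reproduced along the following lines. This is a minimal sketch with made-up numbers: the Pearson coefficient follows its standard definition, while the choice of denominator for the entering/leaving comparison (the smaller group, so that "a is p% more than b" is computed relative to b) is an assumption, since the text does not state it.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # A constant series has zero variance; the division below then fails.
    return cov / math.sqrt(var_x * var_y)


def percent_more(a: float, b: float) -> float:
    """By how many percent `a` exceeds `b`, relative to `b` (assumed convention)."""
    return 100.0 * (a - b) / b


# Made-up minute-wise values, not taken from the dataset.
co2 = [410.0, 450.0, 520.0, 610.0, 580.0]
damper = [10.0, 20.0, 45.0, 80.0, 70.0]
print(round(pearson(co2, damper), 3))
print(round(percent_more(1017, 1000), 2))
```

Both inputs to `pearson` must be sampled on the same time grid, which is one reason the upsampled one-minute version of the data is convenient for this kind of analysis.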

Reliability tests of the CO₂ and temperature streams were performed in¹⁷, where it was found that the CO₂ sensors were not calibrated; the sensors were therefore replaced and calibrated. Figs. 3–5 show the profiles of the CO₂ concentrations, the damper position, and the occupancy estimation for the rooms, highlighting the expected patterns during daytime versus night. The lowest CO₂ readings for the three rooms are 406.72, 405.76, and 408.0 ppm for Rooms 1, 2, and 3, respectively. This is close to the ambient concentration in Denmark, which is around 400 ppm.

The daily illuminance level for the three rooms is shown in Fig. 6. Rooms 1 and 2 are both located on the west side of the building, which can be inferred from the daily profiles shown in the figure, where the illuminance peaks later in the day when there is direct sunlight on the windows. Correspondingly, Room 3 has eastern exposure and therefore peaks in the morning. Furthermore, all three rooms have a lowest reading of 0 lux, which is the expected lowest value since the sensor cannot detect values below 10 lux, as specified in the technical product sheet. The daily humidity and temperature profiles can be seen in Figs. 7 and 8. Here the impact of the afternoon sun can be observed for Room 1 and Room 2, but not for Room 3.

Code availability

The pre-processing code for suppression and data processing is available online (https://github.com/sdu-cfei/Building-Data-Occupant-Modeling).

Change history

28 February 2020
A Correction to this paper has been published: https://doi.org/10.1038/s41597-020-0416-8

References

1. Angermann, M., Khider, M. & Robertson, P. Towards operational systems for continuous navigation of rescue teams. In 2008 IEEE/ION Position, Location and Navigation Symposium 153–158, https://doi.org/10.1109/PLANS.2008.4570052 (2008).
2. Erickson, V. L., Carreira-Perpiñán, M. Á. & Cerpa, A. E. OBSERVE: Occupancy-based system for efficient reduction of HVAC energy. In Proceedings of the 10th ACM/IEEE International Conference on Information Processing in Sensor Networks 258–269 (2011).
3. Sangogboye, F. C. et al. The Impact of Occupancy Resolution on the Accuracy of Building Energy Performance Simulation. In Proceedings of the 5th Conference on Systems for Built Environments 103–106, https://doi.org/10.1145/3276774.3276784 (ACM, 2018).
4. Jradi, M. et al. A World Class Energy Efficient University Building by Danish 2020 Standards. Energy Procedia 132, 21–26, https://doi.org/10.1016/j.egypro.2017.09.625 (2017).
5. Barker, S. et al. Smart *: An Open Data Set and Tools for Enabling Research in Sustainable Homes. In Proceedings of the 2012 Workshop on Data Mining Applications in Sustainability (ACM, 2012).
6. Arendt, K. et al. Room-level Occupant Counts, Airflow and CO₂ Data from an Office Building. In Proceedings of the First Workshop on Data Acquisition To Analysis 13–14, https://doi.org/10.1145/3277868.3277875 (ACM, 2018).
7. Langevin, J., Gurian, P. L. & Wen, J. One year occupant behavior/environment data for medium U.S. office. OpenEi, https://openei.org/datasets/dataset/one-year-behavior-environment-data-for-medium-office (2015).
8. Dong, B. Long-term occupancy data for residential and commercial building. OpenEi, https://openei.org/datasets/dataset/long-term-occupancy-data-for-residential-and-commercial-building (2015).
9. Sangogboye, F. C. et al. Performance comparison of occupancy count estimation and prediction with common versus dedicated sensors for building model predictive control. Build. Simul. 10, 829–843, https://doi.org/10.1007/s12273-017-0397-5 (2017).
10. Sangoboye, F. C. & Kjærgaard, M. B. PLCount: A Probabilistic Fusion Algorithm for Accurately Estimating Occupancy from 3D Camera Counts. In Proceedings of the 3rd ACM International Conference on Systems for Energy-Efficient Built Environments 147–156, https://doi.org/10.1145/2993422.2993575 (ACM, 2016).
11. Schwee, J. H., Sangogboye, F. C. & Kjærgaard, M. B. Evaluating Practical Privacy Attacks for Building Data Anonymized by Standard Methods. (2019).
12. Schwee, J. H. et al. Room-level occupant counts and environmental quality from heterogeneous sensing modalities in a smart building. figshare, https://doi.org/10.6084/m9.figshare.c.4505813 (2019).
13. Balaji, B. et al. Brick: Towards a Unified Metadata Schema For Buildings. In Proceedings of the 3rd ACM International Conference on Systems for Energy-Efficient Built Environments 41–50, https://doi.org/10.1145/2993422.2993577 (ACM, 2016).
14. Mahdavi, A. & Taheri, M. An ontology for building monitoring. J. Build. Perform. Simul 10, 499–508, https://doi.org/10.1080/19401493.2016.1243730 (2017).
15. Schwee, J. H., Sangogboye, F. C. & Kjærgaard, M. B. Anonymizing Building Data for Data Analytics in Cross-organizational Settings. In Proceedings of the International Conference on Internet of Things Design and Implementation 1–12, https://doi.org/10.1145/3302505.3310064 (ACM, 2019).
16. Jiang, C., Masood, M. K., Soh, Y. C. & Li, H. Indoor occupancy estimation from carbon dioxide concentration. Energy and Buildings 131, 132–141 (2016).
17. Mattera, C. G., Lazarova-Molnar, S., Shaker, H. R. & Jørgensen, B. N. A Practical Approach to Validation of Buildings’ Sensor Data: A Commissioning Experience Report. In 2017 IEEE Third International Conference on Big Data Computing Service and Applications (BigDataService) 287–292, https://doi.org/10.1109/BigDataService.2017.48 (2017).

Acknowledgements

This work was supported by Innovation Fund Denmark through the project COORDICY (4106-00003B), by the SDU ODEx initiative through HBODEx, and by EUDP (64018-0558) through IEA EBC Annex 79.

Author information

Authors and Affiliations

University of Southern Denmark, Campusvej 55, 5230, Odense, Denmark
Jens Hjort Schwee, Aslak Johansen, Bo Nørregaard Jørgensen, Mikkel Baun Kjærgaard, Claudio Giovanni Mattera, Fisayo Caleb Sangogboye & Christian Veje

Contributions

J.H.S. contributed to the software for processing data, wrote the first draft of the paper, and revised successive versions of the data descriptor. A.J. contributed to the living lab experimental design and to the software for processing data, and revised successive versions of the data descriptor. B.N.J. contributed to the living lab experimental design and revised successive versions of the data descriptor. M.B.K. contributed to the living lab experimental design and to the software for processing data, wrote the first draft of the paper, and revised successive versions of the data descriptor. C.G.M. contributed to the living lab experimental design and to the software for processing data, and revised successive versions of the data descriptor. F.C.S. contributed to the living lab experimental design and to the software for processing data, and revised successive versions of the data descriptor. C.T.V. contributed to the living lab experimental design and revised successive versions of the data descriptor.

Corresponding author

Correspondence to Jens Hjort Schwee.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article.

About this article

Cite this article

Schwee, J.H., Johansen, A., Jørgensen, B.N. et al. Room-level occupant counts and environmental quality from heterogeneous sensing modalities in a smart building. Sci Data 6, 287 (2019). https://doi.org/10.1038/s41597-019-0274-4

Received: 15 May 2019
Accepted: 11 October 2019
Published: 26 November 2019
DOI: https://doi.org/10.1038/s41597-019-0274-4
