Abstract
The research areas of occupant sensing and occupant behavior modeling lack comprehensive public datasets that can provide baseline results and foster data-driven approaches. This data descriptor covers a dataset of room-level occupant counts collected via sensors, together with related data on indoor environmental quality. The dataset comprises 44 full days, collected in the period March 2018 to April 2019 in a public building in Northern Europe. Sensor readings cover three rooms: one lecture room and two study zones. The data release contains two versions of the dataset: one with the raw readings and one upsampled to a one-minute resolution. The dataset can be used for developing and evaluating data-driven applications, occupant sensing, and building analytics. It can also give researchers and designers an impetus to conduct experiments and pilot studies, and can serve as a benchmark.
Measurement(s) | carbon dioxide • humidity • visible light energy • occupant count • temperature of air • indoor airflow |
Technology Type(s) | sensor • digital camera • damper position |
Sample Characteristic - Organism | Homo sapiens |
Sample Characteristic - Environment | office building |
Sample Characteristic - Location | Kingdom of Denmark |
Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.9971549
Background & Summary
Accurate estimates of occupant counts in a building can be used in various application areas, including smart spaces, safety and evacuation, facility management, and building operations1,2. In building operations, occupant counts enable applications such as adaptive ventilation in rooms, occupant-based energy benchmarking, and model-predictive control of room setpoints. In these applications, the more accurately the number of occupants can be sensed, the more energy-efficiently a building can be operated3. Studies of adaptive ventilation and model-predictive control additionally require that occupant presence can be linked to indoor environmental quality and ventilation rates.
Such research needs datasets that capture conditions in non-laboratory settings, because systems and algorithms must handle omissions and faults in the sensor data they process in order to be applicable beyond the lab. Therefore, the released dataset was collected in a living lab building: a standard building in normal use, but where additional effort has been made to enable data collection. However, because standard components are used, the uncertainties around the systems and their calibration are higher than in a laboratory setting.
The building considered for this data release is a teaching and office building located at the University of Southern Denmark, Odense campus. Denmark has a temperate climate representative of the northern part of Europe. The building was designed to serve as a living lab for data-driven research on building operation and optimization4. Its role as a living lab has been communicated at the university, and a screen at one of the entrances shows examples of the data collected. However, as the data collection is mostly invisible to occupants, it does not affect their behavior in any noticeable manner. The building covers approximately 8500 m2 across three floors and a basement. It has 1000 occupants on a typical weekday and is used by both students and staff. Data is collected at the spatial granularity of rooms. The data were collected in three rooms: two study zones and one lecture room. The rooms were selected to cover different usage patterns and differences in sunlight, facing either south-east or north-west. The study zones are used for mixed student activities, such as project work and solving exercises. The lecture room is mostly used for scheduled activities, typically spanning two to four hours.
The rooms contain a unique collection of sensor modalities covering both occupant presence and indoor environmental quality factors: CO2 concentration, relative humidity, illuminance, occupant counts, counts of occupants entering and leaving the rooms, temperature, and in-room airflow. The airflow is estimated from the damper position, which is correlated with the airflow; the supplied air is outdoor air heated using heat recovery. The placement of the sensors follows standard building-industry practice in Denmark.
Most existing sensor-based datasets for buildings consider residential homes (e.g., Barker et al.5); this dataset, in contrast, considers a commercial building. Previous datasets for commercial buildings include fewer sensor modalities and have a lower temporal resolution than the presented dataset. They include a dataset from the University of Southern Denmark with only three modalities and a short temporal range, described in6; a dataset from Lawrence Berkeley National Lab with lower granularity and fewer modalities but more background variables7; and a dataset from the University of Texas at San Antonio with only one sensor modality8. The released dataset is therefore unique in the number of sensor modalities available.
The sensor modalities in the dataset enable researchers to both study new technical solutions (e.g., CO2-based occupant estimation algorithms9, adaptive ventilation, or model-predictive control) and establish knowledge on occupants and indoor environmental quality (e.g., quantify the correlation between occupants and air quality). The dataset can also be used to learn modeling parameters for occupants to more accurately parameterize building performance simulations3.
The counts of occupants entering and leaving the rooms were collected using six state-of-the-art Xovis PC2 3D stereo vision cameras mounted over the room entrances. To estimate the number of occupants in the rooms, we applied the PLCount algorithm10 to the raw camera readings. The sensor and method were evaluated in the building against a manually obtained ground truth based on video recordings; the study documented in10 reported a Root Mean Square Error (RMSE) of 0.075. The other building data are collected via standard-grade sensors connected to a building management system (BMS), which in this building is a Schneider Electric BMS. The data for the release were collected through the BMS application programming interfaces. The CO2 sensor data have not been cleaned, so users of the data should address known issues with this stream, including offsets and drifts9. Table 1 gives an overview of the different sensor streams, including units and uncertainties. See Table 2 for details of the sampling strategies of the various sensors and rooms. The physical placement of the sensors is given in Table 3.
Methods
Selection methodology
The published dataset was collected in the period March 1st, 2018, to April 30th, 2019. Ideally, we would publish only periods of continuous readings, but since the source is a real building and the data were collected through BMS APIs, we had to adjust expectations slightly. We only considered full days of data. Days on which the sampled CO2 stream had more than three missing readings in a row were excluded, which allows gaps of up to 15 minutes. Threshold-based sensors, which only record a sample when the change exceeds the threshold, were not used for eliminating days, since it is impossible to determine how many readings such streams should contain. Additionally, we chose not to include two consecutive days, as this would make the released data susceptible to privacy attacks. This decision is based on the results of a study11 showing that CO2 streams can be used to deanonymize the data. Such an attack could identify the weekday, which in turn could reveal the identity of a room through a data linkage attack that combines the lecture room's scheduled activities with the released streams, as showcased in11. To eliminate consecutive days, we used the following procedure: for sequences with an even number of days, days were removed at random so as to comply with the rule; for sequences with an odd number of days, days were removed so as to maximize the number of days retained. This left 44 full days of data covering the three test rooms and all sensor modalities. These make up the days of the released dataset.
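The selection rules above can be sketched as follows. This is a minimal illustration, not the authors' released code. It assumes a nominal 5-minute CO2 sampling period, which is consistent with three missing readings corresponding to 15 minutes. The function names `has_long_gap` and `drop_consecutive` are hypothetical. For even-length runs, the sketch simplifies "random removal" to picking one of the two alternating patterns at random.

```python
import random
from datetime import date, datetime, time, timedelta


def has_long_gap(timestamps, day: date, period=timedelta(minutes=5), max_missing=3):
    """Return True if the day's CO2 stream misses more than `max_missing` samples in a row.

    `timestamps` are the sample times recorded on `day`; `period` is the assumed
    nominal sampling interval. The day boundaries are included so that samples
    missing right after midnight or right before the next midnight also count.
    """
    start = datetime.combine(day, time.min)
    points = [start - period] + sorted(timestamps) + [start + timedelta(days=1)]
    for earlier, later in zip(points, points[1:]):
        missing = round((later - earlier) / period) - 1
        if missing > max_missing:
            return True
    return False


def drop_consecutive(days, rng: random.Random):
    """Keep a largest subset of `days` in which no two calendar days are adjacent."""
    days = sorted(days)
    runs = []
    for d in days:
        if runs and d - runs[-1][-1] == timedelta(days=1):
            runs[-1].append(d)  # extends the current run of consecutive days
        else:
            runs.append([d])  # starts a new run
    kept = []
    for run in runs:
        # Odd run: every other day from the first keeps (n + 1) / 2 days, the maximum.
        # Even run: both alternating patterns keep n / 2 days, so one is chosen at random.
        offset = rng.randrange(2) if len(run) % 2 == 0 else 0
        kept.extend(run[offset::2])
    return kept
```

A day would be retained if `has_long_gap` returns False for its CO2 stream. The surviving dates are then passed to `drop_consecutive`, for example with `random.Random(seed)` for a reproducible draw.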
Data processing
In the released dataset, we provide two forms of the data: the original raw form and a pre-processed form with a stable sample rate that makes the data easier to use. The pre-processing applies a forward fill followed by a backward fill to the original streams. A forward fill fills the missing samples at the desired sampling frequency with the last reported value in the stream, until a new reported value is reached. A backward fill does the same but backward in time, filling samples with the next reported value. The fills use a one-minute sampling rate for all streams; other sampling rates can be computed from the original dataset. This sample rate was selected to accommodate the identified use cases, e.g., occupancy estimation and model-predictive control9. The original dataset has only been changed for the event- and threshold-based sensors, by adding a reading at 00:00:00 that carries the value of the last reading of the previous day.
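The minute-wise forward and backward fill can be sketched in plain Python as below. This is an illustration under assumptions, not the released pre-processing script. Readings are assumed to be (timestamp, value) pairs for a single day, and `fill_to_minutes` and `add_midnight_reading` are hypothetical names. In pandas, a comparable result can be obtained as follows:

1. Reindex a series onto the union of its original timestamps and a one-minute grid.
2. Chain `ffill()` and `bfill()`.
3. Keep only the grid points.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def fill_to_minutes(readings, day_start: datetime):
    """Upsample one day of irregular (timestamp, value) readings to 1440 one-minute samples.

    Forward fill: each minute takes the last reading at or before it.
    Backward fill: minutes before the first reading take the first reading's value.
    """
    if not readings:
        raise ValueError("cannot fill a day without readings")
    readings = sorted(readings)
    times = [t for t, _ in readings]
    filled = []
    for minute in range(24 * 60):
        t = day_start + timedelta(minutes=minute)
        i = bisect_right(times, t)  # number of readings at or before t
        value = readings[i - 1][1] if i > 0 else readings[0][1]
        filled.append((t, value))
    return filled


def add_midnight_reading(previous_day, today, day_start: datetime):
    """Prepend a 00:00:00 reading carrying the previous day's last value.

    Mirrors the change described for event- and threshold-based streams.
    """
    today = sorted(today)
    if not previous_day:
        return today
    last_value = max(previous_day)[1]  # tuple ordering: the latest timestamp wins
    return [(day_start, last_value)] + today
```

In this sketch, running `add_midnight_reading` before `fill_to_minutes` changes how a threshold-based stream with no change overnight starts its day. It starts with the carried-over value rather than being back-filled from its first reading of the day.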
Data suppression
The most sensitive part of the data release is the identity of the rooms. Combined with the occupant counts and knowledge of room activities, the room identity can be used to calculate a teacher performance index, as demonstrated in11. We have therefore anonymized the room identities and replaced the dates with a DayId, which is a random number assigned in non-chronological order. To limit the effect on the usefulness of the dataset, we have introduced year, month, and workday indicators. The time of day is untouched.
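A minimal sketch of the date suppression follows. It assumes that DayIds are drawn as a random permutation of 1..n and that the workday indicator marks Monday to Friday. Public holidays are ignored here, which may differ from the released indicator. `suppress_dates` is a hypothetical helper.

```python
import random
from datetime import date


def suppress_dates(days: list[date], rng: random.Random) -> dict[date, dict]:
    """Map each date to a random DayId plus the retained year, month, and workday fields."""
    days = sorted(days)
    day_ids = rng.sample(range(1, len(days) + 1), len(days))  # random, non-chronological
    return {
        day: {
            "DayId": day_id,
            "year": day.year,
            "month": day.month,
            "workday": day.weekday() < 5,  # Monday=0 ... Friday=4
        }
        for day, day_id in zip(days, day_ids)
    }
```

The date keys would be dropped before release, leaving only the mapped fields alongside the untouched time of day.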
Data Records
The released dataset, hosted on figshare12, contains the sensor modalities described above. The number of readings for each room is given in Table 4. The upsampled dataset contains 67680 readings per stream. The data coverage for the sensors using the dynamic sampling rate (DSR) and static sampling rate (SSR) strategies is given in Table 5. The illuminance data stream has relatively low data coverage. We have included it because it still captures the tendency of the light level in the rooms through the day, although the coverage can affect the usefulness of the stream. Summary statistics for the upsampled dataset for Rooms 1, 2, and 3 can be found in Tables 6–8. The data for each combination of sensor modality and room are stored in separate comma-separated value (CSV) files. In addition to the sensor values, these files have columns for the metadata defined in the Data suppression section, namely timestamps, year, month and workday indicators, and dayId; these are also described in the readme file. The two versions of the data, original and upsampled, can be found in the folders original and filleddata, respectively. The metadata also contain room type, size, seating capacity, and volume, which can be found in Table 9 and in the roominfo file. Furthermore, we provide a Brick representation of the sensor instrumentation13 in the brick_graph file, generated using the brick_generator script. The Brick model captures the physical relations between the sensor streams, the building, and the rooms, modeled as Resource Description Framework (RDF) triples between the elements in the model. The models of all rooms are identical; an example is shown in Fig. 1. Finally, we have included a categorization of the sensors following the ontology of Mahdavi and Taheri14. It can be found in the occupant-behavior-ontology file and in Tables 10, 11, and 12 for indoor conditions, inhabitants, and control systems, respectively.
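One way to read a single room-and-modality CSV file and group it by day is sketched below. The column names `timestamp`, `value`, and `dayId` are assumptions for illustration; check the readme file for the actual headers before use. `load_by_day` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict


def load_by_day(fileobj, time_column="timestamp", value_column="value", day_column="dayId"):
    """Group rows of one sensor CSV into {dayId: [(time, value), ...]} in file order."""
    days = defaultdict(list)
    for row in csv.DictReader(fileobj):
        days[int(row[day_column])].append((row[time_column], float(row[value_column])))
    return dict(days)


# Small in-memory example using the assumed headers.
sample = io.StringIO(
    "timestamp,value,dayId\n"
    "08:00:00,412.5,9\n"
    "08:01:00,415.0,9\n"
    "08:00:00,401.0,3\n"
)
by_day = load_by_day(sample)
```

For a real file, open it with `open(path, newline="")`, as recommended for the csv module, and pass that file object instead of the in-memory sample.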
Technical Reliability
To evaluate the technical reliability of the dataset, we provide plots showing the daily profiles of each sensor modality, together with additional evidence for each stream based on statistical analysis.
In15, the authors showcase the relation between VAV damper position, CO2, and the number of occupants, which can be seen in Fig. 2. The Pearson product-moment correlation coefficient measures the linear association between two variables. Between CO2 and damper position, the coefficients in the dataset are 0.87700, 0.89287, and 0.80668 for Rooms 1, 2, and 3, respectively. The coefficients between CO2 and the number of occupants are 0.65863, 0.83680, and 0.81663 for the rooms. Finally, the coefficients between damper position and the number of occupants are 0.70158, 0.77950, and 0.69057, using the data from the day with DayId 9. These numbers highlight the expected relationships between these modalities16. As expected, there is a positive correlation among these data streams. The damper position is regulated to keep the CO2 concentration below a particular threshold, and the CO2 concentration is mostly influenced by the number of occupants in a particular space. However, we do not expect the coefficients to indicate a perfect relationship. We have also compared the total number of people entering and exiting the three rooms. According to the sensors, 0.17% more people enter Room 1 than leave it, 1.12% more people leave Room 2 than enter it, and 0.29% more people leave Room 3 than enter it. For all rooms, the comparison uses the total number of people entering and exiting over the monitored period. These numbers indicate that the observed sensing error is very low.
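The two statistics used above can be sketched as follows. The Pearson coefficient assumes that the two streams are time-aligned, as in the upsampled one-minute data; on Python 3.10+, `statistics.correlation` computes the same coefficient. The imbalance function assumes the percentage is expressed relative to the total number of exits, which the text does not specify. Both helper names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equally long, time-aligned series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def entry_exit_imbalance(entered, exited):
    """Percent by which total entries exceed total exits (negative: more people left)."""
    total_in, total_out = sum(entered), sum(exited)
    return 100.0 * (total_in - total_out) / total_out
```

A constant stream has zero variance and no defined correlation; `pearson` raises `ZeroDivisionError` in that case rather than returning a misleading value.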
In17, reliability tests were performed on the CO2 and temperature streams. They found that the CO2 sensors were not calibrated, so the sensors were replaced and calibrated. Figs. 3–5 show the profiles of the CO2 concentration, the damper position, and the occupancy estimates for the rooms, highlighting the expected daytime versus nighttime patterns. The lowest CO2 readings are 406.72, 405.76, and 408.0 ppm for Rooms 1, 2, and 3, respectively. This is close to the ambient outdoor concentration in Denmark, which is around 400 ppm.
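Because the CO2 data have not been cleaned, a user might start with a simple baseline check like the one below. This is a common heuristic, not a procedure from the authors. It assumes each room is ventilated close to the ambient level, taken as 400 ppm, at some point every day, and shifts that day's readings so that the minimum equals this level. Since DayIds are non-chronological, any drift estimate built on these offsets could only be ordered by the retained year and month fields. `correct_daily_offset` is a hypothetical name.

```python
AMBIENT_PPM = 400.0  # assumed outdoor reference level, following the ~400 ppm in the text


def correct_daily_offset(readings, ambient_ppm=AMBIENT_PPM):
    """Shift one day's CO2 readings so that their minimum equals `ambient_ppm`.

    Returns the corrected readings and the estimated offset (positive: sensor reads high).
    """
    if not readings:
        raise ValueError("no readings for this day")
    offset = min(readings) - ambient_ppm
    return [r - offset for r in readings], offset
```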
The daily illuminance levels for the three rooms are shown in Fig. 6. Rooms 1 and 2 are both located on the west side of the building, which can be inferred from the daily profiles in the figure: the illuminance peaks later in the day, when there is direct sunlight on the windows. Correspondingly, Room 3, which has an eastern exposure, peaks in the morning. Furthermore, all three rooms have a lowest reading of 0 lux. This is the expected minimum, since the sensor cannot detect values below 10 lux, as specified in the technical product sheet. The humidity and temperature daily profiles are shown in Figs. 7 and 8. Here, the impact of the afternoon sun can be observed for Room 1 and Room 2 but not for Room 3.
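The orientation argument relies on the hour at which the mean daily illuminance peaks. A minimal sketch of that check follows. It assumes the illuminance samples have already been tagged with their hour of day, e.g. from the upsampled files. `peak_hour` is a hypothetical helper. Readings of 0 lux are kept as-is here, although they mean "below the 10 lux detection limit" rather than a measured zero.

```python
from collections import defaultdict


def peak_hour(samples):
    """Return the hour of day (0-23) with the highest mean illuminance.

    `samples` is an iterable of (hour, lux) pairs pooled over all days of one room.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for hour, lux in samples:
        sums[hour] += lux
        counts[hour] += 1
    if not counts:
        raise ValueError("no samples")
    return max(counts, key=lambda h: sums[h] / counts[h])
```

Based on the profiles described above, Rooms 1 and 2 would be expected to return an afternoon hour and Room 3 a morning hour.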
Code availability
The pre-processing code for suppression and data processing is available online (https://github.com/sdu-cfei/Building-Data-Occupant-Modeling).
Change history
28 February 2020
A Correction to this paper has been published: https://doi.org/10.1038/s41597-020-0416-8
References
Angermann, M., Khider, M. & Robertson, P. Towards operational systems for continuous navigation of rescue teams. In 2008 IEEE/ION Position, Location and Navigation Symposium 153–158, https://doi.org/10.1109/PLANS.2008.4570052 (2008).
Erickson, V. L., Carreira-Perpiñán, M. Á. & Cerpa, A. E. OBSERVE: Occupancy-based system for efficient reduction of HVAC energy. In Proceedings of the 10th ACM/IEEE International Conference on Information Processing in Sensor Networks 258–269 (2011).
Sangogboye, F. C. et al. The Impact of Occupancy Resolution on the Accuracy of Building Energy Performance Simulation. In Proceedings of the 5th Conference on Systems for Built Environments 103–106, https://doi.org/10.1145/3276774.3276784 (ACM, 2018).
Jradi, M. et al. A World Class Energy Efficient University Building by Danish 2020 Standards. Energy Procedia 132, 21–26, https://doi.org/10.1016/j.egypro.2017.09.625 (2017).
Barker, S. et al. Smart*: An Open Data Set and Tools for Enabling Research in Sustainable Homes. In Proceedings of the 2012 Workshop on Data Mining Applications in Sustainability (ACM, 2012).
Arendt, K. et al. Room-level Occupant Counts, Airflow and CO2 Data from an Office Building. In Proceedings of the First Workshop on Data Acquisition To Analysis 13–14, https://doi.org/10.1145/3277868.3277875 (ACM, 2018).
Langevin, J., Gurian, P. L. & Wen, J. One year occupant behavior/environment data for medium U.S. office. OpenEi, https://openei.org/datasets/dataset/one-year-behavior-environment-data-for-medium-office (2015).
Dong, B. Long-term occupancy data for residential and commercial building. OpenEi, https://openei.org/datasets/dataset/long-term-occupancy-data-for-residential-and-commercial-building (2015).
Sangogboye, F. C. et al. Performance comparison of occupancy count estimation and prediction with common versus dedicated sensors for building model predictive control. Build. Simul. 10, 829–843, https://doi.org/10.1007/s12273-017-0397-5 (2017).
Sangogboye, F. C. & Kjærgaard, M. B. PLCount: A Probabilistic Fusion Algorithm for Accurately Estimating Occupancy from 3D Camera Counts. In Proceedings of the 3rd ACM International Conference on Systems for Energy-Efficient Built Environments 147–156, https://doi.org/10.1145/2993422.2993575 (ACM, 2016).
Schwee, J. H., Sangogboye, F. C. & Kjærgaard, M. B. Evaluating Practical Privacy Attacks for Building Data Anonymized by Standard Methods. (2019).
Schwee, J. H. et al. Room-level occupant counts and environmental quality from heterogeneous sensing modalities in a smart building. figshare, https://doi.org/10.6084/m9.figshare.c.4505813 (2019).
Balaji, B. et al. Brick: Towards a Unified Metadata Schema For Buildings. In Proceedings of the 3rd ACM International Conference on Systems for Energy-Efficient Built Environments 41–50, https://doi.org/10.1145/2993422.2993577 (ACM, 2016).
Mahdavi, A. & Taheri, M. An ontology for building monitoring. J. Build. Perform. Simul. 10, 499–508, https://doi.org/10.1080/19401493.2016.1243730 (2017).
Schwee, J. H., Sangogboye, F. C. & Kjærgaard, M. B. Anonymizing Building Data for Data Analytics in Cross-organizational Settings. In Proceedings of the International Conference on Internet of Things Design and Implementation 1–12, https://doi.org/10.1145/3302505.3310064 (ACM, 2019).
Jiang, C., Masood, M. K., Soh, Y. C. & Li, H. Indoor occupancy estimation from carbon dioxide concentration. Energy Build. 131, 132–141 (2016).
Mattera, C. G., Lazarova-Molnar, S., Shaker, H. R. & Jørgensen, B. N. A Practical Approach to Validation of Buildings’ Sensor Data: A Commissioning Experience Report. In 2017 IEEE Third International Conference on Big Data Computing Service and Applications (BigDataService) 287–292, https://doi.org/10.1109/BigDataService.2017.48 (2017).
Acknowledgements
The Innovation Fund Denmark supported this work for the project COORDICY (4106-00003B), the HBODEx supported by the SDU ODEx initiative, and IEA EBC Annex 79 supported by EUDP: 64018-0558.
Author information
Contributions
J.H.S. contributed to the data-processing software, wrote the first draft of the paper, and revised successive versions of the data descriptor. A.J. contributed to the experimental design of the living lab and to the data-processing software, and revised successive versions of the data descriptor. B.N.J. contributed to the experimental design of the living lab and revised successive versions of the data descriptor. M.B.K. contributed to the experimental design of the living lab and to the data-processing software, wrote the first draft of the paper, and revised successive versions of the data descriptor. C.G.M. contributed to the experimental design of the living lab and to the data-processing software, and revised successive versions of the data descriptor. F.C.S. contributed to the experimental design of the living lab and to the data-processing software, and revised successive versions of the data descriptor. C.T.V. contributed to the experimental design of the living lab and revised successive versions of the data descriptor.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article.
About this article
Cite this article
Schwee, J.H., Johansen, A., Jørgensen, B.N. et al. Room-level occupant counts and environmental quality from heterogeneous sensing modalities in a smart building. Sci Data 6, 287 (2019). https://doi.org/10.1038/s41597-019-0274-4