Cognitive validation map for early occupancy detection in environmental sensing
Introduction
An extensive body of literature addresses occupancy detection based on intelligent control systems Candanedo and Feldheim (2016b), Chen et al. (2016), Cocana-Fernandez et al. (2016), Ferreira et al. (2017), Hailemariam et al. (2011), Lam et al. (2009). Different types of heating, ventilation, and air conditioning (HVAC) and lighting systems specifically address the problem of energy consumption and security protection Candanedo and Feldheim (2016b), Derksen et al. (2015), Erickson et al. (2010), Roetzel and Tsangrassoulis (2012), Yang et al. (2012). How efficiently this problem is solved depends greatly on the method of occupancy detection.
These methods may be classified as direct or indirect Candanedo and Feldheim (2016b), Huang et al. (2009), Jin (2016). Direct methods prioritize detection accuracy over occupants’ privacy, employing vision- and tag-based systems. Indirect methods, in contrast, rely on less intrusive sensors: passive infrared (PIR) sensors, pressure sensors, and environmental sensors that measure carbon monoxide (CO), volatile organic compounds, small particulates, particle pollution, light, temperature, humidity, and so on. Thus, environmental sensing introduces a privacy-performance trade-off Candanedo and Feldheim (2016b), Huang et al. (2009), Jin (2016): limiting the invasion of occupants’ privacy while ensuring the accuracy of occupancy detection.
A critical element of regulating occupant comfort and energy performance in environmental sensing is a clear understanding of the complex interaction between occupants and their indoor environment. Most environmental parameters are sensitive enough to reveal occupants’ presence and subtle changes in their behavior. However, this considerable variation in sensor data makes it challenging to construct a proper measure of occupancy detection. The present study attempts to address this issue from a cognitive ecology perspective Marewski and Schooler (2011), Heft (2013). Cognitive ecology puts forward an “enactive” approach to data processing (Palacios and Bozinovic, 2003): cognition involves an active transformation of sensor data into meaningful relationships between occupants and their environment. This approach seems promising compared with a passive interpretation of internally represented data, but it requires a measure that adapts readily to evolving requirements Amato et al. (2015), Dragone et al. (2015). Consequently, the objective of the present study is to propose a measure for accurate occupancy detection that is both robust and easily interpretable in an unstable environment.
Logistic regression (LR) seems to be a suitable measure because it has lower bias, makes fewer assumptions than other linear classifiers, and offers a deeper understanding of the role of predictors Donnelly and Verkuilen (2017), Hastie et al. (2013), McCulloch et al. (2009). These advantages help to ensure strong performance in a wide range of practical applications de Menezes et al. (2017), Donnelly and Verkuilen (2017), Hastie et al. (2013), Hosmer and Lemeshow (2000), Li et al. (2012), Kulikovskikh (2017). However, they may be easily outweighed by the complete separation of classes Candanedo and Feldheim (2016b), Ding and Gentleman (2005), Firth (1992a), Firth (1992b), Fort and Lambert-Lacroix (2005), Gelman et al. (2008), Heinze and Schemper (2002), Park and Hastie (2007).
The problem of separation primarily arises in small datasets with several unbalanced and highly predictive features. Moreover, the phenomenon of separation may also occur in small to medium-sized datasets, where the likelihood converges while at least one LR parameter estimate is infinite Candanedo and Feldheim (2016b), Firth (1992a), Firth (1992b), Gelman et al. (2008), Heinze and Schemper (2002). This means that the classes can be perfectly separated by a single feature or by a non-trivial linear combination of features. Finally, the problem of separation may arise even if the underlying model parameters are low in absolute value (Heinze and Schemper, 2002).
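The divergence described above can be illustrated with a minimal sketch. The toy data, the function name `fit_logistic`, and the choice of plain gradient ascent are assumptions made for illustration only; they are not taken from the study. When one feature separates the classes perfectly, the unpenalized slope estimate keeps growing with the number of iterations instead of settling at a finite value.

```python
import numpy as np

def fit_logistic(X, y, n_iter, lr=0.5):
    """Plain gradient ascent on the unpenalized LR log-likelihood."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ theta))   # predicted P(y = 1 | x)
        theta += lr * X.T @ (y - p) / len(y)   # averaged score (gradient) step
    return theta

# Hypothetical toy data: the single feature separates the classes perfectly.
x = np.array([-2.0, -1.5, -1.0, 1.0, 1.5, 2.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
X = np.column_stack([np.ones_like(x), x])      # intercept column + feature

# The slope estimate after increasingly long optimization runs.
slopes = [fit_logistic(X, y, n)[1] for n in (100, 1000, 10000)]
print(slopes)
```

The slope never stabilizes because every step increases the likelihood; the maximum is attained only in the limit of an infinite coefficient.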
Previous research suggests a number of solutions to the problem of separation, such as adopting partial least squares Ding and Gentleman (2005), Fort and Lambert-Lacroix (2005) and iteratively reweighted least squares (Firth, 1992b). Another approach to separable data consists of penalizing the maximum likelihood Hastie et al. (2013), Fort and Lambert-Lacroix (2005). An alternative solution is to apply prior distributions to the likelihood function, as suggested in Hastie et al. (2013), Firth (1992a), Gelman et al. (2008). In particular, the Jeffreys prior distribution (Firth) Firth (1992a), Firth (1992b), Hastie et al. (2013), Heinze and Schemper (2002), Park and Hastie (2007), developed to reduce the bias of maximum likelihood estimates in generalized linear models, has been shown to provide an ideal solution to separation. However, in spite of reliable computational results, these estimates are not clearly interpretable as prior information in a regression context (Gelman et al., 2008). Donnelly and Verkuilen (2017) also highlighted the problem of proper interpretability when complete separation or a lack of separability is observed in the context of floor or ceiling effects Hastie et al. (2013), McCulloch et al. (2009). The empirical logit transformation moves proportions away from the ceiling or floor by adding half a success and half a failure. Even if empirical logit analysis helps to cope with convergence issues, it does little to address the real problem: the estimates of model parameters from logistic regression and from empirical logit analysis rest on different assumptions (Donnelly and Verkuilen, 2017). The model based on the empirical logit function should therefore be interpreted cautiously.
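The "half a success and half a failure" adjustment can be sketched as follows. The function names and the example counts are hypothetical; the formula log((s + 0.5) / (n - s + 0.5)) is the usual empirical logit and is assumed here to be the form the text refers to.

```python
import numpy as np

def raw_logit(successes, trials):
    """Ordinary logit of the observed proportion; infinite at 0 and 1."""
    s = np.asarray(successes, dtype=float)
    n = np.asarray(trials, dtype=float)
    p = s / n
    with np.errstate(divide="ignore"):
        return np.log(p) - np.log1p(-p)

def empirical_logit(successes, trials):
    """Empirical logit: add half a success and half a failure."""
    s = np.asarray(successes, dtype=float)
    n = np.asarray(trials, dtype=float)
    return np.log((s + 0.5) / (n - s + 0.5))

# Hypothetical counts: floor (0/10), middle (5/10), ceiling (10/10).
successes, trials = [0, 5, 10], [10, 10, 10]
raw = raw_logit(successes, trials)
emp = empirical_logit(successes, trials)
print(raw)
print(emp)
```

The adjusted values stay finite at the floor and ceiling, which is why the transformation eases convergence; as noted above, this does not make the resulting estimates equivalent to those of logistic regression.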
This study looks at the separability problem from a cognitive point of view. This perspective leads to an extension of LR that appears suited to handling both stability and interpretability issues. The logic behind this extension is discussed at greater length in the next section.
Section snippets
Problem statement
Let be independent and identically distributed observations with responses . The matrix can be presented either as , with vectors of predictors , or as , with vectors of features . Let denote the response vector, i.e. . Then, for any vector of regression coefficients LR models the class conditional probabilities by
Problem 1 Let denote the link
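For reference, the standard logistic regression model that this passage describes can be written as below. The symbols are assumptions, since the original notation is not reproduced in this snippet: x_i is the vector of predictors for observation i, y_i in {0, 1} is its response, and theta is the vector of regression coefficients. This is the textbook form, not necessarily the exact formulation used in the study.

```latex
P(y_i = 1 \mid \mathbf{x}_i; \boldsymbol{\theta})
  = \frac{1}{1 + \exp\left(-\boldsymbol{\theta}^{\top}\mathbf{x}_i\right)},
\qquad
P(y_i = 0 \mid \mathbf{x}_i; \boldsymbol{\theta})
  = 1 - P(y_i = 1 \mid \mathbf{x}_i; \boldsymbol{\theta}).
```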
Results
This section describes the results of computational experiments conducted to test the stability and interpretability of the proposed measure on the described datasets.
The dataset was split into training and validation subsets using 5-fold cross-validation. To increase the chance of encountering the separation problem, the experiments varied the limited number of observations. As for small to moderate sample sizes the resampling estimates are better than the
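A 5-fold split over a varying number of observations might be sketched as follows. The helper `k_fold_indices`, the random seed, and the sample sizes are hypothetical and serve only to illustrate the resampling scheme; they do not reproduce the study's experimental setup.

```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for a shuffled k-fold split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

# Hypothetical sample sizes, mimicking a deliberately limited number of observations.
for n in (50, 100, 200):
    sizes = [(len(tr), len(va)) for tr, va in k_fold_indices(n)]
    print(n, sizes)
```

Each observation appears in exactly one validation fold, so every model is validated on data it was not trained on.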
Limitations and future directions
This research has two main limitations. The proposed measure was tested (1) on a single set of sensor data and (2) only while varying the number of observations and predictors. Thus, the following directions for future research may be suggested. First, the number of datasets should be extended significantly to support the findings of the present study. Second, it seems interesting to model different evolving requirements such as varying the level of
Conclusions
The present research aimed to propose a reliable measure for accurate occupancy detection in the presence of variation in sensor data. For this purpose, a cognitive validation map based on an extension of logistic regression was proposed. The LR model was extended with guessing and forgetting factors adopted from latent trait theory. The results of computational experiments proved the validity of the proposed measure and revealed its benefits in case of fewer observations or
Acknowledgments
This work was supported by the Ministry of Education and Science of the Russian Federation, grant 074-U01. The author would like to thank the reviewers for their valuable comments and suggestions and to express gratitude to Dr. Sergej Prokhorov for a constructive discussion on further research.
References (36)
- et al., Accurate occupancy detection of an office room from light, temperature, humidity and CO2 measurements using statistical learning models, Energy Build. (2016)
- et al., A fusion framework for occupancy estimation in office buildings based on environmental sensor data, Energy Build. (2016)
- et al., Leveraging a predictive model of the workload for intelligent slot allocation schemes in energy-efficient HPC clusters, Eng. Appl. Artif. Intell. (2016)
- Data classification with binary response through the Boosting algorithm and logistic regression, Expert Syst. Appl. (2017)
- Structure and classification of unified energy agents as a base for the systematic development of future energy grids, Eng. Appl. Artif. Intell. (2015)
- et al., Empirical logit analysis is not logistic regression, J. Mem. Lang. (2017)
- A cognitive robotic ecology approach to self-configuring and evolving AAL systems, Eng. Appl. Artif. Intell. (2015)
- et al., Numerical stability improvements of state-value function approximations based on RLS learning for online HDP-DLQR control system design, Eng. Appl. Artif. Intell. (2017)
- et al., Impact of climate change on comfort and energy performance in offices, Build. Environ. (2012)
- Robotic ubiquitous cognitive ecology for smart homes, J. Intell. Robot. Syst. (2015)
- Some latent trait models and their use in inferring an examinee’s ability
- Classification using generalized partial least squares, J. Comput. Graph. Statist.
- Bias reduction, the Jeffreys prior and GLIM
- Generalized linear models and Jeffreys priors: An iterative weighted least-squares approach
- Classification using partial least squares with penalized logistic regression, Bioinformatics
- A weakly informative default prior distribution for logistic and other regression models, Ann. Appl. Stat.
Cited by (3)
Optimization-enabled deep stacked autoencoder for occupancy detection
2021, Social Network Analysis and Mining
Neurons learn slower than they think
2021, arXiv
Psychological perspectives on implicit regularization: A model of retrieval-induced forgetting (RIF)
2018, Journal of Physics: Conference Series