Cognitive validation map for early occupancy detection in environmental sensing

https://doi.org/10.1016/j.engappai.2017.08.008

Highlights

  • An occupancy detection measure based on a cognitive validation map adapts more easily to evolving requirements.

  • A cognitive validation map ensures better accuracy on sensor data with fewer observations or more predictors.

  • To create a cognitive validation map, logistic regression was extended with guessing and forgetting factors adopted from latent trait theory.

Abstract

Most environmental parameters are clearly indicative of occupants’ presence and of subtle changes in their behavior. However, this variation in sensor data makes it challenging to create a proper measure of occupancy detection that is both robust and clearly interpretable in an unstable environment. The present study addresses this problem from a cognitive ecology perspective by proposing a cognitive validation map. The map is based on an extension of logistic regression with two extra parameters: forgetting and guessing factors. The mutual regulation of these factors creates a unique cognitive validation map that adapts the measure to evolving requirements in environmental sensing. Computational experiments showed that the proposed measure detects occupancy better under more unstable conditions, namely on sensor data with fewer observations or more predictors. The measure based on a cognitive validation map therefore seems promising for early occupancy detection problems and may be readily extended to a broader range of practical applications.

Introduction

An extensive body of literature exists on occupancy detection based on intelligent control systems Candanedo and Feldheim (2016b), Chen et al. (2016), Cocana-Fernandez et al. (2016), Ferreira et al. (2017), Hailemariam et al. (2011), Lam et al. (2009). Heating, ventilation, and air conditioning (HVAC) and lighting systems of different types specifically address the problems of energy consumption and security protection Candanedo and Feldheim (2016b), Derksen et al. (2015), Erickson et al. (2010), Roetzel and Tsangrassoulis (2012), Yang et al. (2012). How efficiently these problems are solved depends largely on the occupancy detection method.

These methods can be classified as direct or indirect Candanedo and Feldheim (2016b), Huang et al. (2009), Jin (2016). Direct methods prioritize detection accuracy over occupants’ privacy, employing vision- and tag-based systems. Indirect methods, in contrast, rely on less intrusive sensors: passive infrared (PIR) sensors, pressure sensors, and environmental sensors that measure carbon monoxide (CO), volatile organic compounds, small particulates, particle pollution, light, temperature, humidity, and so on. Environmental sensing thus introduces a privacy-performance trade-off Candanedo and Feldheim (2016b), Huang et al. (2009), Jin (2016) between limiting intrusion on occupants’ privacy and ensuring accurate occupancy detection.

A critical element of regulating occupant comfort and energy performance in environmental sensing is a clear understanding of the complex interaction between occupants and their indoor environment. Most environmental parameters are sensitive enough to reveal occupants’ presence and subtle changes in their behavior, but this considerable variation in sensor data makes it challenging to create a proper measure of occupancy detection. The present study attempts to address this issue from a cognitive ecology perspective Marewski and Schooler (2011), Heft (2013). Cognitive ecology puts forward an “enactive” approach to data processing (Palacios and Bozinovic, 2003): cognition involves an active transformation of sensor data into meaningful relationships between occupants and their environment. This approach seems more promising than a passive interpretation of internally represented data, but it requires a measure that is highly adaptive to evolving requirements Amato et al. (2015), Dragone et al. (2015). Consequently, the objective of the present study is to propose a measure for accurate occupancy detection that is both robust and easily interpretable in an unstable environment.

Logistic regression (LR) seems to be a suitable measure because it has lower bias, makes fewer assumptions than other linear classifiers, and provides a deeper understanding of the role of predictors Donnelly and Verkuilen (2017), Hastie et al. (2013), McCulloch et al. (2009). These advantages account for its strong performance in a wide range of practical applications de Menezes et al. (2017), Donnelly and Verkuilen (2017), Hastie et al. (2013), Hosmer and Lemeshow (2000), Li et al. (2012), Kulikovskikh (2017). However, they can easily be outweighed when the classes are completely separated Candanedo and Feldheim (2016b), Ding and Gentleman (2005), Firth (1992a), Firth (1992b), Fort and Lambert-Lacroix (2005), Gelman et al. (2008), Heinze and Schemper (2002), Park and Hastie (2007).

The problem of separation primarily arises in small datasets with several unbalanced and highly predictive features. The phenomenon of separation may also occur in small to medium-sized datasets, where the maximum likelihood estimate of at least one LR parameter is infinite even though the likelihood converges Candanedo and Feldheim (2016b), Firth (1992a), Firth (1992b), Gelman et al. (2008), Heinze and Schemper (2002). This means that the classes can be perfectly separated by a single feature or by a non-trivial linear combination of features. Finally, the problem of separation may arise when the underlying model parameters are small in absolute value (Heinze and Schemper, 2002).
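To make the separation problem concrete, the following sketch (an illustration, not part of the original study) fits a one-feature logistic model without an intercept by plain gradient ascent on two hypothetical toy datasets: one in which the sign of the feature separates the classes perfectly, and one with two labels swapped near zero. Under complete separation the coefficient keeps growing with the number of iterations while the log-likelihood creeps toward zero, i.e. the maximum likelihood estimate is infinite; with overlapping classes the estimate settles at a finite value. The data, step size, and function names are assumptions chosen for illustration.

```python
import math

def softplus(t):
    """Numerically stable log(1 + e^t)."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))

def sigmoid(t):
    """Numerically stable logistic function."""
    return 1.0 / (1.0 + math.exp(-t)) if t >= 0 else math.exp(t) / (1.0 + math.exp(t))

def log_likelihood(theta, xs, ys):
    """Bernoulli log-likelihood of p_i = sigmoid(theta * x_i) (one feature, no intercept)."""
    # log p = -softplus(-z) and log(1 - p) = -softplus(z), where z = theta * x
    return sum(-softplus(-theta * x) if y == 1 else -softplus(theta * x)
               for x, y in zip(xs, ys))

def fit(xs, ys, steps, lr=0.5):
    """Plain gradient ascent on the log-likelihood, starting from theta = 0."""
    theta = 0.0
    for _ in range(steps):
        theta += lr * sum((y - sigmoid(theta * x)) * x for x, y in zip(xs, ys))
    return theta

xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
separated = [0, 0, 0, 1, 1, 1]    # sign of x predicts y perfectly
overlapping = [0, 0, 1, 0, 1, 1]  # two labels swapped near x = 0

for steps in (10, 100, 1000):
    t_sep, t_ovl = fit(xs, separated, steps), fit(xs, overlapping, steps)
    ll_sep = log_likelihood(t_sep, xs, separated)
    print(f"{steps:5d} steps: separated theta={t_sep:.3f} (loglik={ll_sep:.6f}), "
          f"overlapping theta={t_ovl:.3f}")
```

In the separated case, more iterations only push the coefficient further out; no stopping rule based on coefficient changes would report convergence.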

Previous research suggests a number of solutions to the problem of separation, such as adopting partial least squares Ding and Gentleman (2005), Fort and Lambert-Lacroix (2005) and iteratively reweighted least squares (Firth, 1992b). Another approach to separable data consists in penalizing the likelihood Hastie et al. (2013), Fort and Lambert-Lacroix (2005). An alternative solution is to apply prior distributions to the likelihood function, as suggested in Hastie et al. (2013), Firth (1992a), Gelman et al. (2008). In particular, Firth’s method based on the Jeffreys prior distribution Firth (1992a), Firth (1992b), Hastie et al. (2013), Heinze and Schemper (2002), Park and Hastie (2007), which was developed to reduce the bias of maximum likelihood estimates in generalized linear models, has been shown to provide an ideal solution to separation. However, despite reliable computational results, these estimates are not clearly interpretable as prior information in a regression context (Gelman et al., 2008). Donnelly and Verkuilen (2017) also highlighted the problem of interpretability, examining the presence or absence of complete separation in the context of floor or ceiling effects Hastie et al. (2013), McCulloch et al. (2009). The empirical logit transformation moves proportions away from the ceiling or floor by adding half a success and half a failure. Although empirical logit analysis helps to cope with convergence issues, it does little to address the real problem: the estimates of model parameters from logistic regression and from empirical logit analysis rest on different assumptions (Donnelly and Verkuilen, 2017). Models based on the empirical logit function should therefore be interpreted cautiously.
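Two of the remedies discussed above can be sketched in a few lines. The code below is an illustration, not the author’s implementation: it applies Firth’s bias reduction (the Jeffreys-prior penalty, solved through the modified score equation in the form given by Heinze and Schemper) to a one-feature, no-intercept logistic model on a hypothetical perfectly separated toy dataset, and it computes the empirical logit with half a success and half a failure added. Restricting the model to one feature (so the hat-matrix diagonal has a closed form) and the function names are assumptions made to keep the example short.

```python
import math

def sigmoid(t):
    """Numerically stable logistic function."""
    return 1.0 / (1.0 + math.exp(-t)) if t >= 0 else math.exp(t) / (1.0 + math.exp(t))

def firth_logistic_1d(xs, ys, iters=100, tol=1e-10):
    """Firth-penalized estimate of theta in p_i = sigmoid(theta * x_i).

    Solves the modified score equation
        sum_i (y_i - p_i + h_i * (1/2 - p_i)) * x_i = 0,
    where h_i is the i-th diagonal element of W^{1/2} X (X^T W X)^{-1} X^T W^{1/2}
    and W = diag(p_i * (1 - p_i)). One feature, no intercept.
    """
    theta = 0.0
    for _ in range(iters):
        ps = [sigmoid(theta * x) for x in xs]
        ws = [p * (1.0 - p) for p in ps]
        info = sum(w * x * x for w, x in zip(ws, xs))       # Fisher information X^T W X
        hs = [w * x * x / info for w, x in zip(ws, xs)]     # hat-matrix diagonal
        score = sum((y - p + h * (0.5 - p)) * x
                    for x, y, p, h in zip(xs, ys, ps, hs))  # modified score U*(theta)
        step = score / info                                 # Fisher-scoring update
        theta += step
        if abs(step) < tol:
            break
    return theta

def empirical_logit(successes, trials):
    """Empirical logit: add half a success and half a failure before taking log-odds."""
    return math.log((successes + 0.5) / (trials - successes + 0.5))

xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]   # perfectly separated: plain ML diverges here

print("Firth estimate:", firth_logistic_1d(xs, ys))
print("empirical logits:", [round(empirical_logit(s, 10), 3) for s in (0, 5, 10)])
```

On the separated toy data the Firth estimate stays finite, and the empirical logit stays finite even at 0 or 10 successes out of 10; neither property, however, resolves the interpretability concerns raised above.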

This study looks at the separability problem from a cognitive point of view, which leads to an extension of LR that appears suitable for handling both stability and interpretability issues. The logic behind this extension is discussed at greater length in the next section.

Problem statement

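Let $(\mathbf{x}_i, y_i)_{i=1}^{m}$ be independent and identically distributed observations with responses $y_i \in \{0, 1\}$. The matrix $X \in \mathbb{R}^{m \times n}$ can be presented either as $X = [\mathbf{x}_1, \dots, \mathbf{x}_m]^T$, with vectors of predictors $\mathbf{x}_i \in \mathbb{R}^n$, or as $X = [\mathbf{x}^1, \dots, \mathbf{x}^n]$, with vectors of features $\mathbf{x}^j \in \mathbb{R}^m$. Let $\mathbf{y}$ denote the response vector, i.e. $\mathbf{y} = [y_1, \dots, y_m]^T$. Then, for any vector $\theta \in \mathbb{R}^n$ of regression coefficients, LR models the class-conditional probabilities $p(\mathbf{x}_i, \theta) = P(y_i = 1 \mid \mathbf{x}_i, \theta)$ by

$$\ln \frac{p(\mathbf{x}_i, \theta)}{1 - p(\mathbf{x}_i, \theta)} = \theta^T \mathbf{x}_i.$$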
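As a minimal sketch of the LR model, the code below computes $p(\mathbf{x}_i, \theta)$ with the logistic function and checks that its log-odds equal $\theta^T \mathbf{x}_i$. The excerpt does not give the exact form of the guessing and forgetting extension, so the second function is a hypothetical parameterization borrowed from the four-parameter logistic model of latent trait theory, with a lower asymptote `guessing` and an upper asymptote `1 - forgetting`; it should not be read as the formula used in the study. The coefficient and predictor values are illustrative.

```python
import math

def sigmoid(t):
    """Numerically stable logistic function."""
    return 1.0 / (1.0 + math.exp(-t)) if t >= 0 else math.exp(t) / (1.0 + math.exp(t))

def lr_probability(theta, x):
    """p(x, theta) = P(y = 1 | x, theta) under ln(p / (1 - p)) = theta^T x."""
    return sigmoid(sum(t_j * x_j for t_j, x_j in zip(theta, x)))

def extended_probability(theta, x, guessing, forgetting):
    """HYPOTHETICAL 4PL-style link: floor `guessing`, ceiling 1 - `forgetting`.

    Not taken from the study; the excerpt does not state its exact formula.
    """
    if not 0.0 <= guessing < 1.0 - forgetting <= 1.0:
        raise ValueError("need 0 <= guessing < 1 - forgetting <= 1")
    return guessing + (1.0 - guessing - forgetting) * lr_probability(theta, x)

theta = [0.8, -1.5]   # illustrative coefficients
x = [2.0, 0.4]        # illustrative predictor vector, theta^T x = 1.0
p = lr_probability(theta, x)
print("log-odds:", math.log(p / (1.0 - p)))  # equals theta^T x up to rounding
for z in (-50.0, 0.0, 50.0):
    print(z, extended_probability([z], [1.0], guessing=0.1, forgetting=0.05))
```

With `guessing = forgetting = 0` the hypothetical link reduces exactly to the LR model; nonzero values keep predicted probabilities away from 0 and 1, which is one way extra asymptote parameters can limit the divergence seen under separation.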

Problem 1

Let g denote the link

Results

This section describes the results of computational experiments conducted to test the stability and interpretability of the proposed measure on the described datasets.

The dataset $X^m$ was divided into the training subset $X^{l_1}$ and the validation subset $X^{l_2}$ using 5-fold cross-validation. To increase the chance of encountering the separation problem, the experiments varied the number of observations while keeping it limited. For small to moderate sample sizes, the resampling estimates are better than the
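The splitting scheme can be sketched as follows, assuming the dataset is addressed by row indices. The sketch draws subsets of several deliberately small sizes from a full dataset and partitions each subset into five folds, so that each fold serves once as the validation subset $X^{l_2}$ while the remaining four form the training subset $X^{l_1}$. The full-dataset size, subset sizes, seeds, and function name are hypothetical; the excerpt does not list the sample sizes actually used.

```python
import random

def k_fold_splits(m, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; the k validation folds partition range(m)."""
    idx = list(range(m))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        val = sorted(folds[i])
        train = sorted(j for fold in folds[:i] + folds[i + 1:] for j in fold)
        yield train, val

FULL_SIZE = 1000            # hypothetical number of rows in the full dataset X^m
rng = random.Random(42)

for m in (40, 80, 160):     # hypothetical, deliberately limited sample sizes
    rows = rng.sample(range(FULL_SIZE), m)
    for fold, (train, val) in enumerate(k_fold_splits(m, k=5, seed=m)):
        X_l1 = [rows[j] for j in train]  # row indices of the training subset
        X_l2 = [rows[j] for j in val]    # row indices of the validation subset
        # fit the model on X_l1 and evaluate it on X_l2 here
        print(f"m={m}, fold {fold}: {len(X_l1)} training rows, {len(X_l2)} validation rows")
```

Smaller subsets make it more likely that some training fold is perfectly separable, which is the situation the experiments were designed to provoke.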

Limitations and future directions

There are two main limitations of this research that need to be addressed. The proposed measure was tested (1) on the same sensor data and (2) only under a varying number of observations and predictors. Thus, the following directions for future research may be suggested. First, the number of datasets should be substantially extended to support the findings of the present study. Second, it seems interesting to model different evolving requirements such as varying the level of

Conclusions

The present research aimed to propose a reliable measure for accurate occupancy detection in the presence of variation in sensor data. For this purpose, a cognitive validation map based on an extension of logistic regression was proposed. The LR model was extended with guessing and forgetting factors adopted from latent trait theory. The results of computational experiments confirmed the validity of the proposed measure and revealed its benefits in the case of fewer observations or

Acknowledgments

This work was supported by the Ministry of Education and Science of the Russian Federation, grant 074-U01. The author would like to thank the reviewers for their valuable comments and suggestions and to express gratitude to Dr. Sergej Prokhorov for a constructive discussion on further research.

References (36)

  • Birnbaum, A. Some latent trait models and their use in inferring an examinee’s ability.
  • Candanedo, L.M., Feldheim, V., 2016. Occupancy Detection. UCI Machine Learning Repository....
  • Ding, B., et al., 2005. Classification using generalized partial least squares. J. Comput. Graph. Statist.
  • Erickson, V.L., Carreira-Perpinan, M.A., Cerpa, A.E. OBSERVE: Occupancy-based system for efficient reduction of HVAC...
  • Firth, D. Bias reduction, the Jeffreys prior and GLIM.
  • Firth, D. Generalized linear models and Jeffreys priors: An iterative weighted least-squares approach.
  • Fort, G., et al., 2005. Classification using partial least squares with penalized logistic regression. Bioinformatics.
  • Gelman, A., et al., 2008. A weakly informative default prior distribution for logistic and other regression models. Ann. Appl. Stat.