Abstract
Pointwise anomaly detection and change detection focus on individual data instances; an emerging area of research instead studies groups, or collections, of observations. In applications ranging from high-energy particle physics to health care collusion, group deviation detection techniques have led to novel research discoveries, risk mitigation, the prevention of malicious collaborative activities, and other explanatory insights. Static group anomaly detection identifies groups that are inconsistent with regular group patterns, whereas dynamic group change detection assesses significant differences in the state of a group over a period of time. Because group anomaly detection and group change detection share fundamental ideas, this survey examines group deviation detection research in both static and dynamic settings to provide a clearer and deeper understanding of the field.
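To make the static and dynamic settings concrete, the following Python sketch uses the two-sample Kolmogorov-Smirnov (KS) statistic as a group-level distance. This choice is an illustrative assumption, not a method taken from the survey. The same applies to the helper names `ks_statistic`, `static_group_scores`, and `group_changed`, the toy data, and the constant 1.358, which is the large-sample critical value for a significance level of about 0.05. In the static case, each group is scored against the pooled members of all other groups. In the dynamic case, one group's observations from two periods are compared.

```python
import math

# Illustrative sketch only: KS-based group scoring is an assumption,
# not the survey's method. Helper names are hypothetical.


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])  # next distinct value in the merged sample
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def static_group_scores(groups):
    """Static setting: score each group against the pooled members of all other groups.

    A high score means the group's distribution departs from the regular group
    pattern, even if each member on its own looks ordinary.
    """
    scores = {}
    for name, members in groups.items():
        others = [v for other, vals in groups.items() if other != name for v in vals]
        scores[name] = ks_statistic(members, others)
    return scores


def group_changed(before, after, c_alpha=1.358):
    """Dynamic setting: test whether one group's state differs between two periods.

    Compares the KS statistic with the large-sample critical value
    c_alpha * sqrt((n + m) / (n * m)); c_alpha = 1.358 is roughly alpha = 0.05.
    """
    n, m = len(before), len(after)
    d = ks_statistic(before, after)
    return d > c_alpha * math.sqrt((n + m) / (n * m)), d


if __name__ == "__main__":
    groups = {
        "g1": [0.2, 0.6, 1.0, 1.4, 1.8],
        "g2": [0.1, 0.5, 0.9, 1.5, 1.9],
        "g3": [0.3, 0.7, 1.1, 1.3, 1.7],
        "g4": [0.2, 0.8, 1.0, 1.2, 1.8],
        # Every value lies inside the range of the other groups,
        # but the group as a whole is concentrated at the high end.
        "g5": [1.6, 1.7, 1.8, 1.9, 1.9],
    }
    scores = static_group_scores(groups)
    print(max(scores, key=scores.get))  # g5

    before = [0.2, 0.4, 0.6, 0.8, 0.9, 1.0, 1.2, 1.4, 1.6, 1.8]
    after = [1.5, 1.5, 1.6, 1.6, 1.7, 1.7, 1.8, 1.8, 1.9, 1.9]
    print(group_changed(before, after)[0])  # True
```

Because the score compares whole distributions, `g5` stands out even though each of its members falls within the range observed in the other groups. The asymptotic critical value is only approximate for groups this small. A permutation test or an exact KS table would be the more careful choice in practice.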
Index Terms
- Group Deviation Detection Methods: A Survey