Abstract
Detecting outliers or anomalies is a common data analysis task. As a sub-field of unsupervised machine learning, a large variety of approaches exist, but the vast majority treats the input features as independent and often fails to recognize even simple (linear) relationships in the input feature space. Hence, we introduce RECol, a generic data pre-processing approach that generates additional columns (features) in a leave-one-out fashion: for each column, we try to predict its values based on the other columns, generating reconstruction error columns. We run experiments across a large variety of common baseline approaches and benchmark datasets, with and without our RECol pre-processing method. From more than 88k experiments, we conclude that the generated reconstruction error feature space generally supports common outlier detection methods and often considerably improves their ROC-AUC and PR-AUC values. Further, we provide parameter recommendations, such as starting with a simple squared-error-based random forest regression to generate RECols for new practical use cases.
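The leave-one-out idea described above can be sketched in a few lines. The sketch below uses ordinary least squares as the per-column regressor purely to stay dependency-free; the paper recommends a squared-error random forest regression, which can be swapped in for the least-squares step. All function and variable names here are illustrative, not from the paper's codebase.

```python
import numpy as np

def recol(X):
    """Generate reconstruction error columns (RECols) for a numeric
    feature matrix X of shape (n_samples, n_features).

    For each column j, a regressor is fit on all other columns to
    predict column j; the squared prediction error per row becomes a
    new feature. This is a minimal sketch using least squares with an
    intercept as the predictor (an assumption for brevity); the paper
    suggests random forest regression instead.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    recols = np.empty((n, d))
    for j in range(d):
        others = np.delete(X, j, axis=1)          # leave column j out
        A = np.column_stack([others, np.ones(n)])  # add intercept term
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        pred = A @ coef
        recols[:, j] = (X[:, j] - pred) ** 2       # squared reconstruction error
    return recols
```

The resulting `recols` matrix can then be fed (alone or concatenated with `X`) into any standard outlier detector; rows whose columns are poorly explained by the other features receive large reconstruction errors.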
This paper represents the authors’ personal opinions and does not necessarily reflect the views of the Deutsche Bundesbank, the Eurosystem or their staff.
Notes
1. This can only harm us, because we might miss many possible experiments that could outperform the baseline.
2. These choices can only harm us in that we restrict ourselves to fewer options compared to the baseline results.
3. Results and code available at https://github.com/DayanandVH/RECol.
Acknowledgements
This work was supported by the BMWK project EuroDaT (Grant 68GX21010K) and XAINES (Grant 01IW20005).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Herurkar, D., Meier, M., Hees, J. (2023). RECol: Reconstruction Error Columns for Outlier Detection. In: Seipel, D., Steen, A. (eds) KI 2023: Advances in Artificial Intelligence. KI 2023. Lecture Notes in Computer Science(), vol 14236. Springer, Cham. https://doi.org/10.1007/978-3-031-42608-7_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42607-0
Online ISBN: 978-3-031-42608-7
eBook Packages: Computer Science, Computer Science (R0)