RECol: Reconstruction Error Columns for Outlier Detection

  • Conference paper
KI 2023: Advances in Artificial Intelligence (KI 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14236)


Abstract

Detecting outliers or anomalies is a common data analysis task. As a sub-field of unsupervised machine learning, a large variety of approaches exist, but the vast majority treat the input features as independent and often fail to recognize even simple (linear) relationships in the input feature space. Hence, we introduce RECol, a generic data pre-processing approach that generates additional columns (features) in a leave-one-out fashion: for each column, we try to predict its values based on the other columns, generating reconstruction error columns. We run experiments across a large variety of common baseline approaches and benchmark datasets, with and without our RECol pre-processing method. From the more than 88k experiments, we conclude that the generated reconstruction error feature space generally seems to support common outlier detection methods and often considerably improves their ROC-AUC and PR-AUC values. Further, we provide parameter recommendations, such as starting with a simple squared-error-based random forest regression to generate RECols for new practical use-cases.
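The leave-one-out RECol generation described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation (their code is linked in the notes below): it substitutes a plain least-squares regressor for the recommended random forest purely to keep the example dependency-light, and all variable names are this example's own.

```python
import numpy as np

def recol_features(X):
    """Generate reconstruction-error columns (RECols): for each column j,
    fit a regressor on the remaining columns and record the squared
    reconstruction error of column j. (The paper recommends a random
    forest regressor; ordinary least squares is used here only to keep
    the sketch self-contained.)"""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    recols = np.empty_like(X)
    for j in range(d):
        others = np.delete(X, j, axis=1)          # leave column j out
        A = np.hstack([others, np.ones((n, 1))])  # add an intercept term
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        pred = A @ coef
        recols[:, j] = (X[:, j] - pred) ** 2      # squared-error column
    return recols

# Toy example: column 1 is (almost) linear in column 0, plus one outlier
# that a per-feature detector would miss but a RECol exposes.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2 * x + rng.normal(scale=0.05, size=100)
y[0] += 5.0                                       # inject a relational outlier
R = recol_features(np.column_stack([x, y]))
print(R.shape)           # (100, 2)
print(R[:, 1].argmax())  # 0 -> the injected outlier has the largest error
```

The resulting columns can then be fed (alone or alongside the original features) into any standard outlier detector such as LOF or Isolation Forest.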

This paper represents the authors’ personal opinions and does not necessarily reflect the views of the Deutsche Bundesbank, the Eurosystem or their staff.


Notes

  1. This can only harm us, because we might miss many possible experiments that could outperform the baseline.

  2. These choices can only harm us in that we restrict ourselves to fewer options compared to the baseline results.

  3. Results and code available at https://github.com/DayanandVH/RECol.


Acknowledgements

This work was supported by the BMWK project EuroDaT (Grant 68GX21010K) and XAINES (Grant 01IW20005).

Author information


Corresponding author

Correspondence to Dayananda Herurkar.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF, 345 KB)

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Herurkar, D., Meier, M., Hees, J. (2023). RECol: Reconstruction Error Columns for Outlier Detection. In: Seipel, D., Steen, A. (eds) KI 2023: Advances in Artificial Intelligence. KI 2023. Lecture Notes in Computer Science(), vol 14236. Springer, Cham. https://doi.org/10.1007/978-3-031-42608-7_6

  • DOI: https://doi.org/10.1007/978-3-031-42608-7_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-42607-0

  • Online ISBN: 978-3-031-42608-7

  • eBook Packages: Computer Science (R0)
