
A Transductive Forest for Anomaly Detection with Few Labels

  • Conference paper
  • In: Machine Learning and Knowledge Discovery in Databases: Research Track (ECML PKDD 2023)

Abstract

Extensive labeled training data for anomaly detection is enormously expensive to obtain and often unavailable in data-sensitive applications due to privacy constraints. We propose TransForest, a transductive forest for anomaly detection in the semi-supervised setting where only a few labels are available. Guided by this limited label information, TransForest pushes its classification boundaries toward the sensitive regions where abnormal and normal points lie, increasing its learning capacity. Empirically, TransForest is competitive with representative unsupervised and semi-supervised detectors given a small number of labeled points. TransForest also offers a feature importance ranking consistent with the rankings provided by popular supervised forests on low-dimensional data sets. Our code is available at https://github.com/jzha968/transForest.
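The setting the abstract describes, ranking anomalies when only a handful of labels are available, can be illustrated with a minimal sketch. This is not the TransForest algorithm itself: the synthetic data, the k-NN distance baseline, and the simple label-guided score blending below are all illustrative assumptions, meant only to show how a few labels can sharpen an unsupervised ranking.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 normal points around the origin, 10 anomalies near (4, 4).
normal = rng.normal(0.0, 1.0, size=(200, 2))
anomalies = rng.normal(4.0, 0.5, size=(10, 2))
X = np.vstack([normal, anomalies])
y = np.array([0] * 200 + [1] * 10)  # ground truth, used only for evaluation


def knn_score(X, k=5):
    """Unsupervised anomaly score: distance to the k-th nearest neighbor."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d.sort(axis=1)           # row-wise ascending; column 0 is the self-distance (0)
    return d[:, k]


def dist_to(points, X):
    """Distance from each row of X to the closest point in `points`."""
    return np.linalg.norm(X[:, None, :] - points[None, :, :], axis=-1).min(axis=1)


base = knn_score(X)

# Semi-supervised: reveal labels for only 3 anomalies and 10 normal points.
labeled_anom = X[200:203]
labeled_norm = X[:10]

# Blend: raise scores near labeled anomalies, lower them near labeled normals.
guided = (base
          + 1.0 / (1.0 + dist_to(labeled_anom, X))
          - 1.0 / (1.0 + dist_to(labeled_norm, X)))

# With label guidance, true anomalies should concentrate at the top of the ranking.
top = np.argsort(-guided)[:10]
print(int(y[top].sum()), "of the top-10 highest-scored points are true anomalies")
```

The blending rule here is a deliberately crude stand-in for what a label-guided model does more systematically: it spends the scarce label budget on reshaping the decision boundary only in the regions the labels actually touch.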


Notes

  1. https://github.com/Minqi824/ADBench/tree/main/datasets.


Author information

Correspondence to Ninh Pham.

Ethics declarations

Ethical Statement

Since we propose a new learning model for anomaly detection, there are no ethical issues.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Zhang, J., Pham, N., Dobbie, G. (2023). A Transductive Forest for Anomaly Detection with Few Labels. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol 14169. Springer, Cham. https://doi.org/10.1007/978-3-031-43412-9_17


  • DOI: https://doi.org/10.1007/978-3-031-43412-9_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43411-2

  • Online ISBN: 978-3-031-43412-9

  • eBook Packages: Computer Science, Computer Science (R0)
