DOI: 10.1145/3338840.3355641

Outlier detection using isolation forest and local outlier factor

Published: 24 September 2019

ABSTRACT

Outlier detection, also known as anomaly detection, is an active research topic in data mining. Isolation Forest (iForest) and Local Outlier Factor (LOF) are two widely used outlier detection algorithms. However, iForest is sensitive only to global outliers and handles local outliers poorly, while LOF detects local outliers well but has high time complexity. To overcome these weaknesses, a two-layer progressive ensemble method for outlier detection is proposed that accurately detects outliers in complex datasets at low time complexity. The method first uses the low-complexity iForest to scan the dataset quickly, prune the clearly normal data, and generate an outlier candidate set. To further improve pruning accuracy, an outlier coefficient is introduced to set the pruning threshold according to the outlier degree of the data. LOF is then applied to the candidate set to identify the outliers more precisely. The ensemble method thus combines the strengths of both algorithms and concentrates computing resources on the critical stage. Extensive experiments show that, compared with existing methods, the ensemble method significantly improves the outlier detection rate while greatly reducing the time complexity.
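The two-layer pipeline described in the abstract can be sketched with off-the-shelf components. The sketch below is an illustration, not the authors' implementation: it uses scikit-learn's `IsolationForest` and `LocalOutlierFactor` on synthetic data, and a fixed candidate fraction stands in for the paper's outlier-coefficient pruning threshold, whose exact rule is not given in the abstract.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# Synthetic data: one dense cluster plus a few scattered points.
X = np.vstack([rng.normal(0, 1, size=(300, 2)),
               rng.uniform(-6, 6, size=(15, 2))])

# Stage 1: iForest scores every point cheaply; prune the clearly
# normal data and keep only the most anomalous fraction as the
# outlier candidate set. The fixed fraction below is a stand-in
# for the paper's outlier-coefficient threshold.
iforest = IsolationForest(n_estimators=100, random_state=0).fit(X)
scores = -iforest.score_samples(X)        # higher = more anomalous
candidate_frac = 0.2
n_cand = int(len(X) * candidate_frac)
cand_idx = np.argsort(scores)[-n_cand:]

# Stage 2: run the more expensive LOF only on the candidate set,
# concentrating computation where the likely outliers are.
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.25)
labels = lof.fit_predict(X[cand_idx])     # -1 marks an outlier
outlier_idx = cand_idx[labels == -1]
print(len(outlier_idx), "outliers found among", n_cand, "candidates")
```

Because LOF only sees the candidate set, its cost scales with the candidate size rather than the full dataset, which is the source of the speedup the abstract claims; the trade-off is that LOF's density estimates are computed among candidates only.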


Published in

RACS '19: Proceedings of the Conference on Research in Adaptive and Convergent Systems
September 2019, 323 pages
ISBN: 9781450368438
DOI: 10.1145/3338840
Conference Chair: Chih-Cheng Hung
General Chair: Qianbin Chen
Program Chairs: Xianzhong Xie, Christian Esposito, Jun Huang, Juw Won Park, Qinghua Zhang

Copyright © 2019 ACM


Publisher

Association for Computing Machinery, New York, NY, United States

      Qualifiers

      • research-article

      Acceptance Rates

RACS '19 paper acceptance rate: 56 of 188 submissions (30%). Overall acceptance rate: 393 of 1,581 submissions (25%).
