research-article

Outlier detection using isolation forest and local outlier factor

Authors:
Zhangyu Cheng, Wuhan University of Technology, Wuhan, China
Chengming Zou, Wuhan University of Technology, Wuhan, China
Jianwei Dong, People's Hospital of Ningxia Hui Autonomous Region, Ningxia, China

RACS '19: Proceedings of the Conference on Research in Adaptive and Convergent Systems, September 2019, Pages 161–168. https://doi.org/10.1145/3338840.3355641

Published: 24 September 2019

ABSTRACT

Outlier detection, also known as anomaly detection, is an active research topic in data mining. Isolation Forest (iForest) and Local Outlier Factor (LOF) are two well-known and widely used outlier detection algorithms. However, iForest is sensitive only to global outliers and is weak at detecting local outliers, while LOF performs well on local outliers but has high time complexity. To overcome these weaknesses, a two-layer progressive ensemble method for outlier detection is proposed that accurately detects outliers in complex datasets with low time complexity. The method first uses the low-complexity iForest to quickly scan the dataset, prune the apparently normal data, and generate an outlier candidate set. To further improve pruning accuracy, an outlier coefficient is introduced to design a method for setting the pruning threshold based on the outlier degree of the data. LOF is then applied to the outlier candidate set to identify outliers more accurately. The proposed ensemble method takes advantage of both algorithms and concentrates computing resources on the key stage. Finally, extensive experiments are carried out to verify the ensemble method. The results show that, compared with existing methods, the ensemble method significantly improves the outlier detection rate and greatly reduces the time complexity.
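The two-layer idea in the abstract (prune with iForest, then rank the remaining candidates with LOF) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the outlier-coefficient formula, so a fixed, hypothetical `keep_fraction` stands in for the paper's pruning threshold. LOF is computed within the candidate set only, and uses exactly k neighbours rather than the tie-inclusive k-distance neighbourhood. All function names are illustrative.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c_factor(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random dimension at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # cannot split on this dimension
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))

def path_length(tree, x, depth=0):
    """Depth at which x lands, adjusted for the size of the leaf."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, dim, split, left, right = tree
    return path_length(left if x[dim] < split else right, x, depth + 1)

def iforest_scores(data, n_trees=100, sample_size=256, seed=0):
    """Layer 1: anomaly score in (0, 1]; higher means easier to isolate."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    scores = []
    for x in data:
        mean_len = sum(path_length(t, x) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_len / c_factor(psi)))
    return scores

def lof_scores(points, k):
    """Layer 2: simplified LOF using exactly k nearest neighbours."""
    n = len(points)
    neigh = []  # neigh[i]: sorted (distance, index) pairs of the k nearest
    for i in range(n):
        d = sorted((math.dist(points[i], points[j]), j)
                   for j in range(n) if j != i)
        neigh.append(d[:k])
    kdist = [nb[-1][0] for nb in neigh]
    lrd = []  # local reachability density
    for i in range(n):
        reach = sum(max(kdist[j], d) for d, j in neigh[i])
        lrd.append(len(neigh[i]) / max(reach, 1e-12))
    return [sum(lrd[j] for _, j in neigh[i]) / (len(neigh[i]) * lrd[i])
            for i in range(n)]

def two_layer_detect(data, keep_fraction=0.3, k=5, seed=0):
    """Prune with iForest, then rank the surviving candidates by LOF.

    keep_fraction is a hypothetical stand-in for the paper's
    outlier-coefficient threshold, which the abstract does not specify.
    Returns (index into data, LOF score) pairs, highest LOF first.
    """
    scores = iforest_scores(data, seed=seed)
    n_keep = max(k + 1, math.ceil(keep_fraction * len(data)))
    order = sorted(range(len(data)), key=lambda i: scores[i], reverse=True)
    candidates = order[:n_keep]
    lof = lof_scores([data[i] for i in candidates], k)
    return sorted(zip(candidates, lof), key=lambda t: t[1], reverse=True)

if __name__ == "__main__":
    # An 8x8 grid of normal points plus one planted far-away point.
    grid = [(0.5 * i, 0.5 * j) for i in range(8) for j in range(8)]
    data = grid + [(10.0, 10.0)]
    for idx, score in two_layer_detect(data)[:3]:
        print(idx, round(score, 2))
```

The pairwise-distance LOF step is quadratic in the number of points it receives, which is why running it only on the pruned candidate set, rather than on the full dataset, is where the time saving comes from in this sketch.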

Index Terms

Outlier detection using isolation forest and local outlier factor
1. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation

Published in
RACS '19: Proceedings of the Conference on Research in Adaptive and Convergent Systems
September 2019
323 pages
ISBN: 9781450368438
DOI: 10.1145/3338840
Conference Chair: Chih-Cheng Hung, Kennesaw State University
General Chair: Qianbin Chen, Chongqing University of Posts and Telecommunications, China
Program Chairs:
Xianzhong Xie, Chongqing University of Posts and Telecommunications, China
Christian Esposito, University of Salerno, Italy
Jun Huang, Chongqing University of Posts and Telecommunications, China
Juw Won Park, University of Louisville
Qinghua Zhang, Chongqing University of Posts and Telecommunications, China
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
ensemble method
isolation forest
local outlier factor
outlier detection (OD)
Acceptance Rates
RACS '19 Paper Acceptance Rate: 56 of 188 submissions, 30%
Overall Acceptance Rate: 393 of 1,581 submissions, 25%