Outlying property detection with numerical attributes

Angiulli, Fabrizio; Fassetti, Fabio; Manco, Giuseppe; Palopoli, Luigi

doi:10.1007/s10618-016-0458-x

Published: 29 March 2016

Volume 31, pages 134–163, (2017)

Data Mining and Knowledge Discovery

Fabrizio Angiulli¹,
Fabio Fassetti¹,
Giuseppe Manco ORCID: orcid.org/0000-0001-9672-3833² &
Luigi Palopoli¹

Abstract

The outlying property detection problem (OPDP) is the problem of discovering the properties that distinguish a given object, known in advance to be an outlier in a database, from the other database objects. This problem has recently been analyzed with a focus on categorical attributes only. However, numerical attributes are highly relevant and widely used in databases. Therefore, in this paper, we analyze the OPDP in a setting where numerical attributes are also taken into account, a relevant case left open in the literature. As major contributions, we present an efficient parameter-free algorithm to compute the measure of object exceptionality we introduce, and we propose a unified framework for mining exceptional properties in the presence of both categorical and numerical attributes.

Notes

For the sake of simplicity and without loss of generality, we are assuming that an arbitrary ordering of the attributes in A has been fixed.
We point out that \(k_\theta\) has a twofold function: it allows the analyst to control the complexity of the mined patterns, and it speeds up the algorithm's execution. However, by setting \(k_\theta\) to m, the algorithm is able to detect explanations of any length, while still pruning the search space and avoiding overfitting by means of the threshold support.
In practice, if a tuple is detected as an outlier in a given iteration, it gets a positive score. Scores are then summarized in the combine function, and tuples are sorted according to the scores.
Experiments were performed on an Intel Core i7 2.3 GHz based computer by using the Java programming language.
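
The scoring scheme mentioned in the note on outlier scores (positive per-iteration scores, a combine step, then sorting) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the experiments in the paper used Java, and the page does not say how the combine function aggregates scores, so a plain sum is assumed here. The names `combine`, `rank_tuples`, and `all_ids` are hypothetical.

```python
from collections import defaultdict


def combine(per_iteration_scores, all_ids):
    """Summarize per-iteration scores into one total per tuple.

    ASSUMPTION: aggregation is a plain sum; the source does not specify it.
    Tuples never flagged as outliers keep a total of 0.0.
    """
    totals = defaultdict(float)
    for tuple_id in all_ids:
        totals[tuple_id] = 0.0
    for scores in per_iteration_scores:
        # Each dict holds only the tuples flagged in that iteration,
        # mapped to their positive score.
        for tuple_id, score in scores.items():
            totals[tuple_id] += score
    return dict(totals)


def rank_tuples(per_iteration_scores, all_ids):
    """Sort tuple ids by combined score, highest first (ties broken by id)."""
    totals = combine(per_iteration_scores, all_ids)
    return sorted(totals, key=lambda t: (-totals[t], t))


# Hypothetical data: three iterations over four tuples.
iterations = [
    {"t1": 1.0, "t3": 0.5},
    {"t1": 0.7},
    {"t2": 0.2, "t3": 0.9},
]
ranking = rank_tuples(iterations, ["t1", "t2", "t3", "t4"])
print(ranking)
```

In this toy input, a tuple flagged in several iterations accumulates a larger total and therefore ranks ahead of tuples flagged once or never.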

Acknowledgments

This research has been partially supported by the PRIN Project 20122F87B2 “Compositional Approaches for the Characterization and Mining of Omics Data” co-financed by the Italian Ministry of Education, University and Research.

Author information

Authors and Affiliations

DIMES Department, University of Calabria, 87036, Rende, Italy
Fabrizio Angiulli, Fabio Fassetti & Luigi Palopoli
Institute of High Performance Computing and Networks (ICAR-CNR), 87036, Rende, Italy
Giuseppe Manco

Corresponding author

Correspondence to Giuseppe Manco.

Additional information

Responsible editor: Charu Aggarwal.

About this article

Cite this article

Angiulli, F., Fassetti, F., Manco, G. et al. Outlying property detection with numerical attributes. Data Min Knowl Disc 31, 134–163 (2017). https://doi.org/10.1007/s10618-016-0458-x

Received: 09 October 2015
Accepted: 04 March 2016
Published: 29 March 2016
Issue Date: January 2017
DOI: https://doi.org/10.1007/s10618-016-0458-x
