Skip to main content
Log in

Survey on distance metric learning and dimensionality reduction in data mining

  • Published:
Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Abstract

Distance metric learning is a fundamental problem in data mining and knowledge discovery. Many representative data mining algorithms, such as \(k\)-nearest neighbor classifier, hierarchical clustering and spectral clustering, heavily rely on the underlying distance metric for correctly measuring relations among input data. In recent years, many studies have demonstrated, either theoretically or empirically, that learning a good distance metric can greatly improve the performance of classification, clustering and retrieval tasks. In this survey, we overview existing distance metric learning approaches according to a common framework. Specifically, depending on the available supervision information during the distance metric learning process, we categorize each distance metric learning algorithm as supervised, unsupervised or semi-supervised. We compare those different types of metric learning methods, point out their strength and limitations. Finally, we summarize open challenges in distance metric learning and propose future directions for distance metric learning.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Similar content being viewed by others

Notes

  1. http://en.wikipedia.org/wiki/Metric_(mathematics).

  2. http://en.wikipedia.org/wiki/Mahalanobis_distance

  3. http://en.wikipedia.org/wiki/Covariance_matrix

  4. http://en.wikipedia.org/wiki/Eigendecomposition_of_a_matrix.

  5. In this paper two data vectors are considered to be similar if the Euclidean distance between them is small, two data tensors are considered to be similar if the Frobenius norm of their difference tensor is small.

  6. http://en.wikipedia.org/wiki/Hinge_loss

References

  • Bar-Hillel A, Hertz T, Shental N, Weinshall D (2005) Learning a mahalanobis metric from equivalence constraints. J Mach Learn Res 6(6):937–965

    MATH  MathSciNet  Google Scholar 

  • Belkin M, Niyogi P (2001) Laplacian eigenmaps and spectral techniques for embedding and clustering. Adv Neural Inf Process Syst 14:585–591

    Google Scholar 

  • Bengio Y, Paiement J-F, Vincent P, Delalleau O, Le Roux N, Ouimet M (2004) Out-of-sample extensions for LLE, isomap, MDS, eigenmaps, and spectral clustering. In: Advances in neural information processing systems, vol 16, pp 177–184

  • Bilenko M, Basu S, Mooney RJ (2004) Integrating constraints and metric learning in semi-supervised clustering. In: Proceedings of the twenty-first international conference on Machine learning. ACM, Berlin, pp 11–18

  • Censor Y (1997) Parallel optimization: theory, algorithms, and applications. Oxford University Press, New York

    MATH  Google Scholar 

  • Cox TF, Cox MAA (2000) Multidimensional scaling, 2nd edn. Chapman and Hall/CRC, Boca Raton

    Google Scholar 

  • Crammer K, Singer Y (2001) On the algorithmic implementation of multiclass kernel-based vector machines. J Mach Learn Res 2:265–292

    Google Scholar 

  • Dasgupta S, Langford J (2009) A tutorial on active learning. In: International conference on machine learning

  • Davidson I, Wagstaff KL, Basu S (2006) Measuring constraint-set utility for partitional clustering algorithms. In Proceedings of the 10th European conference on principles and practice of knowledge discovery in databases, pp 115–126

  • Davis JV, Kulis B, Jain P, Sra Suvrit, Dhillon IS (2007) Information-theoretic metric learning. In: International conference on machine learning (ICML), pp 209–216

  • Domeniconi C, Gunopulos D, Ma S, Yan B, Al-Razgan M, Papadopoulos D (2007) Locally adaptive metrics for clustering high dimensional data. Data Min Knowl Discov 14(1):63–97

    Article  MathSciNet  Google Scholar 

  • Duda RO, Hart PE, Stork DG (2001) Pattern classification, vol 2, 2nd edn. Wiley, New York

    MATH  Google Scholar 

  • Elkan C (2011) Bilinear models of affinity. Personal note

  • Fukunaga K (1990) Introduction to statistical pattern recognition, second edition (computer science and scientific computing series), 2nd edn. Academic Press, Boston

    Google Scholar 

  • Goldberger J, Roweis S, Hinton G, Salakhutdinov R (2004) Neighborhood component analysis. In: Advances in neural information processing systems (NIPS)

  • Guo Y, Li S, Yang J, Shu T, Wu L (2003) A generalized foleysammon transform based on generalized fisher discriminant criterion and its application to face recognition. Pattern Recognit Lett 24(1–3):147–158

    Article  MATH  Google Scholar 

  • He J, Li M, Zhang HJ, Tong H, Zhang C (2006) Generalized manifold-ranking-based image retrieval. IEEE Trans Image Process 15(10):3170–3177

    Article  Google Scholar 

  • He X, Niyogi P (2004) Locality preserving projections. In: Advances in neural information processing systems (NIPS), vol 16, pp 234–241

  • Hinton GE, Roweis ST (2002) Stochastic neighbor embedding. In: Advances in neural information processing systems (NIPS), pp 833–840

  • Hoi Steven CH, Liu W, Chang S-F (2008) Semi-supervised distance metric learning for collaborative image retrieval. In: Proceedings of IEEE Computer Society conference on computer vision and pattern recognition

  • Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall Inc., Upper Saddle River

    MATH  Google Scholar 

  • Jia Y, Nie F, Zhang C (2009) Trace ratio problem revisited. IEEE Trans Neural Netw 20(4):729–735

    Article  Google Scholar 

  • Jolliffe IT (2002) Principal component analysis, 2nd edn. Springer, New York

    MATH  Google Scholar 

  • Jordan MI, Ghahramani Z, Jaakkola TS, Saul LK (1999) An introduction to variational methods for graphical models. Mach Learn 37(2):183–233

    Article  MATH  Google Scholar 

  • Kocsor A, Kovács K, Szepesvári C (2004) Margin maximizing discriminant analysis. In: Proceedings of European conference on machine learning, vol 3201 of Lecture notes in computer science. Springer, Berlin, pp 227–238

  • Kulis B (2010) Metric learning. In: Tutorial at International conference on machine learning

  • Kulis Brian (2012) Metric learning: a survey. Found Trends Mach Learn 5(4):287–364

    Article  MATH  MathSciNet  Google Scholar 

  • Li Z, Cao L, Chang S, Smith JR, Huang TS (2012) Beyond mahalanobis distance: Learning second-order discriminant function for people verification. In: Prcoeedings of computer vision and pattern recognition workshops (CVPRW), 2012 IEEE computer society conference on workshops, pp 45–50

  • Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416

    Article  MathSciNet  Google Scholar 

  • Mika S, Ratsch G, Weston J, Schölkopf B, Müllers KR (1999) Fisher discriminant analysis with kernels. In: Neural networks for signal processing IX, 1999. proceedings of the 1999 IEEE signal processing society workshop, pp 41–48

  • Modi JJ (1989) Parallel algorithms and matrix computation. Oxford University Press, Inc

  • Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22:1345–1359

    Article  Google Scholar 

  • Pele O, Werman M (2010) The quadratic-chi histogram distance family. In: Computer vision ECCV 2010, volume 6312 of lecture notes in computer science, chapt 54. Springer, Berlin, pp 749–762

  • Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326

    Article  Google Scholar 

  • Schölkopf B, Smola AJ (2002) Learning with kernels : support vector machines, regularization, optimization, and beyond. The MIT Press, Cambridge

    Google Scholar 

  • Schultz M, Joachims T (2004) Learning a distance metric from relative comparisons. In: Advances in neural information processing systems (NIPS), vol 16, pp 41–48

  • Shalev-Shwartz S (2007, July) Online learning: theory, algorithms, and applications. The Hebrew University of Jerusalem. Ph.D. Thesis

  • Shalev-Shwartz S, Singer Y, Ng AY (2004) Online and batch learning of pseudo-metrics. In: Proceedings of international conference on machine learning, pp 94–101

  • Shental N, Hertz T, Weinshall D, Pavel M (2002) Adjustment learning and relevant component analysis. In: Proceedings of European conference on computer vision, pp 776–790

  • Singh A, Nowak RD, Zhu X (2008) Unlabeled data: now it helps, now it doesn’t. In: Advances in neural information processing systems, pp 1513–1520

  • Sun J, Sow D, Hu J, Ebadollahi S (2010) Localized supervised metric learning on temporal physiological data. In: International conference on pattern recognition (ICPR)

  • Tenenbaum JB, Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319–2323

    Article  Google Scholar 

  • Tsang IW, Cheung PM, Kwok JT (2005) Kernel relevant component analysis for distance metric learning. In: In IEEE International joint conference on neural networks (IJCNN), pp 954–959

  • Vapnik VN (1995) The nature of statistical learning theory. Springer, New York

    Book  MATH  Google Scholar 

  • Wang F, Chen S, Zhang C, Li T (2008) Semi-supervised metric learning by maximizing constraint margin. In: Proceedings of the 17th ACM conference on information and knowledge management, pp 1457–1458

  • Wang F, Sun J, Ebadollahi S (2011) Integrating distance metrics learned from multiple experts and its application in patient similarity assessment. In: SIAM data mining conference (SDM), pp 59–70

  • Wang F, Sun J, Hu J, Ebadollahi S (2011) Imet: interactive metric learning in healthcare applications. In: SIAM data mining conference (SDM), pp 944–955

  • Wang F, Zhang C (2007) Feature extraction by maximizing the average neighborhood margin. In: IEEE Computer Society conference on computer vision and pattern recognition (CVPR)

  • Wang F, Zhao B, Zhang C (2011) Unsupervised large margin discriminative projection. IEEE Trans Neural Netw 22(9):1446–1456

    Article  Google Scholar 

  • Weinberger KQ, Blitzer J, Saul LK (2005) Distance metric learning for large margin nearest neighbor classification. In: Advances in neural information processing systems

  • Weinberger KQ, Saul LK (2009) Distance metric learning for large margin nearest neighbor classification. J Mach Learn Res 10:207–244

    MATH  Google Scholar 

  • Werman M, Pele O, Kulis B (2010) Distance functions and metric learning. In: Tutorial at European conference on computer vision

  • Xing EP, Ng AY, Jordan MI, Russell S (2002) Distance metric learning, with application to clustering with side-information. In: Advances in neural information processing systems (NIPS), vol 15, pp 505–512

  • Yang L, Jin R (2006) Distance metric learning: a comprehensive survey. Technical report, Department of Computer Science and Engineering, Michigan State University

  • Yang L, Jin R, Sukthankar R (2007) Bayesian active distance metric learning. In: Proceedings of uncertainties in artificial intelligence, AUAI Press, Corvallis, pp 442–449

  • Yang X, Fu H, Zha H, Barlow J (2006) Semi-supervised nonlinear dimensionality reduction. In: 23rd International conference on machine learning, pp 1065–1072

  • Zhang Y, Yeung D-Y (2010) Transfer metric learning by learning task relationships. In: Proceedings of the 18th ACM SIGKDD conference on knowledge discovery and data mining, pp 1199–1208

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Fei Wang.

Additional information

Responsible editor: Ian Davidson.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Wang, F., Sun, J. Survey on distance metric learning and dimensionality reduction in data mining. Data Min Knowl Disc 29, 534–564 (2015). https://doi.org/10.1007/s10618-014-0356-z

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10618-014-0356-z

Keywords

Navigation