Abstract
Clustering in high-dimensional spaces is a difficult task: under the curse of dimensionality, the usual distance metrics may no longer be appropriate. The choice of metric is therefore crucial, and it depends heavily on the characteristics of the dataset. Nevertheless, a single metric could be used to correctly cluster multiple datasets from different domains. We propose to do exactly this, providing a framework for learning a transferable metric. We show that a metric can be learned on a labelled dataset and then applied to cluster a different dataset, using an embedding space that characterises a desired clustering in a generic sense. We learn and test such metrics on several datasets of varying complexity (synthetic, MNIST, SVHN, Omniglot) and achieve results competitive with the state of the art while using only a small number of labelled training datasets and shallow networks.
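The pipeline described above (fit an embedding on a labelled source dataset, then cluster an unlabelled target dataset in that embedding space) can be illustrated with a minimal NumPy sketch. This is not the paper's method: the learned shallow network is replaced here by a simple per-dimension discriminative scaling, and the clustering step is a plain 2-means with a farthest-point initialisation; all names and data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Source (labelled) dataset: two Gaussian classes in 2-D ---
Xa = rng.normal([0.0, 0.0], 0.3, size=(50, 2))
Xb = rng.normal([3.0, 3.0], 0.3, size=(50, 2))
X_src = np.vstack([Xa, Xb])
y_src = np.array([0] * 50 + [1] * 50)

# "Learn" a metric on the labelled data: a diagonal scaling that
# stretches each dimension in proportion to its between-class
# separation (a crude stand-in for the embedding network).
mu0 = X_src[y_src == 0].mean(axis=0)
mu1 = X_src[y_src == 1].mean(axis=0)
w = np.abs(mu1 - mu0) / X_src.std(axis=0)

def embed(X):
    """Apply the learned (diagonal) metric as an embedding."""
    return X * w

# --- Target dataset from a different "domain": shifted clusters ---
Xc = rng.normal([10.0, -1.0], 0.3, size=(40, 2))
Xd = rng.normal([13.0, 2.0], 0.3, size=(40, 2))
X_tgt = np.vstack([Xc, Xd])

# Cluster the target set in the embedding space with 2-means
# (Lloyd's algorithm, farthest-point initialisation).
Z = embed(X_tgt)
c0 = Z[0]
c1 = Z[np.argmax(np.linalg.norm(Z - c0, axis=1))]
centroids = np.stack([c0, c1])
for _ in range(20):
    d = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    centroids = np.array([Z[labels == k].mean(axis=0) for k in range(2)])

# Each true group of 40 target points should receive a single label.
print(np.bincount(labels))
```

The transfer happens in `embed`: it was fitted only on the labelled source data, yet the target clusters are recovered because the embedding preserves the kind of separation the source labels encode.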
Notes
- 1.
As a reminder, let \(T\) and \(U\) be two topological spaces. A function \(f: T \to U\) is continuous, in the open-set definition, if for every \(t\in T\) and every open set \(u\) containing \(f(t)\), there exists a neighbourhood \(v\) of \(t\) such that \(f(v)\subset u\).
Acknowledgements
We gratefully acknowledge Orianne Debeaupuis for making the figure. We also acknowledge computing support from NVIDIA. This work was supported by funds from the French Program “Investissements d’Avenir”.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Alami Chehboune, M., Kaddah, R., Read, J. (2023). Transferable Deep Metric Learning for Clustering. In: Crémilleux, B., Hess, S., Nijssen, S. (eds) Advances in Intelligent Data Analysis XXI. IDA 2023. Lecture Notes in Computer Science, vol 13876. Springer, Cham. https://doi.org/10.1007/978-3-031-30047-9_2
Print ISBN: 978-3-031-30046-2
Online ISBN: 978-3-031-30047-9