Weighted t-Distributed Stochastic Neighbor Embedding for Projection-Based Clustering

Nápoles, Gonzalo; Concepción, Leonardo; Özgöde Yigin, Büşra; Saygili, Görkem; Vanhoof, Koen; Bello, Rafael

doi:10.1007/978-3-031-49552-6_12

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14335)

Included in the following conference series:

International Workshop on Artificial Intelligence and Pattern Recognition


Abstract

This paper presents a projection-based clustering method for visualizing high-dimensional data points in lower-dimensional spaces while preserving the data's structural properties. The proposed method modifies the t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm by adding a weight function that adjusts the dissimilarity between high-dimensional data points to obtain more realistic lower-dimensional representations. In our algorithm, the centroids produced by a prototype-based clustering algorithm attract the high-dimensional data points assigned to their own clusters while repelling the points assigned to other clusters. Simulations on real-world datasets show that Weighted t-SNE produces better projections than similar algorithms, without requiring a prior dimensionality reduction step.
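The abstract does not give the weight function itself, so the sketch below is only an illustration under stated assumptions, not the authors' method. It assumes k-means as the prototype-based clustering step and uses a hypothetical weighting that multiplies squared distances by `w_same` (< 1) for pairs in the same cluster and by `w_diff` (> 1) for pairs in different clusters, which loosely mimics centroids attracting their own points and repelling others. The function names and weight values are our own choices. The weighted distances are then fed into the standard t-SNE pipeline: perplexity-calibrated Gaussian affinities in the input space, a Student-t kernel in the embedding, and gradient descent on KL(P || Q).

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means, standing in for the prototype-based clustering step."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = X[labels == c]
            if len(members) > 0:
                centroids[c] = members.mean(axis=0)
    return centroids, labels


def weighted_sq_distances(X, labels, w_same=0.5, w_diff=2.0):
    """HYPOTHETICAL weighting (not the paper's formula): shrink squared distances
    between points sharing a cluster, stretch those between different clusters."""
    X = np.asarray(X, dtype=float)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    same = labels[:, None] == labels[None, :]
    return np.where(same, w_same, w_diff) * sq


def joint_probabilities(D, perplexity=10.0, tol=1e-5, max_iter=100):
    """Standard t-SNE input affinities computed from a (weighted) squared-distance matrix."""
    n = D.shape[0]
    P = np.zeros((n, n))
    target = np.log(perplexity)  # desired Shannon entropy (in nats) of each row
    for i in range(n):
        mask = np.arange(n) != i
        d = D[i, mask] - D[i, mask].min()  # shift for numerical stability
        beta, lo, hi = 1.0, 0.0, np.inf  # beta = 1 / (2 * sigma_i^2)
        for _ in range(max_iter):
            p = np.exp(-beta * d)
            s = p.sum()
            entropy = np.log(s) + beta * np.sum(d * p) / s
            if abs(entropy - target) < tol:
                break
            if entropy > target:  # distribution too flat: narrow the Gaussian
                lo = beta
                beta = beta * 2.0 if np.isinf(hi) else (beta + hi) / 2.0
            else:  # distribution too peaked: widen the Gaussian
                hi = beta
                beta = (beta + lo) / 2.0
        P[i, mask] = p / s
    # Symmetrize the conditional distributions into a joint distribution summing to 1.
    return (P + P.T) / (2.0 * n)


def tsne_embed(P, dim=2, n_iter=500, lr=100.0, seed=0):
    """Gradient descent on KL(P || Q) with a Student-t kernel, as in standard t-SNE."""
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    Y = rng.normal(scale=1e-4, size=(n, dim))
    velocity = np.zeros_like(Y)
    for it in range(n_iter):
        exaggeration = 4.0 if it < 100 else 1.0  # early exaggeration phase
        momentum = 0.5 if it < 100 else 0.8
        diff = Y[:, None, :] - Y[None, :, :]
        num = 1.0 / (1.0 + (diff ** 2).sum(axis=2))  # heavy-tailed Student-t kernel
        np.fill_diagonal(num, 0.0)
        Q = np.maximum(num / num.sum(), 1e-12)
        # dC/dy_i = 4 * sum_j (p_ij - q_ij) * (1 + ||y_i - y_j||^2)^-1 * (y_i - y_j)
        coeff = (exaggeration * P - Q) * num
        grad = 4.0 * (coeff[:, :, None] * diff).sum(axis=1)
        velocity = momentum * velocity - lr * grad
        Y += velocity
    return Y


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Toy data: three Gaussian blobs in 10 dimensions (illustrative, not the paper's datasets).
    X = np.vstack([rng.normal(loc=c, scale=1.0, size=(20, 10)) for c in (0.0, 5.0, 10.0)])
    _, labels = kmeans(X, k=3)
    D = weighted_sq_distances(X, labels, w_same=0.5, w_diff=2.0)
    Y = tsne_embed(joint_probabilities(D, perplexity=10.0))
    print(Y.shape)  # (60, 2)
```

Setting `w_same = w_diff = 1` recovers the plain t-SNE input affinities, which makes it easy to compare weighted and unweighted projections side by side. The dense O(n²) matrices restrict this sketch to small datasets.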



Author information

Authors and Affiliations

Department of Cognitive Science and Artificial Intelligence, Tilburg University, Tilburg, The Netherlands
Gonzalo Nápoles, Büşra Özgöde Yigin & Görkem Saygili
Department of Computer Science, Universidad Central de Las Villas, Santa Clara, Cuba
Leonardo Concepción & Rafael Bello
Business Intelligence Group, Hasselt University, Hasselt, Belgium
Leonardo Concepción & Koen Vanhoof


Corresponding author

Correspondence to Leonardo Concepción.

Editor information

Editors and Affiliations

Universidad de las Ciencias Informáticas, Havana, Cuba
Yanio Hernández Heredia, Vladimir Milián Núñez & José Ruiz Shulcloper


About this paper

Cite this paper

Nápoles, G., Concepción, L., Özgöde Yigin, B., Saygili, G., Vanhoof, K., Bello, R. (2024). Weighted t-Distributed Stochastic Neighbor Embedding for Projection-Based Clustering. In: Hernández Heredia, Y., Milián Núñez, V., Ruiz Shulcloper, J. (eds) Progress in Artificial Intelligence and Pattern Recognition. IWAIPR 2023. Lecture Notes in Computer Science, vol 14335. Springer, Cham. https://doi.org/10.1007/978-3-031-49552-6_12


DOI: https://doi.org/10.1007/978-3-031-49552-6_12
Published: 20 December 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-49551-9
Online ISBN: 978-3-031-49552-6
eBook Packages: Computer Science, Computer Science (R0)
