RPSC: Robust Pseudo-Labeling for Semantic Clustering

Authors

  • Sihang Liu South China University of Technology
  • Wenming Cao Chongqing Jiaotong University
  • Ruigang Fu National University of Defense Technology
  • Kaixiang Yang South China University of Technology
  • Zhiwen Yu South China University of Technology; Peng Cheng Laboratory

DOI:

https://doi.org/10.1609/aaai.v38i12.29309

Keywords:

ML: Clustering, CV: Learning & Optimization for CV, CV: Representation Learning for Vision, ML: Applications, ML: Deep Learning Algorithms, ML: Optimization, ML: Unsupervised & Self-Supervised Learning

Abstract

Clustering methods improve performance by jointly learning representations and cluster assignments. However, they do not consider the confidence of pseudo-labels, which are not optimal as supervisory information, and this results in error accumulation. To address this issue, we propose a Robust Pseudo-labeling for Semantic Clustering (RPSC) approach, which includes two stages. In the first stage (RPSC-Self), we design a semantic pseudo-labeling scheme that uses the consistency of samples, i.e., samples with the same semantics should be close to each other in the embedding space. To exploit robust semantic pseudo-labels for self-supervised learning, we propose a soft contrastive loss (SCL), which encourages the model to trust high-confidence semantic pseudo-labels and to be less driven by low-confidence ones. In the second stage (RPSC-Semi), we first determine the semantic pseudo-label of a sample based on the distance between the sample and the cluster centers, and then select reliable semantic pseudo-labels by exploiting this consistency. These reliable pseudo-labels are used as supervisory information in a pseudo-semi-supervised learning algorithm to further improve performance. Experimental results show that RPSC significantly outperforms 18 competitive clustering algorithms on six challenging image benchmarks. In particular, RPSC achieves an accuracy of 0.688 on ImageNet-Dogs, an improvement of up to 24% over the second-best method. We also conduct ablation studies to investigate the effects of different augmentation strategies on RPSC, as well as the contributions of the individual terms in SCL to clustering performance. In addition, experimental results indicate that SCL can be easily integrated into existing clustering methods and improves their performance.
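The second-stage idea described in the abstract (label each sample by its nearest cluster center, then keep only labels that are consistent with nearby samples) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how consistency is measured, so the k-nearest-neighbour agreement test, the function names `nearest_center` and `reliable_pseudo_labels`, and the parameters `k` and `min_agreement` are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def nearest_center(x: Point, centers: Sequence[Point]) -> int:
    """Return the index of the cluster center closest to x (Euclidean)."""
    dists = [math.dist(x, c) for c in centers]
    return dists.index(min(dists))


def reliable_pseudo_labels(
    embeddings: Sequence[Point],
    centers: Sequence[Point],
    k: int = 2,                  # hypothetical: number of neighbours checked
    min_agreement: float = 1.0,  # hypothetical: required fraction of agreeing neighbours
) -> Tuple[List[int], List[bool]]:
    """Assign nearest-center pseudo-labels and flag the consistent ones.

    ASSUMPTION: a pseudo-label counts as reliable when at least
    `min_agreement` of the sample's k nearest neighbours (in embedding
    space) carry the same pseudo-label. The paper may use a different rule.
    """
    labels = [nearest_center(x, centers) for x in embeddings]
    reliable = []
    for i, x in enumerate(embeddings):
        # Rank all other samples by distance to sample i.
        others = [j for j in range(len(embeddings)) if j != i]
        others.sort(key=lambda j: math.dist(x, embeddings[j]))
        neighbours = others[:k]
        # Fraction of neighbours whose pseudo-label matches sample i's.
        agree = sum(labels[j] == labels[i] for j in neighbours) / len(neighbours)
        reliable.append(agree >= min_agreement)
    return labels, reliable
```

Calling `reliable_pseudo_labels(embeddings, centers)` returns one pseudo-label and one reliability flag per sample. In a pseudo-semi-supervised setup, only the flagged samples would be treated as labeled, and samples near a cluster boundary, whose neighbours disagree, would stay unlabeled.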

Published

2024-03-24

How to Cite

Liu, S., Cao, W., Fu, R., Yang, K., & Yu, Z. (2024). RPSC: Robust Pseudo-Labeling for Semantic Clustering. Proceedings of the AAAI Conference on Artificial Intelligence, 38(12), 14008-14016. https://doi.org/10.1609/aaai.v38i12.29309

Issue

Section

AAAI Technical Track on Machine Learning III