Multilabel graph-based classification for missing labels

Sumikawa, Yasunobu; Miyazaki, Tatsurou

doi:10.1007/s00799-020-00295-3


Published: 12 October 2020

Volume 22, pages 85–104, (2021)
International Journal on Digital Libraries


Abstract

Assigning labels to digital data is becoming easier because it can be done collaboratively by Internet users. However, the process remains challenging, especially when several labels are assigned to each datum, because some suitable labels may be missed. These missing labels make classification less accurate. In this study, we propose a novel graph-based multi-label classifier that maintains high accuracy even when the training data contain missing labels. The core of our algorithm smooths the label values of each training datum by propagating the label values of its top-k most similar data and averaging them, thereby generating values for labels that are missing from the training data. In experimental evaluations, we compared eight classifiers on multi-labeled document and image datasets using the micro-averaged F-score. As we incrementally removed correct labels from the two datasets, the proposed algorithm tended to maintain its F-scores, whereas the scores of the other classifiers decreased. In addition, we evaluated the algorithm on Wikipedia, a real-world dataset that contains missing labels, to determine how well it predicts the correct labels and how useful its predictions are as initial suggestions for manual annotation. We confirmed that the proposed algorithm, LPAC, is useful not only for automatic annotation but also for facilitating decision making in the initial manual category assignment.
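The abstract describes the core step only at a high level: each training datum's label values are smoothed by propagating and averaging the labels of its top-k most similar data, and classifiers are compared by micro-averaged F-score. The Python sketch below illustrates both ideas on toy data. It is not LPAC itself; cosine similarity, keeping observed positive labels fixed, the 0.5 threshold, and the function names `smooth_labels`, `binarize`, and `micro_f1` are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def smooth_labels(features: List[List[float]],
                  labels: List[List[int]], k: int) -> List[List[float]]:
    """Hypothetical top-k smoothing (an assumption, not the authors' LPAC code).

    For each training datum, average the label vectors of its k most similar
    other training data. Labels observed as 1 are kept; every other entry
    takes the neighbour average, so a label missing from a datum receives a
    score in [0, 1] when its neighbours carry that label.
    """
    n = len(features)
    smoothed = []
    for i in range(n):
        # Rank the other data by similarity to datum i and keep the top k.
        neighbours = sorted((j for j in range(n) if j != i),
                            key=lambda j: cosine(features[i], features[j]),
                            reverse=True)[:k]
        row = []
        for c in range(len(labels[i])):
            if labels[i][c] == 1:
                row.append(1.0)
            elif neighbours:
                row.append(sum(labels[j][c] for j in neighbours) / len(neighbours))
            else:
                row.append(0.0)
        smoothed.append(row)
    return smoothed


def binarize(scores: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Turn smoothed scores into 0/1 labels (the 0.5 threshold is an assumption)."""
    return [[1 if s >= threshold else 0 for s in row] for row in scores]


def micro_f1(true: List[List[int]], pred: List[List[int]]) -> float:
    """Micro-averaged F1 = 2TP / (2TP + FP + FN), pooled over all (datum, label) pairs."""
    tp = fp = fn = 0
    for t_row, p_row in zip(true, pred):
        for t, p in zip(t_row, p_row):
            if t and p:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data: two clusters; datum 1 is missing its second label.
features = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05],
            [0.0, 1.0], [0.1, 0.9], [0.05, 0.95]]
observed = [[1, 1, 0], [1, 0, 0], [1, 1, 0],
            [0, 0, 1], [0, 0, 1], [0, 0, 1]]
complete = [[1, 1, 0], [1, 1, 0], [1, 1, 0],
            [0, 0, 1], [0, 0, 1], [0, 0, 1]]

recovered = binarize(smooth_labels(features, observed, k=2))
print(micro_f1(complete, observed))   # 16/17, about 0.941
print(micro_f1(complete, recovered))  # 1.0
```

In the toy data, datum 1 lacks its second label, but both of its two nearest neighbours carry it, so its smoothed score becomes 1.0 and the micro-averaged F-score against the complete labels rises from 16/17 to 1.0. Because observed positives are kept fixed, this sketch can only add labels, never remove them, which suits the missing-label setting but would not correct wrongly assigned labels.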



Acknowledgements

This work was supported in part by MEXT Grant-in-Aid (#19K20631).

Author information

Authors and Affiliations

University Education Center, Tokyo Metropolitan University, Hachioji, Tokyo, Japan
Yasunobu Sumikawa
Department of Information Sciences, Tokyo University of Science, Noda, Chiba, Japan
Tatsurou Miyazaki


Corresponding author

Correspondence to Yasunobu Sumikawa.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Sumikawa, Y., Miyazaki, T. Multilabel graph-based classification for missing labels. Int J Digit Libr 22, 85–104 (2021). https://doi.org/10.1007/s00799-020-00295-3


Received: 05 March 2019
Revised: 17 August 2020
Accepted: 23 September 2020
Published: 12 October 2020
Issue Date: March 2021
DOI: https://doi.org/10.1007/s00799-020-00295-3
