Multilabel graph-based classification for missing labels

International Journal on Digital Libraries

Abstract

Assigning several labels to digital data is becoming easier because the work can be done collaboratively by Internet users. However, the process remains challenging, especially when several labels are assigned to each datum, because some suitable labels may be missed. These missing labels reduce classification accuracy. In this study, we propose a novel graph-based multi-label classifier that remains stable and achieves high accuracy even when the training data contain missing labels. The core process of our algorithm smooths the label values of each training datum using its top-k most similar data: it propagates their label values and averages them to generate values for the missing labels in the training data. In experimental evaluations, we compared eight classifiers on multi-labeled document and image datasets using micro-averaged F-scores. As we incrementally removed correct labels from the two datasets, the proposed algorithm largely maintained its F-scores, whereas the scores of the other classifiers decreased. We also evaluated the algorithm on Wikipedia, a real dataset that contains missing labels, to determine how well it predicts correct labels and how useful its predictions are as initial suggestions for manual annotation. We confirmed that the proposed algorithm, LPAC, is useful not only for automatic annotation but also for supporting initial decisions in manual category assignment.
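The smoothing step described in the abstract can be illustrated with a small sketch. It is not the authors' LPAC implementation. Several choices are assumptions made here for illustration: cosine similarity as the similarity measure, a plain (unweighted) average over the top-k neighbors, a fixed 0.5 threshold for turning averaged scores back into labels, keeping observed positive labels fixed, and the hypothetical helper names `top_k_neighbors`, `smooth_labels` and `micro_f1`. The micro-averaged F-score helper follows the standard definition, pooling true positives (TP), false positives (FP) and false negatives (FN) over all labels: F = 2·TP / (2·TP + FP + FN).

```python
import numpy as np


def top_k_neighbors(X, k):
    """Indices of the k most cosine-similar rows for each row (self excluded).

    Cosine similarity is an assumption of this sketch, not taken from the paper.
    """
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)        # avoid division by zero
    S = Xn @ Xn.T                            # pairwise cosine similarities
    np.fill_diagonal(S, -np.inf)             # an item is never its own neighbor
    return np.argsort(-S, axis=1)[:, :k]     # most similar first


def smooth_labels(X, Y, k=3, threshold=0.5, n_iter=1):
    """Hypothetical label smoothing: fill likely-missing labels from neighbors.

    Y is a binary matrix (n_items, n_labels) where 0 means "absent or missing".
    Each item's label scores become the mean label vector of its top-k
    neighbors; observed positive labels are always kept at 1.
    Returns (binary labels after thresholding, continuous scores).
    """
    nbrs = top_k_neighbors(X, k)
    Y_obs = Y.astype(float)
    Y_cur = Y_obs.copy()
    for _ in range(n_iter):
        avg = Y_cur[nbrs].mean(axis=1)       # (n_items, k, n_labels) -> mean over k
        Y_cur = np.maximum(Y_obs, avg)       # never drop an observed label
    return (Y_cur >= threshold).astype(int), Y_cur


def micro_f1(Y_true, Y_pred):
    """Micro-averaged F-score: TP, FP and FN pooled over all items and labels."""
    tp = np.sum((Y_true == 1) & (Y_pred == 1))
    fp = np.sum((Y_true == 0) & (Y_pred == 1))
    fn = np.sum((Y_true == 1) & (Y_pred == 0))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy data: two clusters of three items; label 0 belongs to the first
    # cluster, label 1 to the second. Item 2 is missing its label 0.
    X = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.05],
                  [0.0, 1.0], [0.1, 0.9], [0.05, 1.0]])
    Y_true = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]])
    Y_obs = Y_true.copy()
    Y_obs[2, 0] = 0                          # simulate a missing label

    filled, scores = smooth_labels(X, Y_obs, k=2)
    print(filled.tolist())                   # item 2 regains label 0
    print(micro_f1(Y_true, Y_obs), micro_f1(Y_true, filled))
```

In this toy run, the averaged scores of item 2's two nearest neighbors restore its missing label, so the micro-averaged F-score of the training labels rises from 10/11 to 1. Keeping observed positives fixed reflects the missing-labels setting, where an absent label is uncertain but a present one is assumed correct.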

Figs. 1–10

Acknowledgements

This work was supported in part by MEXT Grant-in-Aid (#19K20631).

Author information

Correspondence to Yasunobu Sumikawa.

About this article

Cite this article

Sumikawa, Y., Miyazaki, T. Multilabel graph-based classification for missing labels. Int J Digit Libr 22, 85–104 (2021). https://doi.org/10.1007/s00799-020-00295-3

  • DOI: https://doi.org/10.1007/s00799-020-00295-3
