Multilabel classification using crowdsourcing under budget constraints

  • Regular Paper
Knowledge and Information Systems

Abstract

Multilabel classification has been applied successfully in many fields over the past few decades, but it still has significant limitations. A critical concern is the scarcity of labelled instances, and labelling data costs both time and money. Crowdsourcing eases the problem of label availability, yet it has drawbacks of its own, notably label quality and budget limitations. This paper introduces a multilabel reverse auction framework to address the shortage of crowd workers. Each crowd worker must provide a cost and a confidence for each task in a specific domain. Furthermore, two methods for systematic budget selection are presented to address insufficient domain coverage within the budget limitation: Greedy bid selection and Multi cover bid selection. Both approaches choose the least expensive crowd workers while considering worker expertise and domain coverage. Crowd versions of binary relevance and multilabel k-nearest neighbours are also introduced to support label aggregation and to reduce the impact of low-quality workers while taking the domain into account. An experimental study on seven multilabel datasets with diverse crowds shows the effectiveness of the approach. It delivers an improvement of more than 16% over a baseline of random selection with majority voting. The proposed method is also compared against five benchmark algorithms and gives promising results when few data and workers are available.
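The abstract does not spell out the selection procedure, so the following Python sketch is only one plausible reading of a greedy, budget-limited bid selection, not the authors' algorithm. The `Bid` record, the function name `greedy_bid_selection`, and the scoring rule (confidence per unit cost, doubled for a domain that is not yet covered) are all assumptions made for illustration. Costs are assumed to be strictly positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    """One worker's offer (hypothetical structure, not the paper's notation)."""
    worker: str
    domain: str
    cost: float        # price asked for labelling in this domain, assumed > 0
    confidence: float  # self-reported confidence in [0, 1]


def greedy_bid_selection(bids, budget):
    """Repeatedly accept the affordable bid with the best gain per unit cost.

    The gain is the bid's confidence, doubled (an arbitrary illustrative
    bonus) when the bid covers a domain that has no accepted bid yet.
    Returns the accepted bids in order of selection and the total spent.
    """
    chosen, covered, spent = [], set(), 0.0
    remaining = list(bids)
    while True:
        affordable = [b for b in remaining if spent + b.cost <= budget]
        if not affordable:
            return chosen, spent

        def ratio(b):
            bonus = 2.0 if b.domain not in covered else 1.0
            return bonus * b.confidence / b.cost

        best = max(affordable, key=ratio)
        chosen.append(best)
        covered.add(best.domain)
        spent += best.cost
        remaining.remove(best)


# Toy bids, invented for this example.
bids = [
    Bid("w1", "image", 4.0, 0.9),
    Bid("w2", "image", 2.0, 0.6),
    Bid("w3", "text", 3.0, 0.8),
    Bid("w4", "audio", 5.0, 0.7),
]
chosen, spent = greedy_bid_selection(bids, budget=10.0)
print([b.worker for b in chosen], spent)
```

With these toy bids the cheaper image worker is accepted first, and the coverage bonus then steers the remaining budget towards the text and audio domains rather than a second image worker. A real implementation would need the paper's actual scoring and coverage definitions.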

Data availability

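All data generated or analyzed during this study are included in Madjarov G, Kocev D, Gjorgjevikj D, Džeroski S (2012) An extensive experimental comparison of methods for multilabel learning. Pattern Recognit 45:3084–3104. https://doi.org/10.1016/j.patcog.2012.03.004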

Acknowledgements

Not Applicable.

Funding

The authors received no funding for this research.

Author information

Contributions

Both authors contributed equally.

Corresponding author

Correspondence to Himanshu Suyal.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Ethical approval

Not Applicable.

Consent to participate

Not Applicable.

Consent for publication

This submission has not been previously published, and all co-authors agree to its publication.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Suyal, H., Singh, A. Multilabel classification using crowdsourcing under budget constraints. Knowl Inf Syst 66, 841–877 (2024). https://doi.org/10.1007/s10115-023-01973-9

  • DOI: https://doi.org/10.1007/s10115-023-01973-9
