Multilabel classification using crowdsourcing under budget constraints

Suyal, Himanshu; Singh, Avtar

doi:10.1007/s10115-023-01973-9

Regular Paper
Published: 09 September 2023

Volume 66, pages 841–877, (2024)

Knowledge and Information Systems


Abstract

Multilabel classification has excelled in several distinct fields over the past few decades but still has significant limitations. One critical concern is the lack, or insufficient availability, of labelled instances; data labelling also demands time and budget, which is a challenge. Crowdsourcing overcomes the problem of label availability, yet it has drawbacks of its own, such as label quality and budget limitations. This paper introduces a multilabel reverse auction framework to address the shortage of crowd workers. Each crowd worker must provide a cost and a confidence for each task in a specific domain. Furthermore, two methods for systematic budget selection are presented to address insufficient domain coverage within the budget limitation: Greedy bid selection and Multi cover bid selection. Both approaches choose the least expensive crowd workers while considering worker expertise and domain coverage. Crowd versions of binary relevance and multilabel k-nearest neighbours are also introduced to support label aggregation and to reduce the impact of low-quality workers while considering the domain. An experimental study on seven multilabel datasets with diverse crowds shows the effectiveness of the approach: it delivers an improvement of more than 16% over a baseline of random selection with majority voting. The proposed method is also compared against five benchmark algorithms and provides promising results when few data and workers are available.
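
To make the idea of budget-constrained bid selection with domain coverage concrete, the following is a minimal sketch under stated assumptions. It is not the authors' Greedy bid selection or Multi cover bid selection algorithm, whose details are not given in the abstract. The `Bid` record, the `greedy_select` function, the rule of ranking by confidence per unit cost after preferring uncovered domains, and the sample bids are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    """Hypothetical bid: one worker offers to label tasks in one domain."""
    worker: str
    domain: str
    cost: float        # asking price; assumed strictly positive
    confidence: float  # self-reported confidence; assumed to lie in [0, 1]


def greedy_select(bids, budget):
    """Greedily pick affordable bids until the budget runs out.

    Assumed ranking rule (not taken from the paper): a bid that covers a
    not-yet-covered domain beats one that does not; ties are broken by the
    highest confidence per unit cost.
    """
    chosen = []
    covered = set()
    remaining = budget
    candidates = list(bids)
    while candidates:
        affordable = [b for b in candidates if b.cost <= remaining]
        if not affordable:
            break
        best = max(
            affordable,
            key=lambda b: (b.domain not in covered, b.confidence / b.cost),
        )
        chosen.append(best)
        covered.add(best.domain)
        remaining -= best.cost
        candidates.remove(best)
    return chosen, covered, remaining


# Illustrative data only; these workers, domains and prices are made up.
bids = [
    Bid("w1", "sports", 2.0, 0.9),
    Bid("w2", "sports", 1.0, 0.6),
    Bid("w3", "music", 3.0, 0.9),
    Bid("w4", "news", 4.0, 0.8),
]
chosen, covered, remaining = greedy_select(bids, budget=6.0)
print([b.worker for b in chosen], sorted(covered), remaining)
```

With a budget of 6.0 the sketch spends everything on three bids and leaves the "news" domain uncovered, which illustrates the tension between cost, expertise and coverage that the abstract describes.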

Data availability

All data generated or analyzed during this study are included in Madjarov et al. (2012), "An extensive experimental comparison of methods for multilabel learning".

References

Howe J et al (2006) The rise of crowdsourcing. Wired magazine 14:1–4
LaToza TD, van der Hoek A (2016) Crowdsourcing in software engineering: models, motivations, and challenges. IEEE Softw 33:74–80. https://doi.org/10.1109/MS.2016.12
Lease M, Yilmaz E (2012) Crowdsourcing for information retrieval. ACM SIGIR Forum 45:66–75. https://doi.org/10.1145/2093346.2093356
Muller CL, Chapman L, Johnston S, Kidd C, Illingworth S, Foody G, Overeem A, Leigh RR (2015) Crowdsourcing for climate and atmospheric sciences: current status and future potential. Int J Climatol 35:3185–3203. https://doi.org/10.1002/joc.4210
Xu Z, Liu Y, Yen NY, Mei L, Luo X, Wei X, Hu C (2020) Crowdsourcing based description of urban emergency events using social media big data. IEEE Trans Cloud Comput 8:387–397. https://doi.org/10.1109/TCC.2016.2517638
Mohammadzadeh H, Gharehchopogh FS (2021) A multi-agent system based for solving high-dimensional optimization problems: a case study on email spam detection. Int J Commun Syst. https://doi.org/10.1002/dac.4670
Vuurens J, de Vries AP, Eickhoff C. How much spam can you take? An analysis of crowdsourcing results to increase accuracy
Zhong J, Tang K, Zhou Z-H. Active learning from crowds with unsure option
Rubin TN, Chambers A, Smyth P, Steyvers M (2012) Statistical topic models for multilabel document classification. Mach Learn 88:157–208. https://doi.org/10.1007/s10994-011-5272-5
Gharehchopogh FS, Namazi M, Ebrahimi L, Abdollahzadeh B (2023) Advances in sparrow search algorithm: a comprehensive survey. Arch Comput Methods Eng 30:427–455. https://doi.org/10.1007/s11831-022-09804-w
Gharehchopogh FS, Ucan A, Ibrikci T, Arasteh B, Isik G (2023) Slime mould algorithm: a comprehensive survey of its variants and applications. Arch Comput Methods Eng 30:2683–2723. https://doi.org/10.1007/s11831-023-09883-3
Shen Y, Zhang C, Soleimanian Gharehchopogh F, Mirjalili S (2023) An improved whale optimization algorithm based on multi-population evolution for global optimization and engineering design problems. Expert Syst Appl 215:119269. https://doi.org/10.1016/j.eswa.2022.119269
Suyal H, Singh A (2021) Improving multilabel classification in prototype selection scenario. Comput Intell Healthcare Inf 103–119
Rabby G, Berka P (2022) Multi-class classification of COVID-19 documents using machine learning algorithms. J Intell Inf Syst. https://doi.org/10.1007/s10844-022-00768-8
Lo H-Y, Wang J-C, Wang H-M, Lin S-D (2011) Cost-sensitive multilabel learning for audio tag annotation and retrieval. IEEE Trans Multimedia 13:518–529. https://doi.org/10.1109/TMM.2011.2129498
Gharehchopogh FS (2023) An improved Harris Hawks optimization algorithm with multi-strategy for community detection in social network. J Bionic Eng 20:1175–1197. https://doi.org/10.1007/s42235-022-00303-z
Tsoumakas G, Katakis I (2007) Multi-label classification. Int J Data Warehouse Min 3:1–13. https://doi.org/10.4018/jdwm.2007070101
Lughofer E (2022) Evolving multilabel fuzzy classifier. Inf Sci 597:1–23. https://doi.org/10.1016/j.ins.2022.03.045
Mishra NK, Singh PK (2022) Linear ordering problem based classifier chain using genetic algorithm for multilabel classification. Appl Soft Comput 117:108395
Loza Mencía E, Park S-H, Fürnkranz J (2010) Efficient voting prediction for pairwise multilabel classification. Neurocomputing 73:1164–1176. https://doi.org/10.1016/j.neucom.2009.11.024
Trohidis K, Tsoumakas G, Kalliris G, Vlahavas I (2011) Multilabel classification of music by emotion. EURASIP J Audio Speech Music Process 2011:4. https://doi.org/10.1186/1687-4722-2011-426793
Yap XH, Raymer M (2021) Multilabel classification and label dependence in in silico toxicity prediction. Toxicol Vitro 74:105157. https://doi.org/10.1016/j.tiv.2021.105157
Huang J, Li G, Huang Q, Wu X (2016) Learning label-specific features and class-dependent labels for multilabel classification. IEEE Trans Knowl Data Eng 28:3309–3323. https://doi.org/10.1109/TKDE.2016.2608339
Zhao T, Zhang Y, Miao D, Pedrycz W (2022) Selective label enhancement for multilabel classification based on three-way decisions. Int J Approximate Reason 150:172–187. https://doi.org/10.1016/j.ijar.2022.08.008
Zhu X, Li J, Ren J, Wang J, Wang G (2023) Dynamic ensemble learning for multilabel classification. Inf Sci 623:94–111. https://doi.org/10.1016/j.ins.2022.12.022
Li G, Wang J, Zheng Y, Franklin MJ (2016) Crowdsourced Data Management: a Survey. IEEE Trans Knowl Data Eng 28:2296–2319. https://doi.org/10.1109/TKDE.2016.2535242
Tong Y, Zhou Z, Zeng Y, Chen L, Shahabi C (2020) Spatial crowdsourcing: a survey. VLDB J 29:217–250. https://doi.org/10.1007/s00778-019-00568-7
Allahbakhsh M, Benatallah B, Ignjatovic A, Motahari-Nezhad HR, Bertino E, Dustdar S (2013) Quality control in crowdsourcing systems: issues and directions. IEEE Internet Comput 17:76–81. https://doi.org/10.1109/MIC.2013.20
Yadav A, Mishra S, Sairam AS (2022) A multi-objective worker selection scheme in crowdsourced platforms using NSGA-II. Expert Syst Appl 201:116991. https://doi.org/10.1016/j.eswa.2022.116991
Wu G, Chen Z, Liu J, Han D, Qiao B (2021) Task assignment for social-oriented crowdsourcing. Front Comput Sci 15:152316. https://doi.org/10.1007/s11704-019-9119-8
Abdullah NA, Rahman MM, Rahman MdM, Ghauth KI (2020) A Framework for optimal worker selection in spatial crowdsourcing using Bayesian network. IEEE Access 8:120218–120233. https://doi.org/10.1109/ACCESS.2020.3005543
Hu Q, He Q, Huang H, Chiew K, Liu Z (2016) A formalized framework for incorporating expert labels in crowdsourcing environment. J Intell Inf Syst 47:403–425. https://doi.org/10.1007/s10844-015-0371-6
Wang Y, Gao Y, Li Y, Tong X (2020) A worker-selection incentive mechanism for optimizing platform-centric mobile crowdsourcing systems. Comput Networks 171:107144. https://doi.org/10.1016/j.comnet.2020.107144
Dang D, Liu Y, Zhang X, Huang S (2016) A crowdsourcing worker quality evaluation algorithm on mapreduce for big data applications. IEEE Trans Parallel Distrib Syst 27:1879–1888. https://doi.org/10.1109/TPDS.2015.2457924
Fang Y, Sun H, Li G, Zhang R, Huai J (2018) Context-aware result inference in crowdsourcing. Inf Sci 460–461:346–363. https://doi.org/10.1016/j.ins.2018.05.050
Yuen M-C, King I, Leung K-S (2021) Temporal context-aware task recommendation in crowdsourcing systems. Knowl Based Syst 219:106770. https://doi.org/10.1016/j.knosys.2021.106770
Padmanabhan D, Bhat S, Shevade S, Narahari Y (2016) Topic Model Based Multilabel Classification. In: 2016 IEEE 28th international conference on tools with artificial intelligence (ICTAI). IEEE, pp 996–1003
Davtyan M, Eickhoff C, Hofmann T (2015) Exploiting document content for efficient aggregation of crowdsourcing votes. In: Proceedings of the 24th ACM international on conference on information and knowledge management. ACM, New York, NY, USA, pp 783–790
Zhang J, Wu M, Zhou C, Sheng VS (2022) Active crowdsourcing for multilabel annotation. IEEE Trans Neural Netw Learn Syst. https://doi.org/10.1109/TNNLS.2022.3194022
Gui X, Lu X, Yu G (2021) Cost-effective batch-mode multilabel active learning. Neurocomputing 463:355–367
Li S-Y, Jiang Y, Chawla NV, Zhou Z-H (2019) Multilabel Learning from Crowds. IEEE Trans Knowl Data Eng 31:1369–1382. https://doi.org/10.1109/TKDE.2018.2857766
Chen Z, Jiang L, Li C (2022) Label augmented and weighted majority voting for crowdsourcing. Inf Sci 606:397–409. https://doi.org/10.1016/j.ins.2022.05.066
Yu G, Tu J, Wang J, Domeniconi C, Zhang X (2021) Active multilabel crowd consensus. IEEE Trans Neural Netw Learn Syst 32:1448–1459. https://doi.org/10.1109/TNNLS.2020.2984729
Adamska P, Juźwin M, Wierzbicki A (2020) Picking peaches or squeezing lemons: selecting crowdsourcing workers for reducing cost of redundancy. pp 510–523
Haruna CR, Hou M, Eghan MJ, Kpiebaareh MY, Tandoh L (2019) An effective and cost-based framework for a qualitative hybrid data deduplication. pp 511–520
Shen S, Ji M, Wu Z, Yang X (2022) An optimization approach for worker selection in crowdsourcing systems. Comput Ind Eng 173:108730. https://doi.org/10.1016/j.cie.2022.108730
Bernstein MS, Brandt J, Miller RC, Karger DR (2011) Crowds in two seconds. In: Proceedings of the 24th annual ACM symposium on User interface software and technology-UIST '11. ACM Press, New York, p 33
Itoh Y, Matsubara S (2021) Adaptive budget allocation for cooperative task solving in crowdsourcing. In: 2021 IEEE international conference on big data (big data). IEEE, pp 3525–3533
Gao H, Liu CH, Tang J, Yang D, Hui P, Wang W (2019) Online quality-aware incentive mechanism for mobile crowd sensing with extra bonus. IEEE Trans Mob Comput 18:2589–2603. https://doi.org/10.1109/TMC.2018.2877459
Vazirani VV (2001) Approximation algorithms. Springer, Berlin
Boutell MR, Luo J, Shen X, Brown CM (2004) Learning multi-label scene classification. Pattern Recognit 37:1757–1771. https://doi.org/10.1016/j.patcog.2004.03.009
Zhang M-L, Zhou Z-H (2007) ML-KNN: A lazy learning approach to multilabel learning. Pattern Recognit 40:2038–2048. https://doi.org/10.1016/j.patcog.2006.12.019
Kim H-C, Ghahramani Z (2012) Bayesian classifier combination. In: Artificial Intelligence and Statistics. pp 619–627
Kim H, Ghahramani Z (2003) The EM-EP algorithm for Gaussian process classification. In: Proceedings of the workshop on probabilistic graphical models for classification at ECML
Kwok JT-Y (1999) Moderating the outputs of support vector machine classifiers. IEEE Trans Neural Netw 10:1018–1031. https://doi.org/10.1109/72.788642
Madjarov G, Kocev D, Gjorgjevikj D, Džeroski S (2012) An extensive experimental comparison of methods for multilabel learning. Pattern Recognit 45:3084–3104. https://doi.org/10.1016/j.patcog.2012.03.004
Johnson NL, Kotz S, Balakrishnan N (1995) Continuous univariate distributions, volume 2. Wiley, Hoboken

Acknowledgements

Not Applicable.

Funding

The authors received no funding for this research.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Dr B R Ambedkar National Institute of Technology, Jalandhar, Punjab, 144011, India
Himanshu Suyal & Avtar Singh

Contributions

Both authors contributed equally.

Corresponding author

Correspondence to Himanshu Suyal.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Ethical approval

Not Applicable.

Consent to participate

Not Applicable.

Consent for publication

Submissions have not been previously published, and all co-authors agree to publish.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Suyal, H., Singh, A. Multilabel classification using crowdsourcing under budget constraints. Knowl Inf Syst 66, 841–877 (2024). https://doi.org/10.1007/s10115-023-01973-9

Received: 18 January 2023
Revised: 07 August 2023
Accepted: 17 August 2023
Published: 09 September 2023
Issue Date: February 2024
DOI: https://doi.org/10.1007/s10115-023-01973-9
