Abstract
Reading comprehension is the task of reading a text and understanding it well enough to answer questions about it. It has practical applications in domain-specific FAQs, search engines, and dialog systems. Resource-rich languages such as English, Japanese, Chinese, and most European languages benefit from numerous datasets and resources, which enable the development of machine reading comprehension (MRC) systems. Building MRC systems for low-resource languages (LRL) with limited datasets, such as Vietnamese, Urdu, Bengali, and Hindi, remains considerably more challenging. To address this issue, this study conducts a systematic literature review (SLR), supported by quantitative analysis, to understand the recent global shift in MRC research from high-resource languages (HRL) to low-resource languages. Existing literature reviews on MRC lack comprehensive comparisons of the techniques designed for high-resource and low-resource languages. This study therefore provides a comprehensive overview of the MRC research landscape in low-resource languages, offering insights and a list of suggestions to advance LRL–MRC research.
Abbreviations
- MRC: Machine reading comprehension
- QA: Question answering
- RC: Reading comprehension
- NLP: Natural language processing
- NER: Named entity recognition
- LSTM: Long short-term memory
- CNN: Convolutional neural network
- RNN: Recurrent neural network
- BERT: Bidirectional encoder representations from transformers
- GPT: Generative pre-trained transformer
- XLNet: Generalized autoregressive pretraining model built on Transformer-XL
- ALBERT: A lite BERT
- SQuAD: Stanford question answering dataset
- MSMARCO: Microsoft machine reading comprehension dataset
- CoQA: Conversational question answering
- RACE: ReAding comprehension from examinations
- GLUE: General language understanding evaluation
- F1: F1 score (a common metric for evaluation)
- BLEU: Bilingual evaluation understudy (a metric for machine translation)
- EM: Exact match (a metric for evaluation)
- POS: Part-of-speech
- IDF: Inverse document frequency
- CRF: Conditional random field
- PTM: Pre-trained model
- SP: Span prediction
- SD: Synthetic data
- Cxt: Context
- E2E: End-to-end
- OOV: Out-of-vocabulary
- WSD: Word sense disambiguation
- ZSL: Zero-shot learning
- HEQ: Human equivalence score
- METEOR: Metric for evaluation of translation with explicit ORdering
References
Abadani N, Mozafari J, Fatemi A, Nematbakhsh M, Kazemi A (2021) Parsquad: persian question answering dataset based on machine translation of squad 2.0. Int J Web Res 4(1):34–46
Abedissa T, Usbeck R, Assabie Y (2023) Amqa: amharic question answering dataset. arXiv preprint arXiv:2303.03290
Andrus BR, Nasiri Y, Cui S, Cullen B, Fulda N (2022) Enhanced story comprehension for large language models through dynamic document-based knowledge graphs. Proc AAAI Conf Artif Intell 36:10436–10444
Anuranjana K, Rao V, Mamidi R (2019) Hindirc: a dataset for reading comprehension in Hindi. In: 0th International Conference on Computational Linguistics and Intelligent Text
Artetxe M, Ruder S, Yogatama D (2020) On the cross-lingual transferability of monolingual representations. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp 4623–4637
Asai A, Eriguchi A, Hashimoto K, Tsuruoka Y (2018) Multilingual extractive reading comprehension by runtime machine translation. arXiv preprint arXiv:1809.03275
Atef A, Mattar B, Sherif S, Elrefai E, Torki M (2020) Aqad: 17,000+ arabic questions for machine comprehension of text. In: 2020 IEEE/ACS 17th International Conference on Computer Systems and Applications (AICCSA), pp 1–6. IEEE
Bajgar O, Kadlec R, Kleindienst J (2016) Embracing data abundance: boktest dataset for reading comprehension. arXiv preprint arXiv:1610.00956
Banerjee S, Lavie A (2005) Meteor: an automatic metric for mt evaluation with improved correlation with human judgments. In: Proceedings of the acl workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization
Baradaran R, Amirkhani H (2021) Ensemble learning-based approach for improving generalization capability of machine reading comprehension systems. Neurocomputing 466:229–242
Baradaran R, Razieh G, Amirkhani H (2020) A survey on machine reading comprehension systems. Nat Language Eng 5:1–50
Béchet F, Aloui C, Charlet D, Damnati G, Heinecke J, Nasr A, Herledan F (2019) CALOR-QUEST : generating a training corpus for Machine Reading Comprehension models from shallow semantic annotations. In: MRQA: machine Reading for Question Answering-Workshop at EMNLP-IJCNLP 2019-2019 Conference on Empirical Methods in Natural Language Processing, Hong Kong
Berckmann T, Hiziroglu B (2020) Low-resource translation as language modeling. In: Proceedings of the Fifth Conference on Machine Translation, pp 1079–1083
Bhakthavatsalam S, Khashabi D, Khot T, Mishra BD, Richardson K, Sabharwal A, Schoenick C, Tafjord O, Clark P (2021) Think you have solved direct-answer question answering? try arc-da, the direct-answer ai2 reasoning challenge. arXiv preprint arXiv:2102.03315
Bjerva J, Bhutani N, Golshan B, Tan W-C, Augenstein I (2020) Subjqa: A dataset for subjectivity and review comprehension. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp 5480–5494
Brunato D, Valeriani M, Dell’Orletta F (2018) Darc-it: a dataset for reading comprehension in Italian. Comput Linguist CLiC-it 2018 8:62
Carrino CP, Costa-Jussà MR, Rodríguez F José A (2020) Automatic Spanish translation of squad dataset for multi-lingual question answering. In: LREC 2020: 12th International Conference on Language Resources and Evaluation: Marseílle: May 13–15, 2020: conference proceedings, pp 5515–5523. European Language Resources Association (ELRA)
Chandu K, Loginova E, Gupta V, van Genabith J, Neumann G, Chinnakotla M, Nyberg E, Black AW (2019) Code-mixed question answering challenge: Crowd-sourcing data and techniques. In: Third Workshop on Computational Approaches to Linguistic Code-Switching, pp 29–38. Association for Computational Linguistics (ACL)
Charniak E, Altun Y, de Salvo BR, Garrett B, Kosmala M, Moscovich T, Pang L, Pyo C, Sun Y, Wy W, et al (2000) Reading comprehension programs in a statistical-language-processing class. In: ANLP-NAACL 2000 Workshop: Reading Comprehension Tests as Evaluation for Computer-Based Language Understanding Systems
Chaudhury A, Tapaswi M, Kim SW, Fidler S (2019) The shmoop corpus: a dataset of stories with loosely aligned summaries. arXiv preprint arXiv:1912.13082
Chen N, Shou L, Gong M, Pei J (2022) From good to best: two-stage training for cross-lingual machine reading comprehension. Proc AAAI Conf Artif Intell 36:10501–10508
Chen D, Bolton J, Manning CD (2016) A thorough examination of the cnn/daily mail reading comprehension task. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp 2358–2367
Chen M, D’Arcy M, Liu A, Fernandez J, Downey D (2019) Codah: An adversarially-authored question answering dataset for common sense. In: Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP, pp 63–69
Chen D, Fisch A, Weston J Bordes A (2017) Reading wikipedia to answer open-domain questions. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp1870–1879
Choi E, He H, Iyyer M, Yatskar M, Yih WT, Choi Y, Liang P, Zettlemoyer L (2018) Quac: Question answering in context. In: 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018, pp 2174–2184. Association for Computational Linguistics
Chung Y-A, Lee H-Y, Glass J (2018) Supervised and unsupervised transfer learning for question answering. In: Proceedings of NAACL-HLT
Clark JH, Choi E, Collins M, Garrette D, Kwiatkowsk T, Nikolaev V, Palomaki J (2020) Tydi qa: a benchmark for information-seeking question answering in typologically diverse languages. Trans Assoc Comput Linguist 8:454–4708
Clark C, Lee K, Chang M-W, Kwiatkowski T, Collins M, Toutanova K (2019) Boolq: exploring the surprising difficulty of natural yes/no questions. In Proceedings of NAACL-HLT, pp 2924–2936
Croce D, Zelenanska A, Basili R (2018) Neural learning for question answering in Italian. In: Ghidini C, Magnini B, Passerini A, Traverso P (eds) AI*IA 2018—advances in artificial intelligence. Springer International Publishing, Cham, pp 389–402
Cui Y, Che W, Liu T, Qin B, Wang S, Hu G (2019) Cross-lingual machine reading comprehension. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp 1586–1595
Cui Y, Liu T, Che W, Xiao L, Chen Z, Ma W, Wang S, Hu G (2019) A span-extraction dataset for Chinese machine reading comprehension. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp 5883–5889
Darvishi K, Shahbodaghkhan N, Abbasiantaeb Z, Momtazi S (2023) Pquad: a persian question answering dataset. Comput Speech Language 80:101486
Daud A, Khan W, Che D (2017) Urdu language processing: a survey. Arti Intell Rev 47(3):279–311
David M, Larissa S, Mike C, Davina G, Alessandro L, Mark P, Paul S, Stewart Lesley A (2015) Preferred reporting items for systematic review and meta-analysis protocols (prisma-p) 2015 statement. Syst Rev 4(1):1–9
Devlin J, Chang M-W, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp 4171–4186, Minneapolis, Minnesota, Association for Computational Linguistics
Dhingra B, Mazaitis K, Cohen WW (2017) Quasar: Datasets for question answering by search and reading. arXiv preprint arXiv:1707.03904
d’Hoffschmidt M, Belblidia W, Heinrich Q, Brendlé T, Vidal M (2020) Fquad: French question answering dataset. Find Assoc Comput Linguist 2020:1193–1208
Dunn M, Sagun L, Higgins M, Guney VU, Cirik V, Cho K (2017) Searchqa: a new q &a dataset augmented with context from a search engine. arXiv preprint arXiv:1704.05179
Duong L (2017) Natural language processing for resource-poor languages. University of Melbourne, Parkville
Dzendzik D, Foster J, Vogel C (2021) English machine reading comprehension datasets: A survey. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp 8784–8804
Efimov P, Chertok A, Boytsov L, Braslavski P (2020) Sberquad-Russian reading comprehension dataset: Description and analysis. In: International Conference of the Cross-Language Evaluation Forum for European Languages, pp 3–15. Springer
Elsahar H, Vougiouklis P, Remaci A, Gravier C, Hare J, Laforest F, Simperl E (2018) T-rex: a large scale alignment of natural language with knowledge base triples. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Fenogenova A, Mikhailov V, Shevelev D (2020) Read and reason with Muserc and Rucos: Datasets for machine reading comprehension for Russian. In: Proceedings of the 28th International Conference on Computational Linguistics
Gardner M, Berant J, Hajishirzi H, Talmor A, Min S (2019) On making reading comprehension more comprehensive. In: Proceedings of the 2nd Workshop on Machine Reading for Question Answering, pp 105–112
Gashkov A, Perevalov A, Eltsova M, Both A (2021) Improving the question answering quality using answer candidate filtering based on natural-language features. In: 2021 16th International Conference on Intelligent Systems and Knowledge Engineering (ISKE), pp 635–642. IEEE
Ghaddar A, Wu Y, Bagga S, Rashid A, Bibi K, Rezagholizadeh M, Xing C, Wang Y, Xinyu D, Wang Z, et al (2021) Jaber and Saber: Junior and senior Arabic Bert. arXiv preprint arXiv:2205.10687
Glushkova T, Machnev A, Fenogenova A, Shavrina T, Artemova E, Ignatov DI (2020) Danetqa: a yes/no question answering dataset for the Russian language. In: International Conference on Analysis of Images, Social Networks and Texts, pp 57–68. Springer
Grail Q, Perez J (2018) Reviewqa: a relational aspect-based opinion reading dataset. arXiv preprint arXiv:1810.12196
Greenhill SJ, Atkinson QD, Meade A, Gray RD (2010) The shape and tempo of language evolution. Proc R Soc 277(1693):2443–2450
Guo S, Guan YH, Tan RL (2021) Frame-based neural network for machine reading comprehension. Knowl-Based Syst 219:106889
Gupta S, Khade N (2020) Bert based multilingual machine comprehension in English and Hindi. ACM Trans Asian Low-Resour Lang Inf Process 9(1):8
Gupta D, Ekbal A, Bhattacharyya P (2019) A deep neural network framework for English Hindi question answering. ACM Trans Asian Low-Resour Lang Inf Process 19(2):8
Gupta D, Kumari S, Ekbal A, Bhattacharyya P (2018) Mmqa: A multi-domain multi-lingual question-answering framework for English and Hindi. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Gupta S, Pratap SRB, Yu H (2020) Conversational machine comprehension: a literature review. In: Proceedings of the 28th International Conference on Computational Linguistics, pp 2739–2753, Barcelona. International Committee on Computational Linguistics
Haddow B, Bawden R, Barone AVM, Helcl J, Birch A (2022) Survey of low-resource machine translation. Comput Linguist 48(3):673–732
Hardalov M, Koychev I, Nakov P (2019) Beyond english-only reading comprehension: Experiments in zero-shot multilingual transfer for bulgarian. In: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pp 447–459
Haurilet M, Al-Halah Z, Stiefelhagen R (2018) Moqa-a multi-modal question answering architecture. In: Proceedings of the European Conference on Computer Vision (ECCV) Workshops
He W, Liu K, Liu J, Lyu Y, Zhao S, Xiao X, Liu Y, Wang Y, Wu H, She Q, et al (2018) Dureader: a Chinese machine reading comprehension dataset from real-world applications. In: Proceedings of the Workshop on Machine Reading for Question Answering, pp 37–46
Hermann KM, Kocisky T, Grefenstette E, Espeholt L, Kay W, Suleyman M, Blunsom P (2015) Teaching machines to read and comprehend. Adv Neural Inform Process Syst 28:9
Hill F, Bordes A, Chopra S, Weston J (2016) The goldilocks principle: reading children’s books with explicit memory representations. In: 4th International Conference on Learning Representations, ICLR 2016
Hirschman L, Light M, Breck E, Burger JD (1999) Deep read: a reading comprehension system. In: Proceedings of the 37th annual meeting of the Association for Computational Linguistics, pp 325–332
Huang Z, Liu F, Xian W, Ge S, Wang H, Fan W, Zou Y (2021) Audio-oriented multimodal machine comprehension via dynamic inter-and intra-modality attention. Proc AAAI Conf Artif Intell 35:13098–13106
Jia R, Liang P (2017) Adversarial examples for evaluating reading comprehension systems. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp 2021–2031
Jiang Y, Wu S, Gong J, Cheng Y, Meng P, Lin W, Chen Z, et al (2020) Improving machine reading comprehension with single-choice decision and transfer learning. arXiv preprint arXiv:2011.03292
Jing Y, Xiong D (2020) Effective strategies for low-resource reading comprehension. In: 2020 International Conference on Asian Language Processing (IALP), pp 153–157. IEEE
Jing Y, Xiong D, Yan Z (2019) BiPaR: a bilingual parallel dataset for multilingual and cross-lingual reading comprehension on novels. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp 2452–2462, Hong Kong, Association for Computational Linguistics
Jin W, Yang G, Zhu H (2019) An efficient machine reading comprehension method based on attention mechanism. In: 2019 IEEE International Conference on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom)
Joshi M, Choi E, Weld DS, Zettlemoyer L (2017) Triviaqa: A large scale distantly supervised challenge dataset for reading comprehension. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp 1601–1611
Joshi N, Darbari H, Mathur I (2012). Human and automatic evaluation of English to Hindi machine translation systems. In: Advances in Computer Science, Engineering & Applications, pp 423–432. Springer
Ju Y, Zhang Y, Tian Z, Liu K, Cao X, Zhao W, Li J, Zhao J (2021) Enhancing multiple-choice machine reading comprehension by punishing illogical interpretations. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp 3641–3652
Karakanta A, Dehdari J, van G J (2018) Neural machine translation for low-resource languages without parallel corpora. Mach Transl 32(1):167–189
Kazi S, Khoja S (2021) Uquad1. 0: development of an urdu question answering training data for machine reading comprehension. arXiv preprint arXiv:2111.01543
Keele S et al (2007) Guidelines for performing systematic literature reviews in software engineering
Kembhavi A, Seo M, Schwenk D, Choi J, Farhadi A, Hajishirzi H (2017) Are you smarter than a sixth grader? textbook question answering for multimodal machine comprehension. In: Proceedings of the IEEE Conference on Computer Vision and Pattern recognition, pp 4999–5007
Khashabi D, Chaturvedi S, Roth M, Upadhyay S, Roth D (2018) Looking beyond the surface: A challenge set for reading comprehension over multiple sentences. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp 252–262
Kitchenham B (2004) Procedures for performing systematic reviews. Keele UK Keele Univ 33(2004):1–26
Kurihara K, Kawahara D, Shibata T (2022) Jglue: Japanese general language understanding evaluation. In: Proceedings of the Thirteenth Language Resources and Evaluation Conference, pp 2957–2966
Kwiatkowski T, Palomaki J, Redfield O, Collins M, Parikh A, Alberti Chris, Epstein Danielle, Polosukhin Illia, Devlin Jacob, Lee Kenton et al (2019) Natural questions: a benchmark for question answering research. Trans Assoc Comput Linguist 7:453–466
Lai G, Xie Q, Liu H, Yang Y, Hovy E (2017) Race: large-scale reading comprehension dataset from examinations. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp 785–794
Lai Y, Zhang C, Feng Y, Huang Q, Zhao D (2021) Why machine reading comprehension models learn shortcuts? In: ACL/IJCNLP (Findings)
Lee Hyeon-Gu, Jang Youngjin, Kim Harksoo (2021) Machine reading comprehension framework based on self-training for domain adaptation. IEEE Access 9:21279–21285
Lee K, Park S, Han H, Yeo J, Hwang SW, Lee J (2019) Learning with limited data for multilingual reading comprehension. In: 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, pp 2840–2850. Association for Computational Linguistics
Lehnert WG (1977) The process of question answering. Yale University, New Haven
Lewis P, Oguz B, Rinott R, Riedel S, Schwenk H (2020) Mlqa: Evaluating cross-lingual extractive question answering. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp 7315–7330, 2020
Li Y, Li H, Liu J (2019) Towards robust neural machine reading comprehension via question paraphrases. In: 2019 International Conference on Asian Language Processing (IALP), pp 290–295. IEEE
Li J, Liu M, Zheng Z, Zhang H, Qin B, Kan M-Y, Liu T (2021) Dadgraph: a discourse-aware dialogue graph neural network for multiparty dialogue machine reading comprehension. In: 2021 International Joint Conference on Neural Networks (IJCNN), pp 1–8. IEEE
Lim S, Kim M, Lee J (2019) Korquad1. 0: Korean qa dataset for machine reading comprehension. arXiv preprint arXiv:1909.07005
Lin CY (2004) Rouge: a package for automatic evaluation of summaries. In: Text summarization branches out, pp 74–81
Liu S, Zhang X, Zhang S, Wang H, Zhang W (2019) Neural machine reading comprehension: methods and trends. Appl Sci 9(18):3698
Liu K, Liu X, Yang A, Liu J, Jinsong S, Li S, She Q (2020) A robust adversarial training approach to machine reading comprehension. Proc AAAI Conf Artif Intell 34:8392–8400
Liu J, Chen Y, Jinan X (2022) Mrcaug: data augmentation via machine reading comprehension for document-level event argument extraction. IEEE/ACM Trans Audio Speech Language Process 30:3160–3172
Liu X, He P, Chen W, Gao J (2019) Multi-task deep neural networks for natural language understanding. In: Proceedings of the 57th annual meeting of the association for computational linguistics, pp 4487–4496
Liu J, Lin Y, Liu Z, Sun M (2019) Xqa: A cross-lingual open-domain question answering dataset. In : Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp 2358–2368
Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V (2019) Roberta: a robustly optimized Bert pretraining approach. arXiv preprint arXiv:1907.11692
Liu X, Shen Y, Duh K, Gao J (2018) Stochastic answer networks for machine reading comprehension. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), p 1694–1704
Liu J, Shou L, Pei J, Gong M, Yang M, Jiang D (2020) Cross-lingual machine reading comprehension with language branch knowledge distillation. In: Proceedings of the 28th International Conference on Computational Linguistics, pp 2710–2721
Luo D, Zhang P, Ma L, Zhu X, Zhou M, Liang Q, Wang B, Wang L (2021) Evidence augment for multiple-choice machine reading comprehension by weak supervision. In: International Conference on Artificial Neural Networks, pp 357–368. Springer
Macková K, Straka M (2020) Reading comprehension in czech via machine translation and cross-lingual transfer. In: International Conference on Text, Speech, and Dialogue, pp 171–179. Springer
Meurers D, Ziai R, Ott N, Kopp J (2011) Evaluating answers to reading comprehension questions in context: Results for German and the role of information structure. In: Proceedings of the TextInfer 2011 Workshop on Textual Entailment, pp 1–9
Minghao H, Wei F, Peng Y, Huang Z, Yang N, Li D (2019) Read+ verify: machine reading comprehension with unanswerable questions. Proc AAAI Conf Artif Intell 33:6529–6537
Mubarak A, Imam A, Maaz A, Alexander G (2020) Methods and trends of machine reading comprehension in the Arabic language. Computación y Sistemas 24(4):1607–1615
Narasimhan K, Barzilay R (2015) Machine comprehension with discourse relations. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp 1253–1262
Nguyen CT, Nguyen DT (2021) A vietnamese answer extraction model based on phobert. In: 2021 15th International Conference on Advanced Computing and Applications (ACOMP), pp 112–119. IEEE
Nguyen HD, Huynh T, Hoang S, Pham VT, Zelinka I (2020) Language-oriented sentiment analysis based on the grammar structure and improved self-attention network. In: ENASE, pp 339–346
Nguyen KV, Do PN, Nguyen ND, Huynh TV, Nguyen AG, Nguyen NL (2021) Sentence extraction-based machine reading comprehension for vietnamese. In: International Conference on Knowledge Science, Engineering and Management, pp 511–523. Springer
Nguyen K, Nguyen V, Nguyen A, Nguyen N (2020) A vietnamese dataset for evaluating machine reading comprehension. In Proceedings of the 28th International Conference on Computational Linguistics, pp 2595–2605
Nguyen T, Rosenberg M, Song X, Gao J, Tiwary S, Majumder R, Deng L (2016) Ms marco: a human generated machine reading comprehension dataset. In: CoCo@ NIPs
Nishida Kyosuke, Saito Itsumi, Otsuka Atsushi, Asano Hisako, Tomita Junji (2018) Retrieve-and-read: Multi-task learning of information retrieval and reading comprehension. In: Proceedings of the 27th ACM international conference on information and knowledge management, pp 647–656
Pampari A, Raghavan P, Liang J, Peng J (2018) emrqa: a large corpus for question answering on electronic medical records. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp 2357–2368
Pang RY, Parrish A, Joshi N, Nangia N, Phang J, Chen A, Padmakumar V, Ma J, Thompson J, He H, et al (2022) Quality: Question answering with long input texts, yes! In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp 5336–5358
Papineni K, Roukos S, Ward T, Zhu W-J (2002) Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting of the Association for Computational Linguistics, pp 311–318
Paschoal AFA, Pirozelli P, Freire V, Delgado KV, Peres SM, José MM, Nakasato F, Oliveira AS, Brandão AAF, Costa AHR, et al (2021) Pirá: A bilingual portuguese-english dataset for question-answering about the ocean. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp 4544–4553
Pedroza M, Ramírez-Bello A, Becerra AG, Martínez FAF (2021) Machine reading comprehension (lstm) review (state of art). In: Metaheuristics in Machine Learning: Theory and Applications, pp 491–514. Springer
Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer L (2018) Deep contextualized word representations. In: Proceedings of NAACL-HLT, pp 2227–2237
Putri RA, Oh AH (2022) Idk-mrc: unanswerable questions for Indonesian machine reading comprehension. In: The 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022. EMNLP
Qiu B, Chen X, Xu J, Sun Y (2019) A survey on neural machine reading comprehension. arXiv preprint arXiv:1906.03824
Rajpurkar P, Jia R, Liang P (2018) Know what you don’t know: Unanswerable questions for squad. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (volume 2: Short Papers), pp 784–789
Rajpurkar P, Zhang J, Lopyrev K, Liang P (2016) Squad: 100,000+ questions for machine comprehension of text. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp 2383–2392
Ranathunga S, Lee ES, Prifti Skenduli M, Shekhar R, Alam M, Kaur R (2021) Neural machine translation for low-resource languages: a survey. CM Comput Surv 55(11):1–37
Ravva P, Urlana A, Shrivastava M (2020) Avadhan: system for open-domain telugu question answering. In: Proceedings of the 7th ACM IKDD CoDS and 25th COMAD, pp 234–238
Ren Q, Cheng X, Sen S (2020) Multi-task learning with generative adversarial training for multi-passage machine reading comprehension. Proc AAAI Conf Artif Intell 34:8705–8712
Richardson M, Burges CJC, Renshaw E (2013) Mctest: A challenge dataset for the open-domain machine comprehension of text. In: Proceedings of the 2013 conference on empirical methods in natural language processing, pp 193–203
Riloff E, Thelen M (2000) A rule-based question answering system for reading comprehension tests. In: ANLP-NAACL 2000 Workshop: Reading Comprehension Tests as Evaluation for Computer-Based Language Understanding Systems
Rouge LCY (2004) A package for automatic evaluation of summaries. In: Proceedings of Workshop on Text Summarization of ACL, Spain
Seelawi H, Tuffaha I, Gzawi M, Farhan W, Talafha B, Badawi R, Sober Z, Al-Dweik O, Freihat AA, Al-Natsheh H (2021) Alue: Arabic language understanding evaluation. In: Proceedings of the Sixth Arabic Natural Language Processing Workshop, pp 173–184
Seo M, Kembhavi A, Farhadi A, Hajishirzi H (2016) Bidirectional attention flow for machine comprehension. In: International Conference on Learning Representations
Shao CC, Liu T, Lai Y, Tseng Y, Tsai S (2018) Drcd: a Chinese machine reading comprehension dataset. arXiv preprint arXiv:1806.00920
Shelke BA, Mahender CN (2022) Development of question answering system in marathi language. Specialusis Ugdymas 1(43):10176–10185
Smith E, Greco N, Bosnjak M, Vlachos A (2015) A strong lexical matching method for the machine comprehension test. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp 1693–1698
Soni S, Roberts K (2020) Evaluation of dataset selection for pre-training and fine-tuning transformer language models for clinical question answering. In: Proceedings of the Twelfth Language Resources and Evaluation Conference, pp 5532–5538
Soygazi F, Çiftçi O, Kök U, Cengiz S (2021) Thquad: Turkish historic question answering dataset for reading comprehension. In: 2021 6th International Conference on Computer Science and Engineering (UBMK), pp 215–220. IEEE, 2021
Sugawara S, Kido Y, Yokono H, Aizawa A (2017) Evaluation metrics for machine reading comprehension: Prerequisite skills and readability. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp 806–817
Sun K (2021) Machine reading comprehension: challenges and approaches. PhD thesis, Cornell University
Sun K, Dian Y, Chen J, Dong Y, Choi Y, Cardie C (2019) Dream: a challenge data set and models for dialogue-based reading comprehension. Trans Assoc Comput Linguist 7:217–231
Sun Y, Liu S, Dan Z, Zhao X (2022) Question generation based on grammar knowledge and fine-grained classification. In: Proceedings of the 29th International Conference on Computational Linguistics, pp 6457–6467
Suster S, Daelemans W (2018) Clicr: a dataset of clinical case reports for machine reading comprehension. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp 1551–1563
Tahsin MT, Sarwar A, Rahman RM (2021) Deep learning based question answering system in Bengali. J Inform Telecommun 5(2):145–178
Tan C, Wei F, Yang N, Du B, Lv W, Zhou M (2018) S-net: From answer extraction to answer synthesis for machine reading comprehension. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol 32
Tapaswi M, Zhu Y, Stiefelhagen R, Torralba A, Urtasun R, Fidler S (2016) Movieqa: understanding stories in movies through question-answering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4631–4640
Taylor ME, Stone P (2009) Transfer learning for reinforcement learning domains: a survey. J Mach Learn Res 10(7):5
Thompson Cynthia (2003) Acquiring word-meaning mappings for natural language interfaces. J Artif Intell Res 18:1–44
Tien NBH, Nguyen TNTT (2022) Machine reading comprehension model for low-resource languages and experimenting on vietnamese. In: International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, pp 370–381. Springer
Van Nguyen K, Vinh TK, Luu ST, Gia-Tuan NA, Luu-Thuy NN (2020) Enhancing lexical-based approach with external knowledge for vietnamese multiple-choice machine reading comprehension. IEEE Access 8:201404-201417201417201417
Van H, Yadav V, Surdeanu M (2021) Cheap and good? simple and effective data augmentation for low resource machine reading. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. pp 2116–2120
Voorhees EM et al (1999) The trec-8 question answering track report. Trec 99:77–82
Wang Y, Liu K, Liu J, He W, Lyu Y, Wu H, Li S, Wang H (2018) Multi-passage machine reading comprehension with cross-passage answer verification. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp 1918–1927
Wang B, Liu K, Zhao J (2017) Conditional generative adversarial networks for commonsense machine comprehension. In: IJCAI, pp 4123–4129
Wang A, Singh A, Michael J, Hill F, Levy O, Bowman SR (2019) GLUE: A multi-task benchmark and analysis platform for natural language understanding. In: International Conference on Learning Representations
Weisberg R (1988) 1980s: a change in focus of reading comprehension research: a review of reading/learning disabilities research based on an interactive model of reading. Learn Disabil Q 11(2):149–159
Weston J, Bordes A, Chopra S, Rush AM, Van Merriënboer B, Joulin A, Mikolov T (2016) Towards ai-complete question answering: a set of prerequisite toy tasks. In: 4th International Conference on Learning Representations, ICLR 2016
Wu G, Xu B, Qin Y, Wang W, Wang G (2021) Improving low resource reading comprehension via cross lingual transposition rethinking. In: The 10th International Joint Conference on Knowledge Graphs, pp 89–98
Xue Y (2022) Machine reading comprehension model based on multi-head attention mechanism. In: Advanced Intelligent Technologies for Industry, pp 45–58. Springer
Xu Y, Liu W, Chen G, Ren B, Zhang S, Gao S, Guo J (2019) Enhancing machine reading comprehension with position information. IEEE Access 7:141602–141611
Yang Z, Dai Z, Yang Y, Carbonell J, Salakhutdinov RR, Le QV (2019) Xlnet: Generalized autoregressive pretraining for language understanding. Adv Neural Inform Process Syst 8:32
Yang Y, Kang S, Seo J (2020) Improved machine reading comprehension using data validation for weakly labeled data. IEEE Access 8:5667–5677
Yang Y, Yih W, Meek C (2015) Wikiqa: a challenge dataset for open-domain question answering. In: Proceedings of the 2015 conference on empirical methods in natural language processing, pp 2013–2018
Yu AW, Dohan D, Luong MT, Zhao R, Chen K, Norouzi M, Le QV (2018) Qanet: Combining local convolution with global self-attention for reading comprehension. arXiv preprint arXiv:1804.09541
Zelle JM, Mooney RJ (1996) Learning to parse database queries using inductive logic programming. In: Proceedings of the national conference on artificial intelligence, pp 1050–1055
Zeng C, Li S, Li Q, Hu J, Hu J (2020) A survey on machine reading comprehension-tasks, evaluation metrics and benchmark datasets. Appl Sci 10(21):7640
Zhang Z, Wu Y, Zhou J, Duan S, Zhao H, Wang R (2020) Sg-net: syntax-guided machine reading comprehension. Proc AAAI Conf Artif Intell 34:9636–9643
Zhang Z, Zhao H, Wang R (2020) Machine reading comprehension: the role of contextualized language models and beyond. Comput Linguist 1:5
Zhang Z, Yang J, Zhao H (2021) Retrospective reader for machine reading comprehension. Proc AAAI Conf Artif Intell 35:14506–14514
Zhang X, Yang A, Li S, Wang Y (2019) Machine reading comprehension: a literature review. arXiv preprint arXiv:1907.01686
Zhang C, Zhang X, Wang H (2018) A machine reading comprehension-based approach for featured snippet extraction. In: 2018 IEEE International Conference on Data Mining (ICDM), pp 1416–1421. IEEE
Zhao X, Cheng Y, Xiang W, Wang X, Han L, Shang J, Peng S (2021) A knowledge-aware machine reading comprehension framework for dialogue symptom diagnosis. In: 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp 1185–1190. IEEE
Zhou X (2021) A study of machine reading comprehension based on attention mechanism. In: 2021 6th International Conference on Intelligent Computing and Signal Processing (ICSP), pp 1058–1061. IEEE
Zhu C (2021) Machine reading comprehension: algorithms and practice. Elsevier, Amsterdam
Zhu F, Lei W, Wang C, Zheng J, Poria S, Chua T-S (2021) Retrieving and reading: a comprehensive survey on open-domain question answering. arXiv preprint arXiv:2101.00774
Author information
Contributions
SK authored the majority of the article under the supervision of her advisors, SKhoja and AD. SK originated the idea, took advice from SKhoja and AD in designing and enhancing the approach, and co-wrote the article’s initial draft. SK contributed to the enhancement of multiple sections of the article, including the review process, datasets, performance evaluation, challenges, and future directions. AD additionally advised on incorporating comparative tables, and SKhoja later enhanced the article’s overall formal writing. SK, SKhoja, and AD participated in the critical revision of the article and accepted its final form.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kazi, S., Khoja, S. & Daud, A. A survey of deep learning techniques for machine reading comprehension. Artif Intell Rev 56 (Suppl 2), 2509–2569 (2023). https://doi.org/10.1007/s10462-023-10583-4
DOI: https://doi.org/10.1007/s10462-023-10583-4