ABSTRACT
Multilingual Question Answering (MQA) generates accurate answers to a user's query regardless of the language of the context. MQA has gained popularity as individuals increasingly pose questions in both English and their native languages on social media. However, conventional Question Answering (QA) systems have difficulty handling multiple languages, and the development of MQA models is hindered by the lack of large-scale benchmark datasets, which keeps their performance below that of monolingual systems. To address this, an MQA system called OptBertDCNN is proposed, combining multilingual cased Bidirectional Encoder Representations from Transformers (mBERT) with an optimized Deep Convolutional Neural Network (DCNN). OptBertDCNN uses a BERT tokenizer to segment sentences into tokens and then extracts features from both context and question tokens: word embeddings from the pre-trained model, TF-IDF scores, SentiWordNet scores, and statistical features. These features are fed into OptBertDCNN, which is trained on the EHMQuAD dataset. OptBertDCNN achieves an exact match of 0.755, precision of 0.765, recall of 0.773, and F1-score of 0.769. These results indicate that OptBertDCNN is effective in addressing the challenges of multilingual question answering.
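For readers unfamiliar with the reported metrics, the following is a minimal sketch of how exact match, precision, recall, and F1 are commonly computed for a single predicted answer in extractive QA. It assumes SQuAD-style token-overlap scoring with lowercasing and whitespace tokenization; the abstract does not state the exact evaluation procedure, so the normalization and the function names `exact_match` and `token_prf` are illustrative assumptions rather than the authors' method.

```python
from collections import Counter
from typing import Tuple


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the answers are identical after lowercasing and
    whitespace tokenization (an assumed normalization), else 0.0."""
    return float(prediction.lower().split() == reference.lower().split())


def token_prf(prediction: str, reference: str) -> Tuple[float, float, float]:
    """Token-overlap precision, recall, and F1 for one answer pair."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical answer pair, not taken from the EHMQuAD dataset.
    print(exact_match("the new delhi", "new delhi"))
    print(token_prf("the new delhi", "new delhi"))
```

Corpus-level scores under this scheme are usually the mean of the per-question values, so a reported F1 need not equal the harmonic mean of the reported precision and recall, although here the two agree to three decimals.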
Index Terms
- OptBertDCNN: A framework based on BERT and optimized Deep Convolutional Neural Network for MQA