Ensemble learning with linguistic, summary language and psychological features for location prediction

Malik, Muhammad Shahid Iqbal; Rehman, Faisal; Ignatov, Dmitry I.

doi:10.1007/s41870-023-01560-9

Original Research
Published: 17 October 2023

Volume 16, pages 193–205, (2024)

International Journal of Information Technology

Muhammad Shahid Iqbal Malik ORCID: orcid.org/0000-0001-8396-3344¹,
Faisal Rehman² &
Dmitry I. Ignatov¹

Abstract

Location prediction is a preliminary step for location extraction. However, because tweets often contain spelling and grammatical errors and substandard language, predicting location from tweet text is not an easy task. This study addresses the problem of predicting whether a tweet contains a location and proposes a framework that combines summary language, psychological, and linguistic indicators with a voting-based ensemble model. Two algorithms are presented so that the results can be simulated and reproduced in further research. The framework is evaluated on two benchmark Twitter datasets. The findings indicate that the linguistic features perform robustly as a standalone model. The best performance is achieved by the hybrid combination of the proposed features, which also outperforms the state-of-the-art baselines. The proposed framework achieves 93.41% and 87.73% accuracy on the Ritter and MSM2013 datasets, respectively. The most influential features are based on prepositions, words related to cognitive processes, pronouns, and personal pronouns.
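To make the idea of a voting-based ensemble over text-derived features concrete, the following is a minimal sketch in plain Python. It is an illustration only and not the authors' method: the word lists, the three threshold rules standing in for trained base classifiers, and all function names are assumptions introduced here. The abstract does not specify the base learners, the feature dictionaries, or the voting scheme used in the paper.

```python
from collections import Counter

# Tiny illustrative word lists (assumptions; NOT the dictionaries used in the paper).
PREPOSITIONS = {"in", "at", "on", "near", "from", "to"}
PRONOUNS = {"i", "we", "you", "he", "she", "they", "it"}


def extract_features(tweet):
    """Count a few surface cues in a whitespace-tokenized tweet."""
    words = tweet.split()
    tokens = [w.lower() for w in words]
    return {
        "prepositions": sum(t in PREPOSITIONS for t in tokens),
        "pronouns": sum(t in PRONOUNS for t in tokens),
        # Capitalized words after the first token, a crude proper-noun cue.
        "capitalized": sum(w[:1].isupper() for w in words[1:]),
    }


# Hypothetical base voters: hand-written rules standing in for trained classifiers.
def vote_prepositions(features):
    return int(features["prepositions"] >= 1)


def vote_capitalized(features):
    return int(features["capitalized"] >= 1)


def vote_preposition_vs_pronoun(features):
    return int(features["prepositions"] > features["pronouns"])


VOTERS = (vote_prepositions, vote_capitalized, vote_preposition_vs_pronoun)


def majority_vote(votes):
    """Hard voting: return the label predicted by most voters."""
    return Counter(votes).most_common(1)[0][0]


def predict_has_location(tweet):
    """Return 1 if the ensemble predicts a location mention, else 0."""
    features = extract_features(tweet)
    return majority_vote([voter(features) for voter in VOTERS])
```

With an odd number of binary voters, hard voting never ties. In a realistic setting each voter would be a classifier trained on one feature group rather than a fixed threshold.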

Acknowledgements

This article is an output of a research project implemented as part of the Basic Research Program at the National Research University Higher School of Economics (HSE University). Moreover, this research was supported in part by computational resources of HPC facilities at HSE University.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Authors and Affiliations

Department of Computer Science, National Research University Higher School of Economics, 11 Pokrovskiy Boulevard, Moscow, Russian Federation, 109028
Muhammad Shahid Iqbal Malik & Dmitry I. Ignatov
Department of Computer Science, Comsats University, Attock Campus, Kamra Road, Attock, Pakistan
Faisal Rehman

Corresponding author

Correspondence to Muhammad Shahid Iqbal Malik.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Malik, M.S.I., Rehman, F. & Ignatov, D.I. Ensemble learning with linguistic, summary language and psychological features for location prediction. Int. j. inf. tecnol. 16, 193–205 (2024). https://doi.org/10.1007/s41870-023-01560-9

Received: 07 February 2023
Accepted: 25 August 2023
Published: 17 October 2023
Issue Date: January 2024
DOI: https://doi.org/10.1007/s41870-023-01560-9
