Abstract
Location prediction is a preliminary step for location extraction. However, spelling and grammatical errors and the generally poor language quality of tweets make it difficult to predict locations from tweet text. This study addresses the problem of predicting whether a tweet contains a location and proposes a framework that combines summary language, psychological, and linguistic indicators with a voting-based ensemble model. Two algorithms are provided so that the results can be simulated and reproduced in further research. The framework is evaluated on two benchmark Twitter datasets. The findings indicate that the linguistic features perform robustly as a standalone model. The best performance is achieved by the hybrid combination of the proposed features, which also outperforms the state-of-the-art baselines. The proposed framework achieves 93.41% and 87.73% accuracy on the Ritter and MSM2013 datasets, respectively. The results show that the most influential features are based on prepositions, words related to cognitive processes, pronouns, and personal pronouns.
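To make the idea of a voting-based ensemble over feature families concrete, the following is a minimal sketch in plain Python. It assumes hard (majority) voting, which the abstract does not specify, and it uses three hypothetical rule-based base models with made-up feature names and thresholds. It is not the authors' implementation and only illustrates how per-family predictions could be combined into one decision.

```python
from collections import Counter
from typing import Callable, Dict, List

# Each base model maps a feature dictionary to a label:
# 1 = tweet contains a location, 0 = it does not.
Classifier = Callable[[Dict[str, float]], int]


def majority_vote(models: List[Classifier], features: Dict[str, float]) -> int:
    """Return the label predicted by most base models (ties go to 0)."""
    votes = Counter(model(features) for model in models)
    return 1 if votes[1] > votes[0] else 0


# Hypothetical base models, one per feature family named in the abstract.
# Feature names and thresholds are assumptions chosen for illustration only.
def linguistic_model(f: Dict[str, float]) -> int:
    return 1 if f.get("preposition_ratio", 0.0) > 0.10 else 0


def psychological_model(f: Dict[str, float]) -> int:
    return 1 if f.get("cognitive_process_ratio", 0.0) < 0.05 else 0


def summary_language_model(f: Dict[str, float]) -> int:
    return 1 if f.get("word_count", 0.0) > 8 else 0


ENSEMBLE = [linguistic_model, psychological_model, summary_language_model]

# Toy feature vector for a single tweet (values are invented).
example = {
    "preposition_ratio": 0.15,
    "cognitive_process_ratio": 0.02,
    "word_count": 6,
}
prediction = majority_vote(ENSEMBLE, example)
print(prediction)
```

In practice each base model would be a trained classifier rather than a fixed rule, and a soft-voting variant would average predicted probabilities instead of counting labels.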
Acknowledgements
This article is an output of a research project implemented as part of the Basic Research Program at the National Research University Higher School of Economics (HSE University). Moreover, this research was supported in part by computational resources of HPC facilities at HSE University.
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Ethics declarations
Conflict of interest
The authors have no competing interests to declare.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Malik, M.S.I., Rehman, F. & Ignatov, D.I. Ensemble learning with linguistic, summary language and psychological features for location prediction. Int. j. inf. tecnol. 16, 193–205 (2024). https://doi.org/10.1007/s41870-023-01560-9
DOI: https://doi.org/10.1007/s41870-023-01560-9