Ensemble learning with linguistic, summary language and psychological features for location prediction

  • Original Research
  • International Journal of Information Technology

Abstract

Location prediction is a preliminary step for location extraction. However, because tweets often contain spelling and grammatical errors and below-standard language, predicting locations from tweet text is not an easy task. This study addresses the problem of predicting whether or not a tweet contains a location, and proposes a framework that combines summary language, psychological, and linguistic indicators with a voting-based ensemble model. Two algorithms are designed so that the results can be easily simulated and reproduced in further research. The framework is evaluated on two benchmark Twitter datasets. The findings indicate that the linguistic features demonstrated robust performance as a standalone model. In addition, the best performance is achieved by the hybrid combination of the proposed features, which also outperformed the state-of-the-art baselines. The proposed framework achieved 93.41% and 87.73% accuracy on the Ritter and MSM2013 datasets, respectively. The results showed that the most influential features are based on prepositions, words related to cognitive processes, pronouns, and personal pronouns.
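The abstract combines two ingredients: word-category frequency features (the most influential being prepositions, cognitive-process words, pronouns, and personal pronouns) and a voting-based ensemble. The following is a minimal sketch of that idea under loudly stated assumptions. The word lists are tiny hypothetical stand-ins for dictionary-based categories of the kind LIWC defines, not the paper's lexicons. The three base learners (nearest centroid, k-nearest neighbours, a one-feature threshold rule) are swapped in for illustration and are not the paper's models. Hard majority voting is also assumed. All names (`extract_features`, `hard_vote`, and so on) and the toy tweets are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical mini-lexicons: tiny stand-ins for dictionary-based word
# categories, NOT the word lists used in the paper.
PREPOSITIONS = {"in", "at", "on", "near", "from", "to", "into", "across"}
PERSONAL_PRONOUNS = {"i", "me", "we", "us", "you", "he", "she", "they", "him", "her", "them"}
OTHER_PRONOUNS = {"it", "this", "that", "something", "anything"}
COGNITIVE_WORDS = {"think", "know", "because", "maybe", "should", "believe", "guess"}


def extract_features(tweet):
    """Share of tokens in each category: [prepositions, personal pronouns,
    all pronouns, cognitive-process words]."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    n = max(len(tokens), 1)  # avoid division by zero on empty tweets
    counts = Counter(tokens)

    def share(lexicon):
        return sum(counts[w] for w in lexicon) / n

    personal = share(PERSONAL_PRONOUNS)
    return [share(PREPOSITIONS), personal,
            personal + share(OTHER_PRONOUNS), share(COGNITIVE_WORDS)]


def nearest_centroid(X, y):
    """Predict the label whose mean feature vector is closest (Euclidean)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda x: min(centroids, key=lambda c: math.dist(x, centroids[c]))


def knn(X, y, k=3):
    """Majority label among the k nearest training vectors."""
    def predict(x):
        nearest = sorted(zip(X, y), key=lambda pair: math.dist(x, pair[0]))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict


def threshold_rule(X, y, feature=0):
    """One-feature rule: cut halfway between the two class means."""
    pos = [x[feature] for x, t in zip(X, y) if t == 1]
    neg = [x[feature] for x, t in zip(X, y) if t == 0]
    pos_mean, neg_mean = sum(pos) / len(pos), sum(neg) / len(neg)
    cut = (pos_mean + neg_mean) / 2
    if pos_mean >= neg_mean:
        return lambda x: int(x[feature] > cut)
    return lambda x: int(x[feature] < cut)


def hard_vote(models):
    """Majority vote over base-model labels; ties go to the smaller label."""
    def predict(x):
        votes = Counter(m(x) for m in models)
        return max(sorted(votes), key=lambda label: votes[label])
    return predict


# Made-up toy tweets (1 = mentions a location, 0 = no location).
train = [
    ("stuck in traffic on the bridge near downtown", 1),
    ("great coffee at the station in boston", 1),
    ("flooding across the highway from the river", 1),
    ("i think you should know this", 0),
    ("maybe they believe it because we said so", 0),
    ("i guess he knows something", 0),
]
X = [extract_features(text) for text, _ in train]
y = [label for _, label in train]
ensemble = hard_vote([nearest_centroid(X, y), knn(X, y), threshold_rule(X, y)])

for tweet in ["we are waiting at the airport in denver",
              "i think they should know because it matters"]:
    print(tweet, "->", ensemble(extract_features(tweet)))
```

On this toy data all three stand-in models agree, labelling the first query 1 and the second 0. Hard voting is used only because each stand-in model outputs a bare label. The abstract says "voting-based" without specifying hard or soft voting, and it does not name the base classifiers.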

Acknowledgements

This article is an output of a research project implemented as part of the Basic Research Program at the National Research University Higher School of Economics (HSE University). Moreover, this research was supported in part by computational resources of HPC facilities at HSE University.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Corresponding author

Correspondence to Muhammad Shahid Iqbal Malik.

Ethics declarations

Conflict of interest

The authors have no competing interest to declare.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Malik, M.S.I., Rehman, F. & Ignatov, D.I. Ensemble learning with linguistic, summary language and psychological features for location prediction. Int. j. inf. tecnol. 16, 193–205 (2024). https://doi.org/10.1007/s41870-023-01560-9

  • DOI: https://doi.org/10.1007/s41870-023-01560-9
