Optimized Clustering Model for Healthcare Sentiments on Twitter: A Big Data Analysis Approach

Pourroostaei Ardakani, Saeid; Cheshmehzangi, Ali

doi:10.1007/978-981-99-6620-2_9

Part of the book series: Urban Sustainability (US)

Abstract

Social media platforms such as Twitter store large amounts of user-generated content on many aspects of society, including social events, e-commerce products, and healthcare. This chapter proposes a best-fitted clustering method for grouping sentiment samples related to healthcare topics. To this end, we examine several clustering models in combination with keyword extraction methods on real healthcare datasets collected from Twitter. The experimental results indicate that the self-organizing map model combined with the TF-IDF extraction method achieves the best clustering accuracy. Moreover, the optimized model has the potential to handle large-scale data in practice.
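
The chapter's actual pipeline is not available in this preview. As a rough illustration of the combination the abstract names, the sketch below computes TF-IDF vectors for a few tokenised texts and clusters them with a very small one-dimensional self-organizing map. Everything in it is an assumption made for illustration: the toy token lists are invented, the smoothed IDF formula and L2 normalisation are one common convention, and the function names, map size, learning rate, and neighbourhood schedule are hypothetical choices rather than the authors' settings.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors) for non-empty token lists."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (an assumed convention) keeps every weight positive.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vec = [counts[t] / len(doc) * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, vec):
    """Index of the map unit whose weight vector is closest to vec."""
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i], vec))


def train_som(vectors, n_units=2, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D SOM (a line of n_units nodes) and return its weight vectors."""
    rng = random.Random(seed)
    # Initialise each unit with a copy of a randomly chosen sample.
    weights = [list(rng.choice(vectors)) for _ in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 1e-3)  # shrinking neighbourhood radius
        for vec in rng.sample(vectors, len(vectors)):
            bmu = best_matching_unit(weights, vec)
            for i, w in enumerate(weights):
                # Gaussian neighbourhood on the 1-D grid around the winning unit.
                h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (vec[d] - w[d])
    return weights


# Invented, already-tokenised example texts (not data from the chapter).
docs = [
    ["flu", "vaccine", "clinic", "helpful"],
    ["vaccine", "appointment", "clinic", "quick"],
    ["hospital", "wait", "long", "frustrating"],
    ["hospital", "bill", "long", "wait"],
]
vocab, vectors = tfidf_vectors(docs)
weights = train_som(vectors, n_units=2, seed=0)
labels = [best_matching_unit(weights, v) for v in vectors]
print(labels)  # one cluster index per text
```

Each text is assigned to the map unit nearest to its TF-IDF vector, so the unit index serves as the cluster label. A realistic setup would use a two-dimensional map, proper tokenisation, and a far larger corpus.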

The research work in this chapter was supported by our research team members: Chenhao Wang, Erfan Wang, Zhe Hua, and Zhengrui Xue.

Author information

Authors and Affiliations

School of Computer Science, University of Lincoln, Lincoln, Lincolnshire, UK
Saeid Pourroostaei Ardakani
Qingdao City University, Qingdao, Shandong, China
Ali Cheshmehzangi

Corresponding author

Correspondence to Saeid Pourroostaei Ardakani.

About this chapter

Cite this chapter

Pourroostaei Ardakani, S., Cheshmehzangi, A. (2023). Optimized Clustering Model for Healthcare Sentiments on Twitter: A Big Data Analysis Approach. In: Big Data Analytics for Smart Transport and Healthcare Systems. Urban Sustainability. Springer, Singapore. https://doi.org/10.1007/978-981-99-6620-2_9

DOI: https://doi.org/10.1007/978-981-99-6620-2_9
Published: 04 December 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-6619-6
Online ISBN: 978-981-99-6620-2
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
