Optimized Clustering Model for Healthcare Sentiments on Twitter: A Big Data Analysis Approach

Big Data Analytics for Smart Transport and Healthcare Systems

Abstract

Social media platforms such as Twitter host a large amount of user-generated content on many aspects of society, including social events, e-commerce products, and healthcare. This chapter proposes a best-fitting clustering method to classify sentiment samples related to healthcare topics. To this end, we evaluate several clustering models in combination with keyword extraction methods on real healthcare datasets collected from Twitter. The experimental results indicate that the self-organizing map (SOM) model combined with the TF-IDF extraction method achieves the best clustering accuracy. Moreover, the optimized model has great potential to handle large-scale data in practice.
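As a rough illustration of the pairing named in the abstract, the following minimal sketch computes TF-IDF vectors for a few short texts and clusters them with a small one-dimensional self-organizing map. It is not the chapter's implementation: the example tweets are invented, and the function names (`tfidf_vectors`, `train_som`, `best_matching_unit`), the smoothed IDF formula, the grid size, and the learning schedule are all assumptions chosen for brevity.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalized TF-IDF vectors) for whitespace-tokenized docs."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF (an assumption), so common terms keep a non-zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks) or 1  # guard against empty documents
        vec = [counts[t] / length * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, vec):
    """Index of the SOM node whose weight vector is closest to vec."""
    return min(range(len(weights)), key=lambda i: squared_distance(weights[i], vec))


def train_som(vectors, n_nodes=2, epochs=50, lr0=0.5, seed=0):
    """Train a one-dimensional SOM (a line of n_nodes nodes) and return its weights."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    radius0 = max(n_nodes / 2.0, 1.0)
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs  # linear decay of rate and radius
        lr = lr0 * decay
        radius = max(radius0 * decay, 0.5)
        for vec in rng.sample(vectors, len(vectors)):
            bmu = best_matching_unit(weights, vec)
            for i, w in enumerate(weights):
                # Gaussian neighbourhood on the 1-D grid around the winning node.
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * h * (vec[d] - w[d])
    return weights


# Invented example texts, for illustration only.
tweets = [
    "great care from the hospital staff",
    "the hospital staff were kind and helpful",
    "long wait and rude service at the clinic",
    "terrible wait at the clinic today",
]
vocab, vectors = tfidf_vectors(tweets)
weights = train_som(vectors, n_nodes=2)
clusters = [best_matching_unit(weights, v) for v in vectors]
print(clusters)
```

Each text is assigned to the node whose weight vector lies closest to its TF-IDF vector, so the node index serves as the cluster label. A larger two-dimensional grid and a proper tokenizer would be natural next steps for real data.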

The research work in this chapter was supported by our research team members: Chenhao Wang, Erfan Wang, Zhe Hua, and Zhengrui Xue.

Author information

Corresponding author

Correspondence to Saeid Pourroostaei Ardakani.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Pourroostaei Ardakani, S., Cheshmehzangi, A. (2023). Optimized Clustering Model for Healthcare Sentiments on Twitter: A Big Data Analysis Approach. In: Big Data Analytics for Smart Transport and Healthcare Systems. Urban Sustainability. Springer, Singapore. https://doi.org/10.1007/978-981-99-6620-2_9
