Abstract
Social media platforms such as Twitter store large amounts of user-generated content on many aspects of society, including social events, e-commerce products, and healthcare. This chapter proposes a best-fitted clustering method to classify sentiment samples related to healthcare topics. To this end, we examine several clustering models in combination with keyword extraction methods on real healthcare datasets collected from Twitter. The experimental results indicate that the self-organizing map (SOM) model combined with the TF-IDF extraction method achieves the best clustering accuracy among the configurations tested. Moreover, the optimized model has the potential to handle large-scale data in practice.
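To make the TF-IDF plus self-organizing map combination concrete, the following is a minimal sketch and not the chapter's implementation. It assumes a smoothed IDF variant, a one-dimensional map with two units, linearly decaying learning rate and neighbourhood width, and made-up tokenised sample texts. The function names `tfidf_vectors`, `best_matching_unit`, and `train_som` are hypothetical, and the chapter's actual preprocessing and SOM configuration are not given here.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors) for tokenised docs."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (an assumed variant), so ubiquitous terms keep a small weight.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        vec = [tf[t] / length * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def best_matching_unit(weights, x):
    """Index of the map unit whose weight vector is closest to x."""
    return min(
        range(len(weights)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(weights[j], x)),
    )


def train_som(data, n_units=2, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D self-organizing map and return the unit weight vectors."""
    rng = random.Random(seed)
    # Initialise each unit from a randomly chosen sample (requires n_units <= len(data)).
    weights = [list(v) for v in rng.sample(data, n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # decaying learning rate
        sigma = max(sigma0 * (1 - frac), 1e-3)   # shrinking neighbourhood width
        for x in rng.sample(data, len(data)):    # one shuffled pass over the data
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighbourhood: units near the winner move more.
                h = math.exp(-((j - bmu) ** 2) / (2 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
    return weights


if __name__ == "__main__":
    # Made-up tokenised texts for illustration only, not data from the chapter.
    docs = [
        ["flu", "vaccine", "clinic"],
        ["flu", "vaccine", "clinic"],
        ["hospital", "bill", "insurance"],
        ["hospital", "bill", "cost"],
    ]
    vocab, vecs = tfidf_vectors(docs)
    som = train_som(vecs, n_units=2)
    print([best_matching_unit(som, v) for v in vecs])
```

Each text is assigned to the cluster of its best-matching unit. A real pipeline would also need tokenisation, stop-word removal, and a larger two-dimensional map, none of which are shown in this sketch.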
The research work in this chapter was supported by our research team members: Chenhao Wang, Erfan Wang, Zhe Hua, and Zhengrui Xue.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Pourroostaei Ardakani, S., Cheshmehzangi, A. (2023). Optimized Clustering Model for Healthcare Sentiments on Twitter: A Big Data Analysis Approach. In: Big Data Analytics for Smart Transport and Healthcare Systems. Urban Sustainability. Springer, Singapore. https://doi.org/10.1007/978-981-99-6620-2_9
DOI: https://doi.org/10.1007/978-981-99-6620-2_9
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-6619-6
Online ISBN: 978-981-99-6620-2
eBook Packages: Mathematics and Statistics (R0)