Accurate classification of socially generated medical discourse

  • Regular Paper
International Journal of Data Science and Analytics

Abstract

The growth of online health communities, particularly those involving socially generated content, can provide considerable value for society. Participants can gain medical knowledge or interact with peers on medical forum platforms. Analysing the sentiment that members of a health community express in medical forum discourse can be of significant value: it can identify a particular aspect of an information space, determine the themes that predominate in a large data set, and allow people to summarize the topics within it. In this paper, we identify sentiments expressed in online medical forums that discuss Lyme disease. Our research has two goals: first, to identify a complete and relevant set of categories that can characterize Lyme disease discourse; and second, to test and investigate strategies, both individually and collectively, for automating the classification of medical forum posts into those categories. We present a feature-based model that consists of three feature sets: content-free, content-specific and meta-level features. Employing inductive learning algorithms to build a feature-based classification model, we assess the feasibility and accuracy of our automated classification. We further evaluate the model by assessing its ability to adapt to an online medical forum that discusses lupus. The experimental results demonstrate the effectiveness of our approach.
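As a rough illustration of how three such feature sets might be combined into one feature vector per post, the sketch below uses only the Python standard library. It is not the model described in the paper: the word lists, the feature names and the reading of each feature set (stylistic counts for content-free, domain term counts for content-specific, lexicon-derived polarity scores for meta-level) are assumptions made for the example.

```python
import re
import string

# Hypothetical toy resources chosen for illustration only; the paper's
# actual vocabulary and sentiment lexicons are not given in this abstract.
DOMAIN_TERMS = ("lyme", "tick", "antibiotic", "rash")
POLARITY_LEXICON = {"better": 1, "relief": 1, "hope": 1,
                    "worse": -1, "pain": -1, "scared": -1}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def content_free_features(text):
    """Stylistic counts that ignore what the post is about (assumed reading)."""
    tokens = tokenize(text)
    n = len(tokens)
    return {
        "n_words": n,
        "avg_word_len": sum(len(t) for t in tokens) / n if n else 0.0,
        "n_punct": sum(ch in string.punctuation for ch in text),
        "n_question_marks": text.count("?"),
    }


def content_specific_features(text):
    """Exact-match counts of domain terms (assumed reading)."""
    tokens = tokenize(text)
    return {f"term_{t}": tokens.count(t) for t in DOMAIN_TERMS}


def meta_level_features(text):
    """Scores derived from an existing sentiment resource (assumed reading)."""
    tokens = tokenize(text)
    scores = [POLARITY_LEXICON[t] for t in tokens if t in POLARITY_LEXICON]
    return {
        "lex_pos": sum(s for s in scores if s > 0),
        "lex_neg": -sum(s for s in scores if s < 0),
    }


def extract_features(text):
    """Merge the three feature sets into a single dictionary."""
    features = {}
    features.update(content_free_features(text))
    features.update(content_specific_features(text))
    features.update(meta_level_features(text))
    return features


post = "The rash is worse today. Did the antibiotic help anyone?"
features = extract_features(post)
```

A dictionary of this kind could then be vectorized and passed to any inductive learner; keeping each feature set in its own function makes it straightforward to evaluate the sets individually and in combination.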

Notes

  1. The terms sentiment and affect have been used interchangeably in the literature, where they refer to extraction of opinions, emotions or views that may be expressed in the text.

  2. https://requester.mturk.com/.

  3. http://help.sentiment140.com/.

  4. www.elderresearch.com/target-shuffling.


Corresponding author

Correspondence to Rana Alnashwan.

Cite this article

Alnashwan, R., Sorensen, H., O’Riordan, A. et al. Accurate classification of socially generated medical discourse. Int J Data Sci Anal 8, 353–365 (2019). https://doi.org/10.1007/s41060-018-0128-8

  • DOI: https://doi.org/10.1007/s41060-018-0128-8
