Semi-Automatic Training Set Construction for Supervised Sentiment Analysis in Polarized Contexts

Putting Social Media and Networking Data in Practice for Education, Planning, Prediction and Recommendation

Abstract

Standard sentiment analysis techniques rely either on sets of rules based on semantic and affective information or on supervised machine learning approaches, whose quality depends heavily on the size and significance of a training set of pre-labeled text samples. In many situations this labeling must be performed by hand, which can limit the size of the training set. To address this issue, we propose a methodology to retrieve text samples from Twitter and label them automatically. We then apply this methodology to several Twitter conversations and assess the quality of the resulting training sets. We also tackle the situation in which the base rates of positive and negative sentiment samples in the training and test sets are biased with respect to the system in which the classifier is intended to be applied. The results on this point are relevant beyond this particular application.
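To make the base-rate issue concrete, the following is a minimal sketch of one standard prior-shift correction, which rescales a classifier's predicted probability from the base rate seen in training to the base rate expected in the target system. It is not taken from the chapter and may differ from the authors' approach; the function name `adjust_posterior` and its parameters are hypothetical, and the sketch assumes that only the class proportions differ between training and deployment.

```python
def adjust_posterior(p_pos, train_rate, target_rate):
    """Rescale P(positive | x) from the training base rate to the target base rate.

    Illustrative assumption: the class-conditional text distributions are the
    same in both settings, so only the class priors change.
    """
    if not (0.0 < train_rate < 1.0 and 0.0 < target_rate < 1.0):
        raise ValueError("base rates must lie strictly between 0 and 1")
    if not 0.0 <= p_pos <= 1.0:
        raise ValueError("p_pos must be a probability")

    # Reweight each class by the ratio of its target prior to its training prior.
    pos = p_pos * target_rate / train_rate
    neg = (1.0 - p_pos) * (1.0 - target_rate) / (1.0 - train_rate)

    # Renormalize so the two class probabilities sum to one again.
    return pos / (pos + neg)


# A classifier trained on a balanced set (50% positive) is applied to a
# conversation where only 20% of messages are expected to be positive.
corrected = adjust_posterior(0.5, train_rate=0.5, target_rate=0.2)
```

With a balanced training set, an uninformative score of 0.5 is pulled down to the target base rate of 0.2, and when the two base rates coincide the score is left unchanged.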

Acknowledgements

We would like to thank the TASS organization for allowing us to use their sentiment analysis corpus. This work has been supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under Contract No. MTM2015-63914-P and by the Spanish Ministry of Science, Innovation and Universities (MICIU) Contract No. PGC2018-093854-B-100.

Author information

Corresponding author

Correspondence to S. Martin-Gutierrez.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Martin-Gutierrez, S., Losada, J.C., Benito, R.M. (2020). Semi-Automatic Training Set Construction for Supervised Sentiment Analysis in Polarized Contexts. In: Kaya, M., Birinci, Ş., Kawash, J., Alhajj, R. (eds) Putting Social Media and Networking Data in Practice for Education, Planning, Prediction and Recommendation. Lecture Notes in Social Networks. Springer, Cham. https://doi.org/10.1007/978-3-030-33698-1_10