Abstract
Standard sentiment analysis techniques rely either on sets of rules based on semantic and affective information or on supervised machine learning approaches, whose quality depends heavily on the size and significance of a training set of pre-labeled text samples. In many situations this labeling must be performed by hand, which can limit the size of the training set. To address this issue, we propose a methodology to retrieve text samples from Twitter and label them automatically. We then apply this methodology to several Twitter conversations and assess the quality of the resulting training sets. We also tackle the situation in which the base rates of positive and negative sentiment samples in the training and test sets are biased with respect to the system in which the classifier is intended to be applied. The results presented in this respect are relevant beyond this particular application.
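The base-rate issue mentioned above can be illustrated with a small sketch. It is not the procedure developed in the chapter, which the abstract does not spell out. It only applies Bayes' rule under the assumption that the class-conditional behavior of the classifier is the same in the training data and in the target system, so that only the class proportions differ. The function names and the numbers in the usage lines are hypothetical.

```python
# Illustrative sketch only: standard prior-shift arithmetic, not the chapter's method.
# Assumption: p(text | class) is identical in the training set and the target system;
# only the proportion of positive samples (the base rate) changes.


def adjust_for_base_rate(p_pos, train_rate, target_rate):
    """Re-weight a predicted positive-class probability for a new base rate.

    p_pos       -- probability of 'positive' output by a classifier trained
                   on data with a fraction `train_rate` of positive samples
    target_rate -- fraction of positive samples expected where the
                   classifier is actually applied
    """
    pos = p_pos * target_rate / train_rate
    neg = (1.0 - p_pos) * (1.0 - target_rate) / (1.0 - train_rate)
    return pos / (pos + neg)


def precision_at_base_rate(tpr, fpr, base_rate):
    """Precision implied by fixed true/false positive rates at a given base rate."""
    true_pos = tpr * base_rate
    false_pos = fpr * (1.0 - base_rate)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # A score of 0.5 from a balanced training set, applied where only 20% are positive.
    print(adjust_for_base_rate(0.5, train_rate=0.5, target_rate=0.2))
    # Same classifier (TPR 0.8, FPR 0.1) evaluated at two different base rates.
    print(precision_at_base_rate(0.8, 0.1, base_rate=0.5))
    print(precision_at_base_rate(0.8, 0.1, base_rate=0.1))
```

The second function shows why a precision figure measured on a balanced test set may not carry over to a system with a different class mix: the true and false positive rates stay fixed, but the precision they imply changes with the base rate.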
Acknowledgements
We would like to thank the TASS organization for allowing us to use their sentiment analysis corpus. This work has been supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under Contract No. MTM2015-63914-P and by the Spanish Ministry of Science, Innovation and Universities (MICIU) under Contract No. PGC2018-093854-B-100.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Martin-Gutierrez, S., Losada, J.C., Benito, R.M. (2020). Semi-Automatic Training Set Construction for Supervised Sentiment Analysis in Polarized Contexts. In: Kaya, M., Birinci, Ş., Kawash, J., Alhajj, R. (eds) Putting Social Media and Networking Data in Practice for Education, Planning, Prediction and Recommendation. Lecture Notes in Social Networks. Springer, Cham. https://doi.org/10.1007/978-3-030-33698-1_10
DOI: https://doi.org/10.1007/978-3-030-33698-1_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33697-4
Online ISBN: 978-3-030-33698-1
eBook Packages: Physics and Astronomy, Physics and Astronomy (R0)