Semi-Automatic Training Set Construction for Supervised Sentiment Analysis in Polarized Contexts

Martin-Gutierrez, S.; Losada, J. C.; Benito, R. M.

doi:10.1007/978-3-030-33698-1_10

Part of the book series: Lecture Notes in Social Networks (LNSN)


Abstract

Standard sentiment analysis techniques rely either on sets of rules based on semantic and affective information or on supervised machine learning approaches, whose quality depends heavily on the size and significance of a training set of pre-labeled text samples. In many situations this labeling must be done by hand, which can limit the size of the training set. To address this issue, we propose a methodology to retrieve text samples from Twitter and label them automatically. We then apply this methodology to several Twitter conversations and assess the quality of the resulting training sets. We also tackle the situation in which the base rates of positive and negative sentiment samples in the training and test sets are biased with respect to the system to which the classifier is meant to be applied. The results we present on this point are relevant beyond this particular application.
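The abstract does not say how the base-rate mismatch is handled, so what follows is only an illustration of why it matters, not the authors' procedure. It relies on two textbook relations. First, if a classifier's recall (true-positive rate) and false-positive rate stay fixed, its precision still changes with the share of positive samples in the population. Second, assuming the distribution of texts within each class is the same in training and deployment, a posterior probability estimated under one base rate can be rescaled to another with Bayes' rule. The function names and all numbers (recall 0.80, false-positive rate 0.20, positive shares of 0.5 and 0.2) are hypothetical.

```python
"""Illustrative sketch (not the chapter's method): effect of a mismatch between
training-set and deployment base rates on a binary sentiment classifier."""


def precision_at_base_rate(tpr: float, fpr: float, pos_rate: float) -> float:
    """Precision for the 'positive' class when the true-positive rate (recall)
    and false-positive rate are fixed, but the share of positive samples in the
    target population is `pos_rate`."""
    true_pos = tpr * pos_rate            # expected fraction of true positives
    false_pos = fpr * (1.0 - pos_rate)   # expected fraction of false positives
    return true_pos / (true_pos + false_pos)


def adjust_posterior(p_pos: float, train_pos_rate: float,
                     target_pos_rate: float) -> float:
    """Rescale P(positive | text), estimated under the training base rate, to a
    different target base rate via Bayes' rule. Assumes P(text | class) is the
    same in both populations (pure prior shift)."""
    w_pos = target_pos_rate / train_pos_rate
    w_neg = (1.0 - target_pos_rate) / (1.0 - train_pos_rate)
    num = p_pos * w_pos
    return num / (num + (1.0 - p_pos) * w_neg)


if __name__ == "__main__":
    # Hypothetical classifier with recall 0.80 and false-positive rate 0.20.
    for rate in (0.5, 0.2):
        prec = precision_at_base_rate(0.8, 0.2, rate)
        print(f"positive share {rate:.1f}: precision {prec:.3f}")
    # Score of 0.5 from a classifier trained on a balanced set, applied to a
    # population where only 20% of messages are positive.
    print(f"adjusted posterior: {adjust_posterior(0.5, 0.5, 0.2):.3f}")
```

Running it prints precisions of 0.800 and 0.500 and an adjusted posterior of 0.200: the same classifier looks markedly worse where positive messages are rarer, which is why performance measured on a balanced training or test set may not reflect performance in the target system.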



Acknowledgements

We would like to thank the TASS organization for allowing us to use their sentiment analysis corpus. This work has been supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under Contract No. MTM2015-63914-P and by the Spanish Ministry of Science, Innovation and Universities (MICIU) under Contract No. PGC2018-093854-B-100.

Author information

Authors and Affiliations

Grupo de Sistemas Complejos, Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid, Madrid, Spain
S. Martin-Gutierrez, J. C. Losada & R. M. Benito


Corresponding author

Correspondence to S. Martin-Gutierrez.

Editor information

Editors and Affiliations

Computer Engineering, Fırat University, Elazığ, Turkey
Mehmet Kaya
Ministry of Health, Ankara, Turkey
Şuayip Birinci
Department of Computer Science, University of Calgary, Calgary, AB, Canada
Jalal Kawash
Department of Computer Science, University of Calgary, Calgary, AB, Canada
Reda Alhajj


About this chapter

Cite this chapter

Martin-Gutierrez, S., Losada, J.C., Benito, R.M. (2020). Semi-Automatic Training Set Construction for Supervised Sentiment Analysis in Polarized Contexts. In: Kaya, M., Birinci, Ş., Kawash, J., Alhajj, R. (eds) Putting Social Media and Networking Data in Practice for Education, Planning, Prediction and Recommendation. Lecture Notes in Social Networks. Springer, Cham. https://doi.org/10.1007/978-3-030-33698-1_10


DOI: https://doi.org/10.1007/978-3-030-33698-1_10
Published: 28 December 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33697-4
Online ISBN: 978-3-030-33698-1
eBook Packages: Physics and Astronomy, Physics and Astronomy (R0)
