Abstract
Peer-to-Peer news portals allow Internet users to write news articles and make them available online to interested readers. Although authors are free to choose their topics, an article must meet a number of quality criteria before it is published. In addition to meaningful titles, comprehensibly written texts and meaningful images, relevant tags are an important criterion for the quality of such news. In this case study, we discuss the challenges and common mistakes that Peer-to-Peer reporters face when tagging news, and how incorrect information can be corrected through the orchestration of existing Natural Language Processing services. Finally, we use this illustrative example to give insight into the challenges of dealing with bottom-up taxonomies.
Notes
1. ShortNews discontinued operations in 2018.
2. Based on the data available to us. It is very likely that new tags have been added this year, but our dataset does not represent this.
Acknowledgements
This work was partially supported by the German Research Foundation (DFG) within the Collaborative Research Centre On-The-Fly Computing (SFB 901).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Bäumer, F.S., Kersting, J., Buff, B., Geierhos, M. (2020). Tag Me If You Can: Insights into the Challenges of Supporting Unrestricted P2P News Tagging. In: Lopata, A., Butkienė, R., Gudonienė, D., Sukackė, V. (eds) Information and Software Technologies. ICIST 2020. Communications in Computer and Information Science, vol 1283. Springer, Cham. https://doi.org/10.1007/978-3-030-59506-7_30
DOI: https://doi.org/10.1007/978-3-030-59506-7_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59505-0
Online ISBN: 978-3-030-59506-7
eBook Packages: Computer Science, Computer Science (R0)