Abstract
A mixed corpus of Portuguese is one in which texts of different origins use different spelling variants for the same word. A new orthographic norm, intended to bring the written texts produced in Portugal and Brazil under a more uniform orthography, has been in effect since 2009. But what happens, from the perspective of search, to corpora created before the norm came into practice or during the transition period? Is the information they contain outdated and worthless? Do they need to be converted to the new norm? In the present work we analyse these questions.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Carvalho, G., Falé, I., de Matos, D.M., Rocio, V. (2012). Searching a Mixed Corpus in the Light of the New Portuguese Orthographic Norm. In: Caseli, H., Villavicencio, A., Teixeira, A., Perdigão, F. (eds) Computational Processing of the Portuguese Language. PROPOR 2012. Lecture Notes in Computer Science, vol 7243. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28885-2_6
DOI: https://doi.org/10.1007/978-3-642-28885-2_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28884-5
Online ISBN: 978-3-642-28885-2
eBook Packages: Computer Science, Computer Science (R0)