Translating text into pictographs

VINCENT VANDEGHINSTE, INEKE SCHUURMAN, LEEN SEVENS and FRANK VAN EYNDE

doi:10.1017/S135132491500039X

Published online by Cambridge University Press: 11 November 2015

Affiliation: Centre for Computational Linguistics, University of Leuven, Blijde Inkomststraat 21 - bus 3315, B-3000 Leuven, Belgium
e-mails: vincent@ccl.kuleuven.be, ineke@ccl.kuleuven.be, frank@ccl.kuleuven.be

Abstract

We describe and evaluate a text-to-pictograph translation system that is used in an online platform for Augmentative and Alternative Communication, intended for people who cannot read or write but who still want to communicate with the outside world. The system translates from Dutch into Sclera and Beta, two publicly available pictograph sets consisting of several thousand pictographs each. We have linked a large number of these pictographs to synsets, or combinations of synsets, of Cornetto, a lexical-semantic database for Dutch similar to WordNet. In the translation system, the Dutch input text undergoes shallow linguistic analysis and the synsets of the content words are looked up. The system then looks for the nearest pictographs in the lexical-semantic database and displays the message as pictographs. In our evaluation, the system showed a large improvement over the baseline system, which consisted of straightforward string matching between the input text and the filenames of the pictographs.

Our system provides a clear improvement in the communication possibilities of illiterate people. Nevertheless, there is room for further improvement.
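
As a rough illustration of the lookup described in the abstract, the following Python sketch contrasts filename string matching with a synset-based search. All data and names in it (LEMMAS, LEMMA_TO_SYNSET, HYPERNYMS, SYNSET_TO_PICTO, the syn:* identifiers, the max_depth limit) are invented toy assumptions, not Cornetto, Sclera or Beta content. The plain breadth-first search over hypernym links is only a stand-in for whatever notion of "nearest" the actual system uses.

```python
from collections import deque

# Toy data, invented for illustration only.
LEMMAS = {"eet": "eten"}  # hypothetical stand-in for shallow linguistic analysis
LEMMA_TO_SYNSET = {"poedel": "syn:poodle", "hond": "syn:dog", "eten": "syn:eat"}
HYPERNYMS = {"syn:poodle": ["syn:dog"], "syn:dog": ["syn:animal"], "syn:animal": []}
SYNSET_TO_PICTO = {"syn:dog": "hond.png", "syn:eat": "eten.png"}
PICTO_FILENAMES = set(SYNSET_TO_PICTO.values())


def baseline_translate(tokens):
    """Baseline: match each token directly against pictograph filenames."""
    return [f"{t}.png" for t in tokens if f"{t}.png" in PICTO_FILENAMES]


def nearest_pictograph(synset, max_depth=2):
    """Search upward through hypernyms (breadth-first) for a linked pictograph."""
    queue = deque([(synset, 0)])
    seen = {synset}
    while queue:
        current, depth = queue.popleft()
        if current in SYNSET_TO_PICTO:
            return SYNSET_TO_PICTO[current]
        if depth < max_depth:
            for parent in HYPERNYMS.get(current, []):
                if parent not in seen:
                    seen.add(parent)
                    queue.append((parent, depth + 1))
    return None  # no pictograph within max_depth steps


def synset_translate(tokens):
    """Lemmatize, look up the synset, then find the nearest linked pictograph."""
    pictographs = []
    for token in tokens:
        lemma = LEMMAS.get(token, token)
        synset = LEMMA_TO_SYNSET.get(lemma)
        if synset is None:
            continue  # function words and unknown words are skipped in this sketch
        picto = nearest_pictograph(synset)
        if picto is not None:
            pictographs.append(picto)
    return pictographs


sentence = ["de", "poedel", "eet"]
print(baseline_translate(sentence))
print(synset_translate(sentence))
```

In this toy setting the baseline finds nothing for "de poedel eet", because no filename matches the surface forms, whereas the synset-based lookup reaches the pictographs for the more general concept "dog" and for the lemma "eten".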

Type: Articles
Information: Natural Language Engineering, Volume 23, Issue 2, March 2017, pp. 217–244

DOI: https://doi.org/10.1017/S135132491500039X
Copyright: Copyright © Cambridge University Press 2015
