Semantic Decomposition of Character Encodings for Linguistic Knowledge Discovery

  • Conference paper
From Data and Information Analysis to Knowledge Engineering

Abstract

Analysis and knowledge representation of linguistic objects tend to focus on units larger than the characters of the print medium (e.g. words). We analyse characters as linguistic objects in their own right, with meaning, structure and form. Characters have meaning (the symbols of the International Phonetic Alphabet denote phonetic categories; the character represented by the glyph ‘∪’ denotes set union), structure (they are composed of stems and parts such as descenders or diacritics, or they are ligatures), and form (they map to visual glyphs). Character encoding initiatives such as Unicode tend to concentrate on the structure and form of characters and ignore their meaning in the sense discussed here. We suggest that our approach, which includes semantic decomposition and defines font-based namespaces for semantic character domains, offers a long-term perspective on interoperability and tractability for data mining over characters by integrating information about characters into a coherent, semiotically based ontology. We demonstrate these principles in a case study of the International Phonetic Alphabet.
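The three-way distinction between meaning, structure and form can be made concrete with a small sketch. The Python example below is illustrative only and is not the authors' implementation: the `IPA_BASE` and `IPA_DIACRITICS` tables, the `CharacterRecord` class and the `describe` function are hypothetical names introduced here, and the feature labels are a tiny hand-picked sample. Structure is approximated by Unicode canonical decomposition (via the standard `unicodedata` module), form by the character itself as rendered by whatever font is in use, and meaning by a lookup in a domain-specific namespace, here an assumed one for IPA.

```python
import unicodedata
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical semantic namespace for IPA (illustrative sample, not from the paper).
IPA_BASE = {
    "a": ("open", "front", "unrounded"),
    "p": ("voiceless", "bilabial", "plosive"),
    "\u014b": ("voiced", "velar", "nasal"),  # eng
}
IPA_DIACRITICS = {
    "\u0303": "nasalized",  # combining tilde
}


@dataclass
class CharacterRecord:
    form: str                             # the character as rendered by a font
    codepoint: str                        # encoding-level identity
    structure: List[str]                  # names of the base and combining parts
    meaning: Optional[Tuple[str, ...]]    # semantic features, or None if unknown


def describe(char: str) -> CharacterRecord:
    """Split one character into form, structure and (assumed) IPA meaning."""
    # Canonical decomposition separates a base character from its diacritics.
    parts = unicodedata.normalize("NFD", char)
    base, marks = parts[0], parts[1:]
    features = IPA_BASE.get(base)
    if features is not None:
        features += tuple(IPA_DIACRITICS[m] for m in marks if m in IPA_DIACRITICS)
    return CharacterRecord(
        form=char,
        codepoint=f"U+{ord(char):04X}",
        structure=[unicodedata.name(p, "UNKNOWN") for p in parts],
        meaning=features,
    )


if __name__ == "__main__":
    for ch in ("\u00e3", "\u014b", "\u222a"):  # a with tilde, eng, union sign
        print(describe(ch))
```

In this sketch the union sign has a code point and a name but no meaning in the IPA namespace; under the approach the abstract describes, a separate namespace for mathematical symbols would be the place to record that it denotes set union.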

Copyright information

© 2006 Springer Berlin · Heidelberg

About this paper

Cite this paper

Gibbon, D., Hughes, B., Trippel, T. (2006). Semantic Decomposition of Character Encodings for Linguistic Knowledge Discovery. In: Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., Gaul, W. (eds) From Data and Information Analysis to Knowledge Engineering. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-31314-1_44
