
A Multi-word Term Extraction System

  • Conference paper
PRICAI 2006: Trends in Artificial Intelligence (PRICAI 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4099)

Abstract

Traditional statistical approaches to identifying multi-word terms must handle large amounts of noisy data and are extremely time-consuming. This paper introduces a system that extracts multi-word terms from a set of documents based on the correlated text segments that occur across those documents. The system uses a short predefined stoplist as its initial input to divide the documents into text segments and calculates a segment weight for every text segment. It then recursively uses shorter text segments to split longer ones, guided by their weight values, until no text segment can be divided further. The resulting text segments are then identified as terms according to a specified threshold. Initial experiments on a set of Traditional Chinese documents show that the system achieves a recall of at least 76.39% and a precision of at least 91.05% in retrieving terms that occur multiple times; these retrieved terms include 18.30% newly identified terms.
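The four steps described above (stoplist segmentation, segment weighting, recursive splitting, thresholding) can be sketched in Python as follows. The abstract does not give the paper's segment-weight formula or its exact splitting rule, so this sketch assumes a hypothetical weight (the number of segments that contain a given segment as a substring) and a hypothetical rule (split a segment around its highest-weight shorter sub-segment whose weight exceeds its own). The function names and the `threshold` and `min_len` parameters are likewise illustrative, not the authors' implementation.

```python
import re


def segment_by_stoplist(docs, stoplist):
    """Split each document into text segments at stopwords and punctuation."""
    # Longest stopwords first so a multi-character stopword wins over its prefix.
    parts = [re.escape(w) for w in sorted(stoplist, key=len, reverse=True)]
    parts.append(r"[\W_]+")  # whitespace and punctuation also act as delimiters
    delimiter = re.compile("|".join(parts))
    segments = []
    for doc in docs:
        segments.extend(s for s in delimiter.split(doc) if s)
    return segments


def segment_weights(segments):
    """Hypothetical weight: how many segments contain s as a substring."""
    return {s: sum(1 for t in segments if s in t) for s in set(segments)}


def split_once(segments, weights, min_len=2):
    """One pass: split each segment around its best shorter sub-segment.

    A sub-segment qualifies (hypothetical rule) if it is itself a known
    segment, is at least min_len characters long, occurs inside the longer
    segment and has a strictly higher weight.
    """
    result, changed = [], False
    for seg in segments:
        candidates = [s for s in weights
                      if min_len <= len(s) < len(seg)
                      and s in seg and weights[s] > weights[seg]]
        if not candidates:
            result.append(seg)
            continue
        # Prefer the highest weight, then the longest candidate; ties by text.
        best = max(candidates, key=lambda s: (weights[s], len(s), s))
        i = seg.index(best)
        result.extend(p for p in (seg[:i], best, seg[i + len(best):]) if p)
        changed = True
    return result, changed


def extract_terms(docs, stoplist, threshold=2, min_len=2):
    """Segment, split recursively until stable, then keep weights >= threshold."""
    segments = segment_by_stoplist(docs, stoplist)
    changed = True
    while changed:  # every split yields strictly shorter pieces, so this ends
        weights = segment_weights(segments)
        segments, changed = split_once(segments, weights, min_len)
    # The last pass changed nothing, so `weights` still matches `segments`.
    return {s: w for s, w in weights.items()
            if w >= threshold and len(s) >= min_len}
```

For example, with the stoplist `["的", "是", "和"]` and the documents `"資料探勘的方法"`, `"資料探勘技術是重要的"`, `"機器學習和資料探勘"` and `"機器學習方法"`, the first pass splits 資料探勘技術 into 資料探勘 and 技術, and 機器學習方法 into 機器學習 and 方法; `extract_terms` then returns a dict equal to `{"資料探勘": 3, "機器學習": 2, "方法": 2}`. Recomputing the weights after every pass mirrors the abstract's recursive description: a newly exposed segment can immediately serve to split other long segments in the next pass.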

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, J., Yeh, CH., Chau, R. (2006). A Multi-word Term Extraction System. In: Yang, Q., Webb, G. (eds) PRICAI 2006: Trends in Artificial Intelligence. PRICAI 2006. Lecture Notes in Computer Science, vol 4099. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-36668-3_153

  • DOI: https://doi.org/10.1007/978-3-540-36668-3_153

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-36667-6

  • Online ISBN: 978-3-540-36668-3

  • eBook Packages: Computer Science, Computer Science (R0)
