Adapting language models for frequent fixed phrases by emphasizing n-gram subsets

Akiba, Tomoyosi; Itou, Katunobu; Fujii, Atsushi

doi:10.21437/Eurospeech.2003-428

In support of speech-driven question answering, we propose a method for constructing N-gram language models that recognize spoken questions with high accuracy. Queries to question-answering (QA) systems often consist of two parts: one conveys the query topic, and the other is a fixed phrase typical of query sentences. A language model built from the target collection of the QA system, for example newspaper articles, can model the former part but cannot model the latter appropriately. We treat this problem as task adaptation from language models obtained from background corpora (e.g., newspaper articles) to the fixed phrases, and propose a method that does not require a task-specific corpus, which is often difficult to obtain, but instead uses only a manually compiled list of fixed phrases. The method emphasizes the subset of N-grams obtained from a background corpus that corresponds to the fixed phrases specified in the list. Theoretically, this method can be regarded as maximum a posteriori (MAP) estimation using the subset of the N-grams as a posteriori distribution. Experiments show the effectiveness of our method.
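The idea of emphasizing an N-gram subset can be illustrated with a minimal sketch. The abstract does not state the weighting formula, so the code below makes its own assumptions: it uses bigrams, scales the background counts of bigrams that also occur inside a listed fixed phrase by a hypothetical multiplicative `weight`, and renormalizes with plain relative frequencies (no smoothing or back-off). The function names and the toy data are illustrative only and are not taken from the paper.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams (as tuples) found in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def emphasized_bigram_model(corpus, fixed_phrases, weight=5.0):
    """Estimate P(w2 | w1) from background-corpus bigram counts.

    Counts of bigrams that also occur inside a listed fixed phrase are
    multiplied by `weight` before normalization. The multiplicative
    weight is an assumption of this sketch, not the paper's formula.
    """
    # Bigram counts from the background corpus
    counts = Counter()
    for sentence in corpus:
        counts.update(ngrams(sentence.split(), 2))

    # Bigrams covered by the manually listed fixed phrases
    emphasized = set()
    for phrase in fixed_phrases:
        emphasized.update(ngrams(phrase.split(), 2))

    # Boost only bigrams that were actually observed in the background corpus
    weighted = {
        bigram: count * (weight if bigram in emphasized else 1.0)
        for bigram, count in counts.items()
    }

    # Renormalize per history word to obtain conditional probabilities
    totals = Counter()
    for (w1, _), count in weighted.items():
        totals[w1] += count
    return {bigram: count / totals[bigram[0]] for bigram, count in weighted.items()}


# Toy background corpus and fixed-phrase list (hypothetical data)
corpus = ["tell me about the weather", "tell us the news", "tell us the time"]
fixed_phrases = ["tell me about"]

baseline = emphasized_bigram_model(corpus, fixed_phrases, weight=1.0)
adapted = emphasized_bigram_model(corpus, fixed_phrases, weight=4.0)
```

With `weight=1.0` the function reduces to the unadapted relative-frequency model, so comparing `baseline` and `adapted` shows how probability mass shifts toward the bigrams in the fixed phrases while each conditional distribution still sums to one.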

Cite as: Akiba, T., Itou, K., Fujii, A. (2003) Adapting language models for frequent fixed phrases by emphasizing n-gram subsets. Proc. 8th European Conference on Speech Communication and Technology (Eurospeech 2003), 1469-1472, doi: 10.21437/Eurospeech.2003-428

@inproceedings{akiba03_eurospeech,
  author={Tomoyosi Akiba and Katunobu Itou and Atsushi Fujii},
  title={{Adapting language models for frequent fixed phrases by emphasizing n-gram subsets}},
  year=2003,
  booktitle={Proc. 8th European Conference on Speech Communication and Technology (Eurospeech 2003)},
  pages={1469--1472},
  doi={10.21437/Eurospeech.2003-428}
}