Large vocabulary statistical language modeling for continuous speech recognition in Finnish

Siivola, Vesa; Kurimo, Mikko; Lagus, Krista

doi:10.21437/Eurospeech.2001-221

Statistical language modeling (SLM) is an essential part of any large-vocabulary continuous speech recognition (LVCSR) system. The development of the standard SLM methods has been strongly shaped by the goals of LVCSR in English. The structure of Finnish differs substantially from that of English, so if the standard SLMs are applied directly, success is by no means guaranteed. In this paper we describe our first attempts at building an LVCSR system for Finnish and the new SLMs that we have tried. One of our objectives has been the indexing and recognition of broadcast news, so issues of particular interest are topic detection, word stemming, and the modeling of words that are poorly covered in the training data. Our new methods are based on neural computing using the self-organizing map (SOM), which has recently been shown to successfully extract and approximate latent semantic structures from massive text collections.

Cite as: Siivola, V., Kurimo, M., Lagus, K. (2001) Large vocabulary statistical language modeling for continuous speech recognition in Finnish. Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 737-740, doi: 10.21437/Eurospeech.2001-221

@inproceedings{siivola01_eurospeech,
  author={Vesa Siivola and Mikko Kurimo and Krista Lagus},
  title={{Large vocabulary statistical language modeling for continuous speech recognition in Finnish}},
  year=2001,
  booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)},
  pages={737--740},
  doi={10.21437/Eurospeech.2001-221}
}