A stretch of speech is often consistent with multiple words, e.g., the sequence /hæm/ is consistent with ham but also with the first syllable of hamster, resulting in temporary ambiguity. However, to what degree does this lexical embedding occur? Analyses of two corpora of spoken Dutch showed that 11.9%–19.5% of polysyllabic word tokens have word-initial embedding, while 4.1%–7.5% of monosyllabic word tokens can appear word-initially embedded. This is much lower than suggested by an analysis of a large dictionary of Dutch. Speech processing thus appears to be simpler than one might expect on the basis of dictionary statistics.
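As an illustration only, the two token-level measures mentioned in the abstract could be computed roughly as in the sketch below. Everything in it is an assumption rather than the authors' procedure: the toy lexicon and token list are invented, the names `LEXICON`, `TOKENS`, and `token_rates` are hypothetical, syllable boundaries are marked with ".", and embedding is taken to mean that one word's phoneme string is a proper prefix of another's.

```python
# Hypothetical toy lexicon: orthography -> transcription ("." = syllable boundary).
# These entries are illustrative English examples, not data from the paper.
LEXICON = {
    "ham": "hæm",
    "hamster": "hæm.stər",
    "win": "wɪn",
    "window": "wɪn.doʊ",
    "cat": "kæt",
    "dog": "dɔg",
    "table": "teɪ.bəl",
}

# Hypothetical running text (word tokens), standing in for a spoken corpus.
TOKENS = ["ham", "hamster", "cat", "dog", "window", "table", "cat", "table"]


def phonemes(transcription):
    """Return the transcription without syllable boundary markers."""
    return transcription.replace(".", "")


def is_polysyllabic(transcription):
    """A word is polysyllabic if it contains at least one syllable boundary."""
    return "." in transcription


def token_rates(lexicon, tokens):
    """Return (share of polysyllabic tokens that start with another word,
    share of monosyllabic tokens that occur at the start of another word)."""
    forms = {word: phonemes(t) for word, t in lexicon.items()}
    form_set = set(forms.values())

    # Carrier words: some proper prefix of the word is itself a lexicon word.
    carriers = {
        word
        for word, form in forms.items()
        if any(form[:i] in form_set for i in range(1, len(form)))
    }
    # Embeddable words: the word is a proper prefix of another lexicon word.
    embeddable = {
        word
        for word, form in forms.items()
        if any(other != form and other.startswith(form) for other in form_set)
    }

    poly = [w for w in tokens if is_polysyllabic(lexicon[w])]
    mono = [w for w in tokens if not is_polysyllabic(lexicon[w])]
    poly_rate = sum(w in carriers for w in poly) / len(poly)
    mono_rate = sum(w in embeddable for w in mono) / len(mono)
    return poly_rate, mono_rate


if __name__ == "__main__":
    poly_rate, mono_rate = token_rates(LEXICON, TOKENS)
    print(poly_rate, mono_rate)
```

Counting over tokens rather than over lexicon entries is what separates a corpus-based estimate from a dictionary-based one: frequent words with no embedding pull the token rate down even when many dictionary entries contain embedded words.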
Cite as: Scharenborg, O., Okolowski, S. (2009) Lexical embedding in spoken Dutch. Proc. Interspeech 2009, 1879-1882, doi: 10.21437/Interspeech.2009-545
@inproceedings{scharenborg09b_interspeech, author={Odette Scharenborg and Stefanie Okolowski}, title={{Lexical embedding in spoken Dutch}}, year=2009, booktitle={Proc. Interspeech 2009}, pages={1879--1882}, doi={10.21437/Interspeech.2009-545} }