Data driven multidialectal phone set for Spanish dialects

Caballero, Monica; Moreno, Asuncion; Nogueiras, Albino

doi:10.21437/Interspeech.2004-309

This paper presents a data-driven approach to determining a multidialectal phone set for an automatic speech recognition system covering several Spanish dialects. The approach is based on a decision tree clustering algorithm that attempts to cluster contextual units from different dialects. It thereby avoids both the definition of a global phonetic inventory and a prior study of sound similarity. The procedure is applied to Spanish as spoken in Spain, Colombia and Venezuela. Results reveal differences between phonemes that share the same SAMPA symbol in different dialects, and also detect similarities between phonemes that are represented by different symbols in the dialectal variants. Recognition results obtained with the multidialectal approach outperform the monodialectal ones.
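The abstract does not state which splitting criterion the decision tree uses. As a rough illustration only, the sketch below assumes the single-Gaussian likelihood-gain criterion commonly used in decision-tree state tying, reduced to one-dimensional features. The `Unit` class, the dialect labels `ES`, `CO`, `VE`, and all statistics are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Unit:
    """Toy sufficient statistics for one contextual unit (all values hypothetical)."""
    dialect: str   # assumed labels: "ES" (Spain), "CO" (Colombia), "VE" (Venezuela)
    phone: str     # SAMPA-like symbol
    count: float   # number of frames assigned to the unit
    mean: float    # 1-D feature mean
    var: float     # 1-D feature variance


def pooled_loglik(units):
    """Log-likelihood of all frames in `units` under one shared ML Gaussian."""
    n = sum(u.count for u in units)
    if n == 0:
        return 0.0
    mean = sum(u.count * u.mean for u in units) / n
    second_moment = sum(u.count * (u.var + u.mean ** 2) for u in units) / n
    var = max(second_moment - mean ** 2, 1e-8)  # floor guards against log(0)
    return -0.5 * n * (1.0 + math.log(2.0 * math.pi * var))


def split_gain(units, question):
    """Likelihood gain from splitting `units` by a yes/no question."""
    yes = [u for u in units if question(u)]
    no = [u for u in units if not question(u)]
    if not yes or not no:
        return 0.0  # the question does not separate anything
    return pooled_loglik(yes) + pooled_loglik(no) - pooled_loglik(units)


# Toy pool: the same symbol in three dialects, with one dialect shifted.
toy_units = [
    Unit("ES", "s", 100, 0.0, 1.0),
    Unit("CO", "s", 100, 0.0, 1.0),
    Unit("VE", "s", 100, 3.0, 1.0),
]

# Candidate dialect questions; the tree would pick the one with the largest gain.
questions = {
    "is_ES": lambda u: u.dialect == "ES",
    "is_CO": lambda u: u.dialect == "CO",
    "is_VE": lambda u: u.dialect == "VE",
}
gains = {name: split_gain(toy_units, q) for name, q in questions.items()}
best_question = max(gains, key=gains.get)
```

Under this assumed criterion, units whose statistics match across dialects yield no gain and stay in one cluster, while a dialect whose realization differs is split off even though it carries the same symbol.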

Cite as: Caballero, M., Moreno, A., Nogueiras, A. (2004) Data driven multidialectal phone set for Spanish dialects. Proc. Interspeech 2004, 837-840, doi: 10.21437/Interspeech.2004-309

@inproceedings{caballero04_interspeech,
  author={Monica Caballero and Asuncion Moreno and Albino Nogueiras},
  title={{Data driven multidialectal phone set for Spanish dialects}},
  year=2004,
  booktitle={Proc. Interspeech 2004},
  pages={837--840},
  doi={10.21437/Interspeech.2004-309}
}