JAMLIT: A Corpus of Jamaican Standard English for Automatic Speech Recognition of Children’s Speech

Watson, Stefan; Coy, Andre

doi:10.21437/SLTU.2018-51

Children’s speech is low resource because few corpora exist. Jamaican English (JE) is even lower resource, as no child or adult corpora exist, which hinders the automatic recognition of Jamaican children’s speech; data augmentation can overcome this limitation. Typically, augmentation data comes from speakers of the same dialect; however, this is not an option for JE. This work describes JAMLIT, a collection of JE spoken by children, and explores the use of data from related dialects to augment a resource-poor dialect. Augmentation is performed using British (PF-STAR) and American (CMU Kids Speech) English corpora of children’s speech. Models created by adding a fraction of the JAMLIT corpus to the PF-STAR corpus improve the recognition of JE, reducing the WER by 58.1% compared to the PF-STAR baseline. With CMU, the improvement was 59.6% over baseline. Both augmented models gave WERs within 2.1% of the models trained with Jamaican-only data.

Cite as: Watson, S., Coy, A. (2018) JAMLIT: A Corpus of Jamaican Standard English for Automatic Speech Recognition of Children’s Speech. Proc. 6th Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU 2018), 243-247, doi: 10.21437/SLTU.2018-51

@inproceedings{watson18_sltu,
  author={Stefan Watson and Andre Coy},
  title={{JAMLIT: A Corpus of Jamaican Standard English for Automatic Speech Recognition of Children’s Speech}},
  year=2018,
  booktitle={Proc. 6th Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU 2018)},
  pages={243--247},
  doi={10.21437/SLTU.2018-51}
}