Chatino Speech Corpus Archive Dataset

Date
2016-10-10
Abstract
This dataset is the result of experiments in building speech technologies to document a low-resourced or endangered language. The language we chose for creating speech corpora and training forced-alignment tools is Eastern Chatino, an unwritten, low-resourced language from Oaxaca, Mexico. As far as we can tell, this is the first such resource available under a free Creative Commons license.
Description
This zip file contains WAV audio files and annotations. The recordings were made with a ZOOM H6 digital audio recorder and can be played in any audio software that supports WAV files. The annotations can be viewed and edited in ELAN (https://tla.mpi.nl/tools/tla-tools/elan/), a professional tool for creating complex annotations of video and audio resources. Download the dataset using the link below.
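For readers who prefer to inspect the files programmatically rather than in ELAN, the following is a minimal Python sketch using only the standard library. It assumes the annotations are stored in ELAN's native .eaf XML format, which this record does not state explicitly; the function names are hypothetical and chosen for illustration only.

```python
import wave
import xml.etree.ElementTree as ET


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def read_eaf_annotations(source):
    """Return (tier_id, start_ms, end_ms, text) tuples from an ELAN .eaf file.

    Assumption: the file follows the standard EAF layout, with TIME_SLOT
    elements under TIME_ORDER and ALIGNABLE_ANNOTATION elements inside TIER.
    """
    root = ET.parse(source).getroot()
    # Map time slot ids to millisecond offsets; unaligned slots carry no TIME_VALUE.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }
    rows = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            rows.append((
                tier.get("TIER_ID"),
                slots.get(ann.get("TIME_SLOT_REF1")),
                slots.get(ann.get("TIME_SLOT_REF2")),
                ann.findtext("ANNOTATION_VALUE", default=""),
            ))
    return rows
```

After unpacking the zip file, calling wav_duration_seconds on a recording and read_eaf_annotations on its annotation file gives the audio length and the time-aligned segments per tier. The actual file names and tier names depend on the archive contents and are not listed in this record.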
Keywords
speech corpus, Chatino, GORILLA, ctp
DOI
10.5967/P9694F
Link(s) to data and video for this item
Type
Dataset