ISCA Archive Interspeech 2010

Memory-based active learning for French broadcast news

Frédéric Tantini, Christophe Cerisara, Claire Gardent

Stochastic dependency parsers can achieve very good results when trained on large, manually annotated corpora. Active learning aims at reducing this annotation cost by selecting the fewest sentences that will produce the best possible parser. We propose a new selective sampling function for active learning that exploits two memory-based distances to find a good compromise between parser uncertainty and sentence representativeness. The reduced dependency between the parsing and selection models opens interesting perspectives for future model combinations. The approach is validated on a French broadcast news corpus creation task dedicated to dependency parsing, where it outperforms the baseline entropy-based uncertainty selective sampling. We plan to extend this work with self- and co-training methods in order to enlarge this corpus and produce the first French broadcast news treebank.
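The paper's exact sampling function is not given in this abstract; the sketch below only illustrates the general shape of such a criterion, under the assumption that uncertainty is an entropy-like score from the parser and representativeness is derived from pairwise similarities over sentence feature vectors. The trade-off weight alpha and the cosine-similarity choice are illustrative assumptions, not the authors' method.

```python
import numpy as np

def select_sentences(uncertainty, features, k, alpha=0.5):
    """Illustrative selective sampling sketch (not the paper's function).

    uncertainty : (n,) array, e.g. an entropy score per unlabelled sentence
    features    : (n, d) array of sentence feature vectors used for a
                  distance/similarity-based notion of representativeness
    k           : number of sentences to select for annotation
    alpha       : assumed trade-off between uncertainty and representativeness
    """
    # Representativeness: mean cosine similarity of each sentence to the pool.
    norms = np.linalg.norm(features, axis=1, keepdims=True) + 1e-12
    unit = features / norms
    sim = unit @ unit.T                    # pairwise cosine similarities
    representativeness = sim.mean(axis=1)  # average similarity to the pool

    # Prefer sentences that are both uncertain for the parser and
    # representative of the unlabelled data.
    score = alpha * uncertainty + (1.0 - alpha) * representativeness
    return np.argsort(score)[::-1][:k]     # indices of the k highest-scoring sentences
```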


doi: 10.21437/Interspeech.2010-420

Cite as: Tantini, F., Cerisara, C., Gardent, C. (2010) Memory-based active learning for French broadcast news. Proc. Interspeech 2010, 1377-1380, doi: 10.21437/Interspeech.2010-420

@inproceedings{tantini10_interspeech,
  author={Frédéric Tantini and Christophe Cerisara and Claire Gardent},
  title={{Memory-based active learning for French broadcast news}},
  year=2010,
  booktitle={Proc. Interspeech 2010},
  pages={1377--1380},
  doi={10.21437/Interspeech.2010-420}
}