It is well-known that the characteristics of L2 speech are highly influenced
by the speakers’ L1. The main objective of this study was to
uncover discriminative speech features to identify the L1 background
of a speaker from their L2 English speech. Traditional phonetic approaches
tend to compare speakers based on a pre-selected set of acoustic features,
which may not be sufficient to capture all the unique traces of the
L1 in the L2 speech for forensic speaker profiling purposes. Convolutional
Neural Networks (CNNs) have the potential to remedy this issue through
the automatic processing of visual spectrograms.
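As a rough illustration of this input representation, the snippet below (a minimal sketch, not the pipeline used in the study) renders a speech recording as a spectrogram image with librosa and matplotlib; the file name, sampling rate, and STFT settings are illustrative assumptions.

# Minimal sketch (not the study's code): render a speech file as a
# spectrogram image suitable for CNN input. Path and STFT parameters
# are illustrative assumptions.
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

y, sr = librosa.load("speaker_utterance.wav", sr=16000)    # hypothetical file
stft = librosa.stft(y, n_fft=512, hop_length=160)           # ~32 ms window, 10 ms hop
spec_db = librosa.amplitude_to_db(np.abs(stft), ref=np.max)

fig, ax = plt.subplots(figsize=(4, 3))
librosa.display.specshow(spec_db, sr=sr, hop_length=160,
                         x_axis="time", y_axis="hz", ax=ax)
ax.set_axis_off()                                            # keep only the image content
fig.savefig("speaker_utterance.png", bbox_inches="tight", pad_inches=0)
plt.close(fig)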
This paper reports
a series of CNN classification experiments based on spectrogram
images. The classification task was to determine whether
English speech samples were spoken by a native speaker of English, Japanese,
Dutch, French, or Polish. Both phonetically transcribed and untranscribed
speech data were used.
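To make the modelling setup concrete, the following sketch (an assumption-laden illustration, not the author's implementation) defines a small CNN in PyTorch that maps single-channel spectrogram images to the five L1 classes named above; the layer sizes are arbitrary.

import torch
import torch.nn as nn

L1_CLASSES = ["English", "Japanese", "Dutch", "French", "Polish"]

class SpectrogramCNN(nn.Module):
    """Toy CNN for 5-way L1 classification from spectrogram images."""
    def __init__(self, n_classes=len(L1_CLASSES)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),       # fixed-size output regardless of image size
        )
        self.classifier = nn.Linear(64 * 4 * 4, n_classes)

    def forward(self, x):                        # x: (batch, 1, freq, time)
        return self.classifier(self.features(x).flatten(1))

# Example forward pass on a batch of two 128x128 spectrogram images.
logits = SpectrogramCNN()(torch.randn(2, 1, 128, 128))
print(logits.shape)                              # torch.Size([2, 5])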
Overall, the results showed
that the CNN achieved a high level of accuracy in identifying the speakers’
L1s from spectrogram images without explicit phonetic segmentation.
However, the results also showed that training the classifiers on certain
combinations of phonetically modelled spectrogram images, which would
make the features more transparent, could produce comparable
accuracy rates.
Cite as: Graham, C. (2021) L1 Identification from L2 Speech Using Neural Spectrogram Analysis. Proc. Interspeech 2021, 3959-3963, doi: 10.21437/Interspeech.2021-1545
@inproceedings{graham21_interspeech,
  author={Calbert Graham},
  title={{L1 Identification from L2 Speech Using Neural Spectrogram Analysis}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={3959--3963},
  doi={10.21437/Interspeech.2021-1545}
}