Bilingual Streaming ASR with Grapheme units and Auxiliary Monolingual Loss

Soleymanpour, Mohammad; Ismail, Mahmoud Al; Bahmaninezhad, Fahimeh; Kumar, Kshitiz; Wu, Jian

arXiv:2308.06327 (eess)

[Submitted on 11 Aug 2023]

Abstract: We introduce a bilingual solution that supports English as a secondary locale for most primary locales in hybrid automatic speech recognition (ASR) settings. Our key developments are: (a) a pronunciation lexicon with grapheme units instead of phone units, (b) a fully bilingual alignment model and, subsequently, a bilingual streaming transformer model, (c) a parallel encoder structure with a language identification (LID) loss, and (d) a parallel encoder with an auxiliary loss for monolingual projections. We conclude that, compared with the LID loss, our proposed auxiliary loss is superior in specializing the parallel encoders to their respective monolingual locales, which contributes to stronger bilingual learning. We evaluate our work on large-scale training and test tasks for bilingual Spanish (ES) and bilingual Italian (IT) applications. Our bilingual models demonstrate strong English code-mixing capability. In particular, the bilingual IT model improves the word error rate (WER) on a code-mix IT task from 46.5% to 13.8%, while also achieving close parity (9.6%) with the monolingual IT model (9.5%) on IT tests.
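To put the reported WER figures in perspective, the short sketch below converts them into relative changes. It uses only the four percentages quoted in the abstract; the helper name `relative_wer_change` is hypothetical and not taken from the paper, and the relative figures it produces are derived here rather than reported by the authors.

```python
def relative_wer_change(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER change in percent.

    Positive values mean the new system has a lower (better) WER
    than the baseline. Hypothetical helper, for illustration only.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Code-mix IT task: WER drops from 46.5% to 13.8% (figures from the abstract).
code_mix_gain = relative_wer_change(46.5, 13.8)

# IT tests: bilingual model at 9.6% versus monolingual model at 9.5%.
# Negated so that the value reads as a relative degradation.
it_degradation = -relative_wer_change(9.5, 9.6)

print(f"Relative WER reduction on the code-mix IT task: {code_mix_gain:.1f}%")
print(f"Relative WER increase on IT tests: {it_degradation:.1f}%")
```

Under these numbers, the code-mix improvement is roughly a 70% relative WER reduction, while the gap to the monolingual IT model is about 1% relative (0.1 points absolute).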

Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
Cite as:	arXiv:2308.06327 [eess.AS]
	(or arXiv:2308.06327v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2308.06327

Submission history

From: Mohammad Soleymanpour
[v1] Fri, 11 Aug 2023 18:06:33 UTC (444 KB)
