Large-Scale Streaming End-to-End Speech Translation with Neural Transducers

Xue, Jian; Wang, Peidong; Li, Jinyu; Post, Matt; Gaur, Yashesh

arXiv:2204.05352 (cs)

[Submitted on 11 Apr 2022 (v1), last revised 1 Jul 2022 (this version, v2)]

Abstract: Neural transducers have been widely used in automatic speech recognition (ASR). In this paper, we introduce them to streaming end-to-end speech translation (ST), which aims to convert audio signals directly into text in other languages. Compared with cascaded ST, which performs ASR followed by text-based machine translation (MT), the proposed Transformer transducer (TT)-based ST model drastically reduces inference latency, exploits speech information, and avoids error propagation from ASR to MT. To improve the modeling capacity, we propose attention pooling for the joint network in TT. In addition, we extend TT-based ST to multilingual ST, which generates text in multiple languages at the same time. Experimental results on a large-scale pseudo-labeled training set of 50 thousand (K) hours show that TT-based ST not only significantly reduces inference time but also outperforms non-streaming cascaded ST for English-German translation.
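
For readers unfamiliar with the joint network mentioned in the abstract, the following is a minimal sketch of the conventional additive joint used in neural transducers, which combines one encoder frame with one prediction-network state and outputs a distribution over the vocabulary plus blank. It is not the attention-pooling joint proposed in the paper, whose details are not given here. All function names, dimensions, and the random weights are hypothetical and chosen only for illustration.

```python
import math
import random


def linear(weights, vector):
    """Multiply a (rows x cols) weight matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def additive_joint(enc_frame, pred_state, w_enc, w_pred, w_out):
    """Conventional additive joint: softmax(W_out tanh(W_enc f + W_pred g))."""
    enc_proj = linear(w_enc, enc_frame)
    pred_proj = linear(w_pred, pred_state)
    hidden = [math.tanh(a + b) for a, b in zip(enc_proj, pred_proj)]
    return softmax(linear(w_out, hidden))


def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix of small random weights (illustration only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
# Toy sizes (assumptions); vocabulary index 0 stands for the blank symbol.
ENC_DIM, PRED_DIM, JOINT_DIM, VOCAB = 4, 3, 5, 6

w_enc = random_matrix(JOINT_DIM, ENC_DIM, rng)
w_pred = random_matrix(JOINT_DIM, PRED_DIM, rng)
w_out = random_matrix(VOCAB, JOINT_DIM, rng)

# Toy inputs: T = 3 encoder frames (audio side), U = 2 prediction states (text side).
enc_frames = [[rng.uniform(-1, 1) for _ in range(ENC_DIM)] for _ in range(3)]
pred_states = [[rng.uniform(-1, 1) for _ in range(PRED_DIM)] for _ in range(2)]

# lattice[t][u] is a distribution over the vocabulary (including blank).
lattice = [[additive_joint(f, g, w_enc, w_pred, w_out) for g in pred_states]
           for f in enc_frames]
```

Each cell `lattice[t][u]` scores the next output token given audio up to frame `t` and the `u` tokens emitted so far. Because a token or a blank can be emitted as each frame arrives, a model of this shape can operate in a streaming fashion.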

Comments:	The paper was submitted to Interspeech 2022
Subjects:	Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2204.05352 [cs.CL]
	(or arXiv:2204.05352v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2204.05352

Submission history

From: Jian Xue
[v1] Mon, 11 Apr 2022 18:18:53 UTC (844 KB)
[v2] Fri, 1 Jul 2022 20:41:39 UTC (839 KB)
