FedSpeech: Federated Text-to-Speech with Continual Learning

Jiang, Ziyue; Ren, Yi; Lei, Ming; Zhao, Zhou

doi:10.24963/ijcai.2021/527

arXiv:2110.07216 (eess)

[Submitted on 14 Oct 2021 (v1), last revised 22 May 2023 (this version, v2)]

Abstract: Federated learning enables collaborative training of machine learning models under strict privacy restrictions, and federated text-to-speech aims to synthesize natural speech for multiple users from a few audio training samples stored locally on their devices. However, federated text-to-speech faces several challenges: very few training samples are available from each speaker, all training samples are stored on each user's local device, and the global model is vulnerable to various attacks. In this paper, we propose a novel federated learning architecture based on continual learning approaches to overcome these difficulties. Specifically, 1) we use gradual pruning masks to isolate parameters for preserving speakers' tones; 2) we apply selective masks to effectively reuse knowledge from tasks; 3) we introduce a private speaker embedding to protect users' privacy. Experiments on a reduced VCTK dataset demonstrate the effectiveness of FedSpeech: it nearly matches multi-task training in multi-speaker speech quality; moreover, it sufficiently retains the speakers' tones and even outperforms multi-task training in the speaker similarity experiment.
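
As a rough illustration of the parameter-isolation idea in point 1), the sketch below assigns each speaker a disjoint subset of a shared weight vector by magnitude-based pruning, with the pruned fraction ramping up over training steps. The abstract does not specify the pruning schedule or the mask layout, so the cubic sparsity ramp, the `FREE` marker, and the function names `sparsity_at` and `claim_weights` are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List

FREE = -1  # hypothetical marker: weight not yet claimed by any speaker


def sparsity_at(step: int, total_steps: int, final_sparsity: float) -> float:
    """Cubic ramp from 0 up to final_sparsity (an assumed schedule)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return final_sparsity * (1.0 - (1.0 - progress) ** 3)


def claim_weights(weights: List[float], owner: List[int],
                  speaker: int, sparsity: float) -> List[int]:
    """Return a new ownership mask after pruning for `speaker`.

    Among the currently FREE weights, the smallest-magnitude fraction
    `sparsity` is released for later speakers and the rest are claimed
    by `speaker`. Weights owned by earlier speakers are never touched.
    """
    free_idx = [i for i, o in enumerate(owner) if o == FREE]
    n_release = int(len(free_idx) * sparsity)
    # Smallest magnitudes first; the first n_release entries stay FREE.
    by_magnitude = sorted(free_idx, key=lambda i: abs(weights[i]))
    new_owner = list(owner)
    for i in by_magnitude[n_release:]:
        new_owner[i] = speaker
    return new_owner


# Toy run: two speakers share eight weights. In real training the
# weights would be updated between pruning steps; here they stay fixed.
weights = [0.9, -0.1, 0.4, -0.7, 0.05, 0.3, -0.2, 0.6]
base = [FREE] * len(weights)

mask_s0 = base
for step in range(1, 5):
    # Recompute from `base` each step so pruning tightens gradually.
    mask_s0 = claim_weights(weights, base, speaker=0,
                            sparsity=sparsity_at(step, 4, 0.5))

# Speaker 1 may only claim weights that speaker 0 released.
mask_s1 = claim_weights(weights, mask_s0, speaker=1, sparsity=0.5)
```

Because each speaker's claimed weights are frozen once assigned, later speakers cannot overwrite them, which is the property that lets such a scheme preserve an earlier speaker's voice characteristics.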

Comments:	Accepted by IJCAI 2021
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2110.07216 [eess.AS]
	(or arXiv:2110.07216v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2110.07216
Journal reference:	2021. Main Track. Pages 3829-3835
Related DOI:	https://doi.org/10.24963/ijcai.2021/527

Submission history

From: Ziyue Jiang
[v1] Thu, 14 Oct 2021 08:25:34 UTC (274 KB)
[v2] Mon, 22 May 2023 08:37:10 UTC (326 KB)
