ISCA Archive Odyssey 2022

Deep Representation Decomposition for Rate-Invariant Speaker Verification

Fuchuan Tong, Siqi Zheng, Haodong Zhou, Xingjia Xie, Qingyang Hong, Lin Li

Although deep speaker embeddings achieve promising speaker verification performance, this advantage diminishes under speaking-style variability. Speaking rate mismatch is common in practical speaker verification systems and can degrade system performance. To reduce the intra-class discrepancy caused by speaking rate, we propose a deep representation decomposition approach with adversarial learning to learn speaking rate-invariant speaker embeddings. Specifically, using an attention block, we decompose the original embedding into an identity-related component and a rate-related component through multi-task training. Additionally, to reduce the latent relationship between the two decomposed components, we propose a cosine mapping block that trains the parameters adversarially to minimize the cosine similarity between them. As a result, the identity-related features become robust to speaking rate and are then used for verification. Experiments on the VoxCeleb1 and HI-MIA datasets demonstrate the effectiveness of the proposed approach.
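
The decomposition idea in the abstract can be illustrated with a small sketch. The abstract does not give the form of the attention block or of the cosine mapping block, so everything here is an assumption made for illustration: a sigmoid mask computed from a single linear layer splits the embedding, and the helper names `decompose` and `cosine_similarity`, the dimension, and the random weights are all hypothetical. No training is performed; the sketch only shows the quantity that an adversarial objective of this kind would drive toward zero.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decompose(embedding, weights, bias):
    """Split an embedding into two components with a soft mask.

    Assumed form (not taken from the paper): mask = sigmoid(W e + b),
    identity part = mask * e, rate part = (1 - mask) * e.
    """
    mask = [
        sigmoid(sum(w * x for w, x in zip(row, embedding)) + b)
        for row, b in zip(weights, bias)
    ]
    identity_part = [m * x for m, x in zip(mask, embedding)]
    rate_part = [(1.0 - m) * x for m, x in zip(mask, embedding)]
    return identity_part, rate_part


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two vectors, with eps to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


# Hypothetical toy inputs: an 8-dimensional embedding and random mask weights.
random.seed(0)
dim = 8
embedding = [random.gauss(0.0, 1.0) for _ in range(dim)]
weights = [[random.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]
bias = [0.0] * dim

identity_part, rate_part = decompose(embedding, weights, bias)

# Similarity between the two components; a decorrelation term would penalize it.
cos = cosine_similarity(identity_part, rate_part)
print(f"cosine similarity between components: {cos:.4f}")
```

With this particular mask form the two components always sum back to the original embedding, so the split redistributes information rather than discarding it. In a trained system the identity-related part would be scored for verification, while the rate-related part would feed an auxiliary rate classifier.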


doi: 10.21437/Odyssey.2022-32

Cite as: Tong, F., Zheng, S., Zhou, H., Xie, X., Hong, Q., Li, L. (2022) Deep Representation Decomposition for Rate-Invariant Speaker Verification. Proc. The Speaker and Language Recognition Workshop (Odyssey 2022), 228-232, doi: 10.21437/Odyssey.2022-32

@inproceedings{tong22_odyssey,
  author={Fuchuan Tong and Siqi Zheng and Haodong Zhou and Xingjia Xie and Qingyang Hong and Lin Li},
  title={{Deep Representation Decomposition for Rate-Invariant Speaker Verification}},
  year=2022,
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2022)},
  pages={228--232},
  doi={10.21437/Odyssey.2022-32}
}