Self-Supervised Learning With Cluster-Aware-DINO for High-Performance Robust Speaker Verification

IEEE/ACM Transactions on Audio, Speech, and Language Processing (2024)

Abstract

Automatic speaker verification has achieved great success using deep learning approaches trained on large-scale, manually annotated datasets. However, collecting a large amount of well-labeled data for system building is difficult and expensive. Recently, self-supervised speaker verification has attracted considerable interest because it does not depend on labeled data. In this article, we propose a novel self-supervised learning framework, built on our prior work, that can construct a high-performance speaker verification system without using any labeled data. To avoid the impact of false negative pairs, we adopt the self-distillation with no labels (DINO) framework as the initial model, which can be trained without exploiting negative pairs. We then introduce a cluster-aware training strategy for DINO to improve the diversity of data. In the iterative learning stage, unsupervised clustering produces many unreliable labels, so the quality of the pseudo labels is critical to system performance. This motivates us to propose dynamic loss-gate and label correction (DLG-LC) methods to alleviate the performance degradation caused by unreliable labels. Furthermore, we extend DLG-LC from a single modality to multiple modalities on an audio-visual dataset to further improve performance. The experiments were conducted on the widely used VoxCeleb dataset. Compared to the best-known self-supervised speaker verification system, our proposed method achieves relative EER improvements of 22.17%, 27.94%, and 25.56% on the Vox-O, Vox-E, and Vox-H test sets, respectively, even with fewer iterations, smaller models, and simpler clustering methods. Importantly, the proposed self-supervised learning system achieves results comparable to those of the fully supervised system, but without using any human-labeled data.
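
The abstract notes that DINO can be trained without negative pairs. As a rough illustration of that idea only, the sketch below shows a generic DINO-style objective in plain Python: a student distribution is pulled toward a centered, sharpened teacher distribution computed from another view of the same utterance, and the teacher is an exponential moving average of the student. The function names, temperatures, and momentum are illustrative assumptions and are not taken from this paper; the cluster-aware strategy and DLG-LC are not shown.

```python
import math

def log_softmax(logits, temperature):
    """Temperature-scaled log-softmax over a list of floats (numerically stable)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_total for x in scaled]

def dino_loss(student_logits, teacher_logits, center,
              student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy from a centered, sharpened teacher to the student.

    Only two views of the same input are compared, so no negative pairs
    are needed. Temperatures here are assumed, illustrative values.
    """
    centered = [t - c for t, c in zip(teacher_logits, center)]
    teacher_probs = [math.exp(lp) for lp in log_softmax(centered, teacher_temp)]
    student_log_probs = log_softmax(student_logits, student_temp)
    return -sum(p * lq for p, lq in zip(teacher_probs, student_log_probs))

def ema_update(teacher_params, student_params, momentum=0.996):
    """Move teacher parameters toward the student (exponential moving average)."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]

# Toy usage: a student that agrees with the teacher gets a lower loss.
teacher = [2.0, 0.0, -1.0]
center = [0.0, 0.0, 0.0]
loss_agree = dino_loss([2.0, 0.0, -1.0], teacher, center)
loss_disagree = dino_loss([-1.0, 0.0, 2.0], teacher, center)
```

In a real system the logits would come from projection heads on top of speaker encoders, and the center would be a running mean of teacher outputs; both are omitted here to keep the sketch self-contained.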

Keywords

Self-supervised speaker verification, cluster-aware DINO, dynamic loss-gate, label correction, multi-modality
