Increasing Speech Intelligibility by Mimicking Professional Announcers' Voices and Its Physical Correlates

2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2023)

Abstract
Previous studies found that speech uttered by professional announcers is more intelligible in noisy environments than speech uttered by non-experts. On the basis of this finding, we developed a voice-conversion (VC) system that mimics professional announcers' voices by modifying the speaker embedding of non-expert speech. Our experimental evaluation showed that this system significantly increased intelligibility. In this paper, to discuss which physical features correlate with intelligibility, we retrained the VC system with a larger amount of training data and analyzed it to investigate two issues: (1) whether speech intelligibility can be changed gradually even by shifting only one PCA (principal component analysis) component of the speaker embedding, and (2) which physical features change when that PCA component is shifted. Comparing speech intelligibility with the candidate features that changed as one PCA axis was shifted, we found that spectral tilt, spectral plateau, and cepstral peak prominence are strongly correlated with intelligibility.
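The abstract does not describe the implementation. As a minimal NumPy sketch, assuming speaker embeddings are fixed-length vectors and PCA is fit by SVD over a set of them, the "shift one PCA component" step and one candidate physical feature could look like the following. `fit_pca`, `shift_one_component`, and `spectral_tilt_db_per_khz` are hypothetical helpers, the random embeddings are placeholders rather than real data, and the spectral tilt shown (least-squares slope of the dB magnitude spectrum) is one common definition that may differ from the one used in the paper.

```python
import numpy as np


def fit_pca(embeddings):
    """Fit PCA by SVD; return the mean, principal axes (one per row), and per-axis std."""
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    # Rows of vt are orthonormal principal axes, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    # Standard deviation of the data projected onto each axis.
    std = singular_values / np.sqrt(max(len(embeddings) - 1, 1))
    return mean, vt, std


def shift_one_component(embedding, axes, std, k, alpha):
    """Move an embedding by alpha standard deviations along principal axis k only."""
    return embedding + alpha * std[k] * axes[k]


def spectral_tilt_db_per_khz(frame, sample_rate):
    """Slope (dB/kHz) of a straight line fitted to the frame's dB magnitude spectrum."""
    windowed = frame * np.hanning(len(frame))
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs_khz = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate) / 1000.0
    level_db = 20.0 * np.log10(magnitude + 1e-12)  # small floor avoids log(0)
    slope, _intercept = np.polyfit(freqs_khz, level_db, 1)
    return slope


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder data: random vectors standing in for real speaker embeddings.
    embeddings = rng.normal(size=(200, 256))
    mean, axes, std = fit_pca(embeddings)
    base = embeddings[0]
    for alpha in (-2.0, -1.0, 0.0, 1.0, 2.0):
        shifted = shift_one_component(base, axes, std, k=0, alpha=alpha)
        # Coordinate on axis 0 moves linearly with alpha; other axes are unchanged.
        print(alpha, float((shifted - mean) @ axes[0]))
```

In a setup like the one described, each shifted embedding would be passed to the VC decoder (not shown), and features such as spectral tilt and an intelligibility score (e.g. STOI) would be computed on the converted speech for each shift level and then correlated.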
Keywords
spectral tilt, spectral plateau, cepstral peak prominence, PCA, STOI, voice conversion