Masked Token Similarity Transfer for Compressing Transformer-Based ASR Models

Euntae Choi, Youshin Lim, Byeong-Yeol Kim, Hyung Yong Kim, Hanbin Lee, Yunkyu Lim, Seung Woo Yu, Sungjoo Yoo

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

Abstract
Recent self-supervised automatic speech recognition (ASR) models based on transformers show the best performance, but their footprint is too large to train in low-resource environments or to deploy on edge devices. Knowledge distillation (KD) can be employed to reduce the model size. However, setting the embedding dimensions of the teacher and student networks to different values makes it difficult to transfer token embeddings for better performance. To mitigate this issue, we present a novel KD method in which the student mimics the prediction vector of the teacher under our proposed masked token similarity transfer (MTST) loss, where the temporal relation between a token and the other unmasked tokens is encoded into a dimension-agnostic token similarity vector. Under our transfer learning setting with a fine-tuned teacher, our proposed methods reduce the model size of the student to 28.3% of the teacher's, while the word error rate on the test-clean subset of the LibriSpeech corpus is 4.93%, which surpasses prior works. Our source code will be made available.
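To illustrate the dimension-agnostic idea in the abstract, here is a minimal, hypothetical sketch (not the authors' implementation): it assumes cosine similarity between tokens and a mean-squared-error matching objective, with made-up tensor shapes; the paper's exact similarity measure, masking scheme, and loss may differ. The point it shows is that the similarity vector between a masked token and the unmasked tokens has length equal to the number of unmasked tokens, not the embedding dimension, so teacher and student can use different widths.

```python
# Hypothetical sketch of a masked-token-similarity-style distillation loss.
# Assumptions (not from the paper): cosine similarity, MSE matching, 15% masking.
import torch
import torch.nn.functional as F


def token_similarity_vectors(hidden, mask):
    """For each masked token, compute its similarity to all unmasked tokens.

    hidden: (T, D) token representations; D may differ between teacher and student.
    mask:   (T,) boolean tensor, True where the token was masked.
    Returns a (num_masked, num_unmasked) matrix, independent of D.
    """
    masked = F.normalize(hidden[mask], dim=-1)     # (M, D) unit-norm masked tokens
    unmasked = F.normalize(hidden[~mask], dim=-1)  # (U, D) unit-norm unmasked tokens
    return masked @ unmasked.T                     # (M, U) cosine similarities


def mtst_style_loss(teacher_hidden, student_hidden, mask):
    """Match the student's similarity vectors to the teacher's (MSE as an
    illustrative choice); teacher and student widths need not agree."""
    with torch.no_grad():
        sim_teacher = token_similarity_vectors(teacher_hidden, mask)
    sim_student = token_similarity_vectors(student_hidden, mask)
    return F.mse_loss(sim_student, sim_teacher)


# Example: teacher width 1024, student width 384, 200 frames, ~15% masked.
T = 200
mask = torch.rand(T) < 0.15
loss = mtst_style_loss(torch.randn(T, 1024), torch.randn(T, 384), mask)
```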
Keywords
Automatic Speech Recognition,Transformer,Knowledge Distillation,Model Compression