SemCKD: Semantic Calibration for Cross-Layer Knowledge Distillation

IEEE Transactions on Knowledge and Data Engineering (2023)

Abstract
Knowledge distillation is a technique for enhancing the generalization ability of a student model by exploiting outputs from a teacher model. Recently, feature-map-based variants have explored knowledge transfer between manually assigned teacher-student layer pairs in intermediate layers for further improvement. However, layer semantics may vary across different neural networks, and semantic mismatch in manual layer associations can introduce negative regularization that degrades performance. To address this issue, we propose semantic calibration for cross-layer knowledge distillation (SemCKD), which automatically assigns proper target layers of the teacher model to each student layer via an attention mechanism. With a learned attention distribution, each student layer distills knowledge contained in multiple teacher layers rather than a single fixed intermediate layer, yielding appropriate cross-layer supervision. We further provide a theoretical analysis of the association weights and conduct extensive experiments to demonstrate the effectiveness of our approach. On average, SemCKD improves student Top-1 classification accuracy by 4.27% across twelve teacher-student model combinations on CIFAR-100. Code is available at https://github.com/DefangChen/SemCKD.
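To make the attention-based cross-layer assignment concrete, below is a minimal PyTorch sketch of the idea described in the abstract: each student layer computes a softmax attention distribution over all teacher layers and distills from them with the corresponding weights. The pooling, query/key projections, 1x1 convolution alignment, and loss form here are illustrative assumptions for exposition, not the paper's exact implementation (see the official repository linked above for that).

```python
# Sketch: attention-weighted cross-layer feature distillation (assumed design).
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossLayerAttention(nn.Module):
    """Assigns each student layer a soft distribution over teacher layers
    and returns the attention-weighted feature-matching loss."""

    def __init__(self, s_channels, t_channels, dim=128):
        super().__init__()
        # Queries from pooled student features, keys from pooled teacher features.
        self.query = nn.ModuleList([nn.Linear(c, dim) for c in s_channels])
        self.key = nn.ModuleList([nn.Linear(c, dim) for c in t_channels])
        # 1x1 convs project each student feature map to every teacher layer's channels.
        self.proj = nn.ModuleList([
            nn.ModuleList([nn.Conv2d(sc, tc, kernel_size=1) for tc in t_channels])
            for sc in s_channels
        ])

    def forward(self, s_feats, t_feats):
        # Global-average-pool each feature map to a vector: (B, C, H, W) -> (B, C).
        s_pooled = [f.mean(dim=(2, 3)) for f in s_feats]
        t_pooled = [f.mean(dim=(2, 3)) for f in t_feats]

        loss = 0.0
        for i, sf in enumerate(s_feats):
            q = self.query[i](s_pooled[i])                       # (B, dim)
            k = torch.stack([self.key[j](t_pooled[j])
                             for j in range(len(t_feats))], 1)   # (B, T, dim)
            # Scaled dot-product attention over teacher layers.
            attn = F.softmax((k @ q.unsqueeze(-1)).squeeze(-1) /
                             q.size(-1) ** 0.5, dim=1)           # (B, T)

            for j, tf in enumerate(t_feats):
                # Align the student map to the teacher layer's channels and spatial size.
                s_proj = self.proj[i][j](sf)
                s_proj = F.interpolate(s_proj, size=tf.shape[2:],
                                       mode='bilinear', align_corners=False)
                per_sample = F.mse_loss(s_proj, tf,
                                        reduction='none').mean(dim=(1, 2, 3))
                # Weight each teacher layer's matching loss by its attention weight.
                loss = loss + (attn[:, j] * per_sample).mean()
        return loss
```

In this sketch the weighted loss would simply be added to the usual cross-entropy (and logit-distillation) objective during student training; the key point it illustrates is that no manual student-teacher layer pairing is needed, since the attention weights are learned.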
Keywords
Semantics, Predictive models, Computational modeling, Calibration, Training, Mathematical models, Knowledge transfer, Knowledge distillation, semantic calibration, cross-layer distillation, attention mechanism