Co-training 2^L Submodels for Visual Recognition

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Abstract
We introduce submodel co-training, a regularization method related to co-training, self-distillation and stochastic depth. Given a neural network to be trained, for each sample we implicitly instantiate two altered networks, "submodels", with stochastic depth: we activate only a subset of the layers. Each network serves as a soft teacher to the other by providing a loss that complements the regular loss provided by the one-hot label. Our approach, dubbed "cosub", uses a single set of weights and does not involve a pre-trained external model or temporal averaging. Experimentally, we show that submodel co-training is effective for training backbones for recognition tasks such as image classification and semantic segmentation. Our approach is compatible with multiple architectures, including RegNet, ViT, PiT, XCiT, Swin and ConvNeXt, and improves their results in comparable training settings. For instance, a ViT-B pretrained with cosub on ImageNet-21k obtains 87.4% top-1 accuracy at resolution 448 on ImageNet-val.
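
The abstract describes the training rule only at a high level; the sketch below is one plausible PyTorch reading of it, not the paper's exact recipe: two stochastic-depth forward passes through the same weights, each supervised by the one-hot label and by the other pass's detached soft predictions. The toy ResidualBlock, the drop_prob value, the loss weighting lam and the KL form of the soft-teacher term are all illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Residual MLP block with stochastic depth: during training the
    whole residual branch is skipped with probability drop_prob."""
    def __init__(self, dim, drop_prob=0.2):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.drop_prob = drop_prob

    def forward(self, x):
        if self.training and torch.rand(()) < self.drop_prob:
            return x                  # layer dropped: identity path only
        return x + self.fc(x)

class ToyNet(nn.Module):
    def __init__(self, dim=64, depth=8, num_classes=10):
        super().__init__()
        self.blocks = nn.Sequential(*[ResidualBlock(dim) for _ in range(depth)])
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        return self.head(self.blocks(x))

def cosub_step(model, x, y, lam=0.5):
    """One training step: two stochastic-depth forward passes share the
    same weights; each submodel sees the label loss plus a soft loss
    from the other submodel's detached predictions."""
    logits_a = model(x)               # submodel A: one random layer subset
    logits_b = model(x)               # submodel B: another random subset
    ce = F.cross_entropy(logits_a, y) + F.cross_entropy(logits_b, y)
    # Symmetric soft-teacher term; detaching the teacher branch keeps
    # gradients from flowing through it.
    kd = F.kl_div(F.log_softmax(logits_a, -1),
                  F.softmax(logits_b, -1).detach(), reduction="batchmean") \
       + F.kl_div(F.log_softmax(logits_b, -1),
                  F.softmax(logits_a, -1).detach(), reduction="batchmean")
    return (1 - lam) * ce + lam * kd

model = ToyNet()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
x, y = torch.randn(32, 64), torch.randint(0, 10, (32,))
loss = cosub_step(model, x, y)
loss.backward()
opt.step()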
Keywords
Deep learning architectures and techniques