Enhanced Spatial Feature Learning for Weakly Supervised Object Detection

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (2024)

Abstract
Weakly supervised object detection (WSOD) has become an effective paradigm that requires only image-level class labels to train object detectors. However, WSOD detectors tend to learn highly discriminative features that correspond to local object parts rather than complete objects, which results in imprecise object localization. Designing backbones specifically for WSOD is a feasible way to address this issue. However, a redesigned backbone generally needs to be pretrained on the large-scale ImageNet dataset or trained from scratch, both of which require much more time and computation than fine-tuning. In this article, we explore how to optimize the backbone while keeping the original pretrained model usable. Because the pooling layer summarizes neighborhood features, it is crucial for spatial feature learning. In addition, it has no learnable parameters, so modifying it does not change the pretrained weights. Based on this analysis, we propose enhanced spatial feature learning (ESFL) for WSOD, which first uses multiple kernels in a single pooling layer to handle multiscale objects and then enhances above-average activations within the rectangular neighborhood to alleviate the problem of ignoring unsalient object parts. Experimental results on the PASCAL VOC and MS COCO benchmarks demonstrate that ESFL brings significant performance improvements to WSOD methods and achieves state-of-the-art results.
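The abstract does not spell out the exact pooling rule, so the following is only a minimal sketch of the two ideas it names: several kernel sizes inside one parameter-free pooling step, and emphasis on above-average activations within each rectangular neighborhood. The function names, the kernel sizes `(2, 3)`, the "mean of the values at or above the window mean" rule, and the averaging across kernels are all assumptions made for illustration, not the ESFL formulation from the paper.

```python
def above_average_pool(window):
    """Pool one neighborhood by averaging only its above-average activations.

    Hypothetical rule: keep values >= the window mean, then average them.
    """
    mean = sum(window) / len(window)
    # Guard: float rounding can push the mean slightly above every value.
    kept = [v for v in window if v >= mean] or [max(window)]
    return sum(kept) / len(kept)


def window_values(fmap, top, left, k):
    """Collect the k x k neighborhood starting at (top, left), clipped at borders."""
    height, width = len(fmap), len(fmap[0])
    return [
        fmap[r][c]
        for r in range(top, min(top + k, height))
        for c in range(left, min(left + k, width))
    ]


def multi_kernel_pool(fmap, kernels=(2, 3), stride=2):
    """Pool a 2D feature map with several kernel sizes and no learnable weights.

    Each output cell is the average, over all kernel sizes (an assumed fusion
    rule), of the above-average pooling response at that location.
    """
    height, width = len(fmap), len(fmap[0])
    pooled = []
    for top in range(0, height, stride):
        row = []
        for left in range(0, width, stride):
            responses = [
                above_average_pool(window_values(fmap, top, left, k))
                for k in kernels
            ]
            row.append(sum(responses) / len(responses))
        pooled.append(row)
    return pooled


# Toy 4 x 4 feature map (made-up values, single channel).
feature_map = [
    [1, 2, 0, 0],
    [3, 4, 0, 0],
    [0, 0, 5, 5],
    [0, 0, 5, 5],
]
single_scale = multi_kernel_pool(feature_map, kernels=(2,), stride=2)
multi_scale = multi_kernel_pool(feature_map)
```

Because the sketch has no learnable parameters, swapping it in for an existing pooling step would leave pretrained convolution weights untouched, which is the property the abstract relies on. With the single 2 x 2 kernel, the top-left window `[1, 2, 3, 4]` pools to the mean of `[3, 4]` rather than to the plain average or the single maximum.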
Keywords
Proposals, Representation learning, Object detection, Feature extraction, Kernel, Computational modeling, Benchmark testing, Multiple instance learning (MIL), pooling, spatial local feature, weakly supervised object detection (WSOD)