YOLOH: You Only Look One Hourglass for Real-Time Object Detection

Shaobo Wang, Renhai Chen, Hongyue Wu, Xiaozhe Li, Zhiyong Feng

IEEE Transactions on Image Processing (2024)

Abstract
Multi-scale detection based on Feature Pyramid Networks (FPN) is a popular approach for improving accuracy in object detection. However, using multi-layer features in the decoder of FPN methods requires many convolution operations on high-resolution feature maps, which consumes significant computational resources. In this paper, we propose a novel perspective on FPN in which we directly use fused single-layer features for regression and classification. Our proposed model, You Only Look One Hourglass (YOLOH), fuses multiple feature maps into a single feature map in the encoder. We then use dense connections and dilated residual blocks to expand the receptive field of the fused feature map. The resulting output not only contains information from all the feature maps but also has a multi-scale receptive field for detection. Experimental results on the COCO dataset demonstrate that YOLOH achieves higher accuracy and better run-time performance than established detector baselines: for instance, it reaches an average precision (AP) of 50.2 under a standard 3x training schedule, and 40.3 AP at 32 FPS with a ResNet-50 backbone. We anticipate that YOLOH can serve as a reference for researchers designing real-time detectors in future studies. Our code is available at https://github.com/wsb853529465/YOLOH-main .
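The two ideas described above — collapsing a feature pyramid into one map, then enlarging its receptive field with stacked dilated blocks — can be illustrated with a minimal sketch. This is not the authors' implementation: the pyramid shapes, the nearest-neighbour resize, the sum fusion, and the dilation rates below are all assumptions made for illustration.

```python
# Illustrative sketch only (not the YOLOH source code): fuse several pyramid
# levels into one map, and estimate the receptive field contributed by a
# stack of dilated 3x3 convolutions. All shapes/rates are assumptions.
import numpy as np

def resize_nearest(fmap, out_h, out_w):
    """Nearest-neighbour resize of a (C, H, W) feature map."""
    c, h, w = fmap.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return fmap[:, rows[:, None], cols[None, :]]

def fuse_to_single_map(feature_maps, out_h, out_w):
    """Sum all pyramid levels after resizing them to one common resolution."""
    return sum(resize_nearest(f, out_h, out_w) for f in feature_maps)

def receptive_field(dilations, kernel=3):
    """Receptive field of stacked stride-1 dilated convs:
    each layer with dilation d adds d * (kernel - 1) to the field."""
    rf = 1
    for d in dilations:
        rf += d * (kernel - 1)
    return rf

# Hypothetical pyramid: three levels at 80x80, 40x40, 20x20, 256 channels.
pyramid = [np.random.rand(256, s, s) for s in (80, 40, 20)]
fused = fuse_to_single_map(pyramid, 40, 40)
print(fused.shape)                    # (256, 40, 40)
print(receptive_field([2, 4, 6, 8])) # 1 + 2*(2+4+6+8) = 41
```

The receptive-field arithmetic shows why dilation matters here: four 3x3 layers with growing dilation rates cover a 41-pixel span on the single fused map, whereas four plain 3x3 layers would cover only 9, which is why a single-map decoder can still see objects at multiple scales.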
Keywords
Feature extraction, Convolution, Object detection, MISO communication, Semantics, Computational modeling, Training, One-stage object detection, Multi-scale fusion, Semantic extraction, Single-shot network