InfMAE: A Foundation Model in Infrared Modality
CoRR (2024)
Abstract
In recent years, foundation models have swept the computer vision field
and facilitated the development of various tasks across modalities.
However, how to design an infrared foundation model remains an open
question. In this paper, we propose InfMAE, a foundation model for the
infrared modality. We release an infrared dataset, called Inf30, to
address the lack of large-scale data for self-supervised learning in the
infrared vision community. We also design an information-aware masking
strategy tailored to infrared images. This strategy places greater
emphasis on information-rich regions of infrared images during
self-supervised learning, which is conducive to learning generalized
representations. In addition, we adopt a multi-scale encoder to improve
the performance of the pre-trained encoder on downstream tasks. Finally,
since infrared images lack fine detail and texture, we design an
infrared decoder module that further improves downstream performance.
Extensive experiments show that the proposed InfMAE outperforms both
supervised and self-supervised methods on three downstream tasks. Our
code will be made public at https://github.com/liufangcen/InfMAE.
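As a rough illustration of the information-aware masking idea described in the abstract, the sketch below biases the choice of *visible* patches toward information-rich regions, so the encoder sees them more often during masked pre-training. This is a hypothetical reading, not the paper's implementation: the per-patch information score (mean gradient magnitude here) and the sampling scheme are assumptions for illustration only.

```python
import numpy as np

def information_aware_mask(image, patch=16, mask_ratio=0.75, rng=None):
    """Sketch of an information-aware masking strategy (assumed reading
    of InfMAE): patches with richer content are kept visible with higher
    probability. Returns a boolean grid where True = masked patch."""
    rng = np.random.default_rng(rng)
    H, W = image.shape
    gh, gw = H // patch, W // patch
    # Per-patch information score: mean gradient magnitude
    # (an assumption; the paper defines its own information measure).
    gy, gx = np.gradient(image.astype(np.float64))
    grad = np.hypot(gx, gy)
    scores = (grad[:gh * patch, :gw * patch]
              .reshape(gh, patch, gw, patch)
              .mean(axis=(1, 3))
              .ravel())
    n = gh * gw
    n_keep = int(round(n * (1.0 - mask_ratio)))
    # Sample visible patches with probability proportional to score.
    p = scores + 1e-8          # avoid all-zero probabilities
    p = p / p.sum()
    keep = rng.choice(n, size=n_keep, replace=False, p=p)
    mask = np.ones(n, dtype=bool)   # True = masked (hidden from encoder)
    mask[keep] = False
    return mask.reshape(gh, gw)

# Example: a synthetic 64x64 infrared-like image with one bright target.
img = np.zeros((64, 64))
img[20:30, 20:30] = 1.0
m = information_aware_mask(img, patch=16, mask_ratio=0.75, rng=0)
```

With a 64×64 image and 16×16 patches there are 16 patches; a 0.75 mask ratio leaves 4 visible, and those 4 are drawn preferentially from patches overlapping the bright target's edges.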