ViTCA-Net: a framework for disease detection in video capsule endoscopy images using a vision transformer and convolutional neural network with a specific attention mechanism

Multimedia Tools and Applications (2024)

Abstract
Video capsule endoscopy (VCE) is a non-invasive procedure for examining the human bowel. VCE generates thousands of images from different parts of the gastrointestinal tract. Because examining these images is tedious and time-consuming for doctors, automated diagnosis of digestive diseases from VCE images is highly desirable. Most existing studies rely on CNN-based methods, which are not efficient enough at learning invariant global features in VCE images. This paper therefore presents a new framework that combines the learning of global and local features from VCE images. The proposed method uses a specific attention mechanism within a convolutional neural network to extract local features, while a vision transformer captures global features. The local and global features are then fused for final classification. Extensive experiments on the public Kvasir Capsule Endoscopy dataset yielded an accuracy of 97%, which compares favorably with state-of-the-art methods. Additionally, the proposed system demonstrated robust generalization, achieving a recall of 85% on an unseen dataset.
Keywords
Vision transformer, CNN, Attention mechanism, Features extraction, Gastrointestinal disease detection, Video capsule endoscopy
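
The abstract states that local features (from the attention-equipped CNN) and global features (from the vision transformer) are fused before classification, but it does not say how the fusion is done. The sketch below illustrates one common option, concatenation followed by a linear layer and softmax, purely as an assumption. The function names, feature sizes, class count, and random weights are all hypothetical placeholders and are not taken from the paper; the two feature vectors stand in for the outputs of the CNN and transformer branches.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Apply one fully connected layer: one output score per weight row."""
    return [
        sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


def fuse_and_classify(local_feats, global_feats, weights, bias):
    """Fuse by concatenation (an assumed strategy), then classify."""
    fused = list(local_feats) + list(global_feats)
    if any(len(row) != len(fused) for row in weights):
        raise ValueError("each weight row must match the fused feature length")
    return softmax(linear(weights, bias, fused))


# Toy stand-ins for the two branches (hypothetical sizes: 4 local, 6 global).
rng = random.Random(0)
local_feats = [rng.uniform(-1, 1) for _ in range(4)]
global_feats = [rng.uniform(-1, 1) for _ in range(6)]

# Hypothetical classifier head over 3 classes with random, untrained weights.
num_classes = 3
fused_dim = len(local_feats) + len(global_feats)
weights = [
    [rng.uniform(-0.5, 0.5) for _ in range(fused_dim)]
    for _ in range(num_classes)
]
bias = [0.0] * num_classes

probs = fuse_and_classify(local_feats, global_feats, weights, bias)
predicted = max(range(num_classes), key=probs.__getitem__)
print("class probabilities:", probs)
print("predicted class index:", predicted)
```

Concatenation keeps both feature sets intact and lets the classifier head learn how to weight them; other fusion schemes (for example, element-wise addition or attention-based weighting) would change only `fuse_and_classify`.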