Friends Across Time: Multi-Scale Action Segmentation Transformer for Surgical Phase Recognition
CoRR (2024)
Abstract
Automatic surgical phase recognition is a core technology for modern
operating rooms and online surgical video assessment platforms. Current
state-of-the-art methods use both spatial and temporal information to tackle
the surgical phase recognition task. Building on this idea, we propose the
Multi-Scale Action Segmentation Transformer (MS-AST) for offline surgical phase
recognition and the Multi-Scale Action Segmentation Causal Transformer
(MS-ASCT) for online surgical phase recognition. We use ResNet50 or
EfficientNetV2-M for spatial feature extraction. Our MS-AST and MS-ASCT can
model temporal information at different scales with multi-scale temporal
self-attention and multi-scale temporal cross-attention, which enhances the
capture of temporal relationships between frames and segments. We demonstrate
that our method can achieve 95.26
for online and offline surgical phase recognition, respectively, setting
new state-of-the-art results. Our method also achieves state-of-the-art
results on non-medical datasets in the video action segmentation domain.
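
To make the idea of multi-scale temporal self-attention more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the scales are defined or combined, so this sketch assumes each scale is a local attention window of a different size and that the per-scale outputs are simply averaged. The function names (`window_mask`, `masked_self_attention`, `multi_scale_attention`) and the window sizes are hypothetical. Learned projections and cross-attention are omitted. The `causal` flag mimics the online setting, where a frame may only attend to itself and earlier frames.

```python
import math

def window_mask(num_frames, window, causal):
    """Return mask[t][s] == True when frame t may attend to frame s.

    Assumption: a "scale" is modelled as a local window of `window` frames.
    """
    half = window // 2
    mask = []
    for t in range(num_frames):
        row = []
        for s in range(num_frames):
            if causal:
                # Online setting: current frame plus the (window - 1) frames before it.
                allowed = t - window < s <= t
            else:
                # Offline setting: symmetric window around frame t.
                allowed = abs(t - s) <= half
            row.append(allowed)
        mask.append(row)
    return mask

def masked_self_attention(features, mask):
    """Scaled dot-product self-attention with Q = K = V = features.

    Learned projection matrices are left out to keep the sketch short.
    """
    dim = len(features[0])
    output = []
    for t, query in enumerate(features):
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            if mask[t][s] else None
            for s, key in enumerate(features)
        ]
        # Every frame can attend to itself, so at least one score is present.
        top = max(x for x in scores if x is not None)
        weights = [math.exp(x - top) if x is not None else 0.0 for x in scores]
        total = sum(weights)
        output.append([
            sum(w * value[d] for w, value in zip(weights, features)) / total
            for d in range(dim)
        ])
    return output

def multi_scale_attention(features, windows=(3, 9), causal=False):
    """Average the attention outputs computed at several window sizes.

    Averaging is an assumption made for illustration only.
    """
    per_scale = [
        masked_self_attention(features, window_mask(len(features), w, causal))
        for w in windows
    ]
    dim = len(features[0])
    return [
        [sum(out[t][d] for out in per_scale) / len(per_scale) for d in range(dim)]
        for t in range(len(features))
    ]
```

With `causal=True`, changing a later frame's features leaves the outputs of all earlier frames untouched, which is the property an online recognizer needs. With `causal=False`, each frame also draws on frames that follow it, which is only possible when the whole video is available.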