Spatial-Temporal Multi-level Association for Video Object Segmentation
arXiv (2024)
Abstract
Existing semi-supervised video object segmentation methods focus on either
temporal feature matching or spatial-temporal feature modeling. However, they
do not achieve sufficient target interaction and efficient parallel
processing at the same time, which constrains the learning of
dynamic, target-aware features. To tackle these limitations, this paper
proposes a spatial-temporal multi-level association framework, which jointly
associates reference frame, test frame, and object features to achieve
sufficient interaction and parallel target ID association with a
spatial-temporal memory bank for efficient video object segmentation.
Specifically, we construct a spatial-temporal multi-level feature association
module to learn better target-aware features, which formulates feature
extraction and interaction as the efficient operations of object
self-attention, reference object enhancement, and test reference correlation.
In addition, we propose a spatial-temporal memory to assist feature association
and temporal ID assignment and correlation. We evaluate the proposed method
through extensive experiments on several video object segmentation
benchmarks, including DAVIS 2016/2017 val, DAVIS 2017 test-dev, and YouTube-VOS
2018/2019 val. The favorable performance against the state-of-the-art methods
demonstrates the effectiveness of our approach. All source code and trained
models will be made publicly available.
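
The abstract names three operations (object self-attention, reference object enhancement, and test reference correlation) but does not specify their exact form. The following is a minimal sketch of one possible reading, assuming each step is plain scaled dot-product attention with no learned projections, residual connections, or memory bank. The function names `attend` and `associate` and the toy feature layout are hypothetical and are not taken from the paper or its code.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    """Scaled dot-product attention: each query becomes a weighted mix of values."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        dim = len(values[0])
        out.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)])
    return out

def associate(object_tokens, ref_feats, test_feats):
    """Hypothetical three-step association (an assumption, not the authors' module)."""
    # 1. Object self-attention: object tokens exchange information with each other.
    objs = attend(object_tokens, object_tokens, object_tokens)
    # 2. Reference object enhancement: reference-frame features attend to the objects.
    ref = attend(ref_feats, objs, objs)
    # 3. Test reference correlation: test-frame features attend to the enhanced reference.
    return attend(test_feats, ref, ref)

# Toy inputs: 2 object tokens, 3 reference locations, 4 test locations, 2 channels.
object_tokens = [[1.0, 0.0], [0.0, 1.0]]
ref_feats = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
test_feats = [[0.2, 0.8], [0.9, 0.1], [0.5, 0.5], [0.0, 0.0]]
target_aware = associate(object_tokens, ref_feats, test_feats)
```

In this sketch every test location is processed independently of the others, which is one way the three steps could run in parallel over all objects and pixels; how the paper actually achieves parallel target ID association is not stated in the abstract.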