Two stream multi-layer convolutional network for keyframe-based video summarization

Multimedia Tools and Applications (2023)

Abstract
In this paper, we propose an unsupervised static video summarization method that extracts keyframes representing the entire video. We present a two-stream method that extracts motion and visual features from the video. For the visual stream, features are taken from several levels of abstraction through multi-level feature extraction and fusion. Using features from different layers yields a better frame representation because it captures both coarse and fine-grained details of each frame. Neighborhood peak detection and redundancy removal algorithms are then applied to the fused features to produce the final keyframes that make up the video summary. The proposed method specifically targets the summarization of industrial surveillance videos. Extensive experiments are performed on both domain-specific and domain-independent datasets to demonstrate the wide applicability of the proposed model. Results on the publicly available benchmark datasets OVP and YouTube show an increase in F-score compared to other unsupervised methods. We also report results on a new dataset that we created from industrial CCTV footage. The results show that the proposed method outperforms existing methods by about 10% in terms of F-score.
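The last stage of the pipeline, neighborhood peak detection followed by redundancy removal, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the scoring function, the neighborhood size, or the similarity measure, so the consecutive-frame Euclidean distance, the `radius` parameter, and the cosine-similarity `threshold` below are all assumptions, and `features` stands in for the fused two-stream features as a plain array with one row per frame.

```python
import numpy as np

def frame_change_scores(features):
    """Score each frame by the distance to the previous frame's features (assumed scoring)."""
    diffs = np.linalg.norm(np.diff(features, axis=0), axis=1)
    return np.concatenate([[0.0], diffs])  # the first frame has no predecessor

def neighborhood_peaks(scores, radius=2):
    """Return frame indices whose score is the unique maximum within +/- radius frames."""
    peaks = []
    for i in range(len(scores)):
        lo, hi = max(0, i - radius), min(len(scores), i + radius + 1)
        window = scores[lo:hi]
        is_unique_max = scores[i] == window.max() and np.sum(window == scores[i]) == 1
        if scores[i] > 0 and is_unique_max:
            peaks.append(i)
    return peaks

def remove_redundant(features, candidates, threshold=0.95):
    """Drop a candidate if its cosine similarity to any kept keyframe reaches the threshold."""
    kept = []
    for idx in candidates:
        v = features[idx]
        duplicate = False
        for k in kept:
            u = features[k]
            sim = float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
            if sim >= threshold:
                duplicate = True
                break
        if not duplicate:
            kept.append(idx)
    return kept

# Toy input: four 3-frame segments alternating between two hypothetical scenes.
scene_a, scene_b = [1.0, 0.0], [0.0, 1.0]
features = np.array([scene_a] * 3 + [scene_b] * 3 + [scene_a] * 3 + [scene_b] * 3)

candidates = neighborhood_peaks(frame_change_scores(features), radius=2)
keyframes = remove_redundant(features, candidates, threshold=0.95)
```

In this toy input the peak detector proposes one candidate at each scene change, and the redundancy step then discards the candidate from the second appearance of the scene that is already represented.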
Keywords
multi-layer, keyframe-based