Textual Alchemy: CoFormer for Scene Text Understanding.

Gayatri Deshmukh, Onkar Susladkar, Dhruv Makwana, Sparsh Mittal, R. Sai Chandra Teja

IEEE/CVF Winter Conference on Applications of Computer Vision (2024)

Abstract

The paper presents CoFormer (Convolutional Fourier Transformer), a robust and adaptable transformer architecture designed for a range of scene text tasks. CoFormer integrates convolution and Fourier operations into the transformer architecture. It thereby leverages convolution properties such as shared weights, local receptive fields, and spatial subsampling, while the Fourier operation emphasizes composite characteristics from the frequency domain. The paper further proposes two new pretraining datasets, named Textverse10M-E and Textverse10M-H, and uses them to demonstrate the efficacy of pretraining for scene text understanding. CoFormer achieves state-of-the-art results with and without pretraining on two downstream tasks: scene text recognition (STR) and scene text editing (STE). The paper also proposes LISTNet (Language Invariant Style Transfer), a novel framework for bilingual STE, and introduces three datasets: TST500K for STE, and CSTR2.5M and Akshara550 for STR. The source code of CoFormer is available at https://github.com/CandleLabAI/CoFormer-WACV-2024.
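
As a rough illustration of what pairing a convolution with a Fourier operation can look like, the following is a minimal sketch and not CoFormer's actual design. The 1-D setting, the function names (`conv1d_same`, `dft_real`, `conv_fourier_mix`), the residual sum, and the choice to keep only the real part of the transform are all assumptions made for this example; the abstract does not specify how the two operations are combined.

```python
import cmath


def conv1d_same(x, kernel):
    """Slide one shared kernel over x with zero padding.

    This is the cross-correlation form used in most deep learning
    libraries (the kernel is not flipped). Assumes an odd kernel length
    so the window is centred and the output has the same length as x.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]


def dft_real(x):
    """Real part of the discrete Fourier transform of x.

    Every output element depends on every input element, so this acts
    as a parameter-free global mixing step.
    """
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n)).real
        for f in range(n)
    ]


def conv_fourier_mix(x, kernel):
    """Hypothetical block: input plus a local branch plus a frequency branch."""
    local_branch = conv1d_same(x, kernel)   # local receptive field, shared weights
    frequency_branch = dft_real(x)          # global, frequency-domain view
    return [xi + li + fi for xi, li, fi in zip(x, local_branch, frequency_branch)]
```

In this sketch the convolution branch sees only a small neighbourhood of each position, while the Fourier branch mixes information across the whole sequence; a real model would apply learned projections and operate on 2-D feature maps.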

Keywords

Applications: Visualization; Algorithms: Generative models for image, video, 3D, etc.; Algorithms: Machine learning architectures, formulations, and algorithms
