Computer Science > Computer Vision and Pattern Recognition
Title: Coarse-to-Fine Video Denoising with Dual-Stage Spatial-Channel Transformer
(Submitted on 30 Apr 2022 (v1), last revised 17 Jan 2023 (this version, v2))
Abstract: Video denoising aims to recover high-quality frames from noisy video. Most existing approaches adopt convolutional neural networks (CNNs) to separate the noise from the original visual content; however, CNNs focus on local information and ignore the interactions between long-range regions in the frame. Furthermore, most related works directly take the output of basic spatio-temporal denoising as the final result, neglecting the fine-grained denoising process. In this paper, we propose a Dual-stage Spatial-Channel Transformer (DSCT) for coarse-to-fine video denoising, which inherits the advantages of both Transformers and CNNs. Specifically, DSCT is built on a progressive dual-stage architecture, namely a coarse-level stage and a fine-level stage that extract dynamic features and static features, respectively. At both stages, a Spatial-Channel Encoding Module is designed to model the long-range contextual dependencies at both spatial and channel levels. Meanwhile, we design a Multi-Scale Residual Structure to preserve multiple aspects of information at different stages, which contains a Temporal Features Aggregation Module to summarize the dynamic representation. Extensive experiments on four publicly available datasets demonstrate that our proposed method achieves significant improvements over state-of-the-art methods.
Submission history
From: Wulian Yun
[v1] Sat, 30 Apr 2022 09:01:21 GMT (7732kb,D)
[v2] Tue, 17 Jan 2023 03:35:30 GMT (6150kb,D)