Publication: A New Dataset and Transformer for Stereoscopic Video Super-Resolution
| dc.contributor.author | Imani, Hassan | |
| dc.contributor.author | Islam, Md Baharul | |
| dc.contributor.author | Wong, Lai-Kuan | |
| dc.contributor.institution | Imani, Hassan, Bahçeşehir Üniversitesi, Istanbul, Turkey | |
| dc.contributor.institution | Islam, Md Baharul, Bahçeşehir Üniversitesi, Istanbul, Turkey; American University of Malta, Cospicua, Malta | |
| dc.contributor.institution | Wong, Lai-Kuan, Multimedia University, Cyberjaya, Malaysia | |
| dc.date.accessioned | 2025-10-05T15:22:59Z | |
| dc.date.issued | 2022 | |
| dc.description.abstract | Stereo video super-resolution (SVSR) aims to enhance the spatial resolution of a low-resolution video by reconstructing its high-resolution counterpart. The key challenges in SVSR are preserving stereo consistency and temporal consistency; without them, viewers may experience 3D fatigue. While there are several notable works on stereoscopic image super-resolution, there is little research on stereo video super-resolution. In this paper, we propose a novel Transformer-based model for SVSR, namely Trans-SVSR. Trans-SVSR comprises two key novel components, a spatio-temporal convolutional self-attention layer and an optical flow-based feed-forward layer, which discover the correlation across different video frames and align the features. A parallax attention mechanism (PAM), which uses cross-view information to account for large disparities, is used to fuse the stereo views. Due to the lack of a benchmark dataset suitable for the SVSR task, we collected a new stereoscopic video dataset, SVSR-Set, containing 71 Full High-Definition (Full HD) stereo videos captured with a professional stereo camera. Extensive experiments on the collected dataset, along with two other datasets, demonstrate that Trans-SVSR achieves competitive performance compared with state-of-the-art methods. Project code and additional results are available at https://github.com/H-deep/Trans-SVSR/. | |
| dc.identifier.conferenceName | 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2022 | |
| dc.identifier.conferencePlace | New Orleans, LA | |
| dc.identifier.doi | 10.1109/CVPRW56347.2022.00086 | |
| dc.identifier.endpage | 714 | |
| dc.identifier.issn | 2160-7516 | |
| dc.identifier.issn | 2160-7508 | |
| dc.identifier.scopus | 2-s2.0-85137802424 | |
| dc.identifier.startpage | 705 | |
| dc.identifier.uri | https://doi.org/10.1109/CVPRW56347.2022.00086 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.14719/9085 | |
| dc.identifier.volume | 2022-June | |
| dc.language.iso | en | |
| dc.publisher | IEEE Computer Society | |
| dc.relation.source | IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops | |
| dc.subject.authorkeywords | Computer Vision | |
| dc.subject.authorkeywords | Digital Television | |
| dc.subject.authorkeywords | Geometrical Optics | |
| dc.subject.authorkeywords | Optical Correlation | |
| dc.subject.authorkeywords | Stereo Image Processing | |
| dc.subject.authorkeywords | High Resolution | |
| dc.subject.authorkeywords | Image Super Resolutions | |
| dc.subject.authorkeywords | Low Resolution Video | |
| dc.subject.authorkeywords | Resolution Video | |
| dc.subject.authorkeywords | Spatial Resolution | |
| dc.subject.authorkeywords | Stereo Video | |
| dc.subject.authorkeywords | Stereoscopic Image | |
| dc.subject.authorkeywords | Stereoscopic Video | |
| dc.subject.authorkeywords | Temporal Consistency | |
| dc.subject.authorkeywords | Video Super-resolution | |
| dc.subject.authorkeywords | Optical Resolving Power | |
| dc.subject.indexkeywords | Computer vision | |
| dc.subject.indexkeywords | Digital television | |
| dc.subject.indexkeywords | Geometrical optics | |
| dc.subject.indexkeywords | Optical correlation | |
| dc.subject.indexkeywords | Stereo image processing | |
| dc.subject.indexkeywords | High resolution | |
| dc.subject.indexkeywords | Image super resolutions | |
| dc.subject.indexkeywords | Low resolution video | |
| dc.subject.indexkeywords | Resolution video | |
| dc.subject.indexkeywords | Spatial resolution | |
| dc.subject.indexkeywords | Stereo video | |
| dc.subject.indexkeywords | Stereoscopic image | |
| dc.subject.indexkeywords | Stereoscopic video | |
| dc.subject.indexkeywords | Temporal consistency | |
| dc.subject.indexkeywords | Video super-resolution | |
| dc.subject.indexkeywords | Optical resolving power | |
| dc.title | A New Dataset and Transformer for Stereoscopic Video Super-Resolution | |
| dc.type | Conference Paper | |
| dcterms.references | IEEE Transactions on Image Processing, (2019); Bhavsar, Arnav V., Resolution enhancement for binocular stereo, Proceedings - International Conference on Pattern Recognition, (2008); Bhavsar, Arnav V., Resolution enhancement in multi-image stereo, IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 9, pp. 1721-1728, (2010); Video Super Resolution Transformer, (2021); Chan, Kelvin C.K., BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 4945-4954, (2021); Structured Model Pruning of Convolutional Networks on Tensor Processing Units, (2021); Cheng, Eva C., RMIT3DV: Pre-announcement of a creative commons uncompressed HD 3D video database, pp. 212-217, (2012); ECCV, (2016); Dan, Jiawang, A Disparity Feature Alignment Module for Stereo Image Super-Resolution, IEEE Signal Processing Letters, 28, pp. 1285-1289, (2021); An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, (2020) | |
| dspace.entity.type | Publication | |
| local.indexed.at | Scopus | |
| person.identifier.scopus-author-id | 54796733900 | |
| person.identifier.scopus-author-id | 57204631897 | |
| person.identifier.scopus-author-id | 57210368825 |
