Publication:
A New Dataset and Transformer for Stereoscopic Video Super-Resolution

dc.contributor.author: Imani, Hassan
dc.contributor.author: Islam, Md Baharul
dc.contributor.author: Lai-Kuan, Wong
dc.contributor.institution: Imani, Hassan, Bahçeşehir Üniversitesi, Istanbul, Turkey
dc.contributor.institution: Islam, Md Baharul, Bahçeşehir Üniversitesi, Istanbul, Turkey; American University of Malta, Cospicua, Malta
dc.contributor.institution: Lai-Kuan, Wong, Multimedia University, Cyberjaya, Malaysia
dc.date.accessioned: 2025-10-05T15:22:59Z
dc.date.issued: 2022
dc.description.abstract: Stereo video super-resolution (SVSR) aims to enhance the spatial resolution of a low-resolution stereo video by reconstructing its high-resolution counterpart. The key challenges in SVSR are preserving stereo consistency and temporal consistency, without which viewers may experience 3D fatigue. There are several notable works on stereoscopic image super-resolution, but little research on stereo video super-resolution. In this paper, we propose a novel Transformer-based model for SVSR, namely Trans-SVSR. Trans-SVSR comprises two key novel components: a spatio-temporal convolutional self-attention layer and an optical flow-based feed-forward layer that discovers the correlation across different video frames and aligns the features. The stereo views are fused with the parallax attention mechanism (PAM), which uses cross-view information to account for significant disparities. Because no benchmark dataset suited to the SVSR task was available, we collected a new stereoscopic video dataset, SVSR-Set, containing 71 full high-definition (HD) stereo videos captured with a professional stereo camera. Extensive experiments on the collected dataset, along with two other datasets, demonstrate that Trans-SVSR achieves competitive performance compared with state-of-the-art methods. Project code and additional results are available at https://github.com/H-deep/Trans-SVSR/.
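To illustrate the cross-view fusion idea named in the abstract, here is a minimal pure-Python sketch of row-wise attention between left-view and right-view feature maps, in the general style of parallax attention. It assumes rectified stereo pairs, so that corresponding pixels lie on the same image row. The function names (`softmax`, `parallax_attention_row`, `parallax_attention`), the nested-list feature layout, and the plain dot-product similarity are assumptions made for this sketch. It is not the Trans-SVSR implementation, which is published in the project repository linked in the abstract.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def parallax_attention_row(left_row, right_row):
    """Attend from one row of left-view features to the same row of the right view.

    left_row, right_row: lists of W feature vectors, each a list of C floats.
    Returns (attention, warped): attention is W x W, warped is W x C.
    Hypothetical helper for illustration only.
    """
    channels = len(left_row[0])
    attention, warped = [], []
    for query in left_row:
        # Similarity of this left pixel to every right pixel on the same row.
        scores = [sum(q * k for q, k in zip(query, key)) for key in right_row]
        weights = softmax(scores)
        attention.append(weights)
        # Attention-weighted sum of right features: a soft, disparity-aware warp.
        warped.append([
            sum(w * key[c] for w, key in zip(weights, right_row))
            for c in range(channels)
        ])
    return attention, warped


def parallax_attention(left, right):
    """Apply the row-wise attention to H x W x C feature maps (nested lists)."""
    return [parallax_attention_row(l_row, r_row)[1]
            for l_row, r_row in zip(left, right)]
```

Calling `parallax_attention(left, right)` returns right-view features resampled into the left view's pixel grid, which a network could then combine with the left-view features. Restricting attention to a single row keeps the cost at W x W per row rather than (H·W) x (H·W) for the whole image.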
dc.identifier.conferenceName: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2022
dc.identifier.conferencePlace: New Orleans, LA
dc.identifier.doi: 10.1109/CVPRW56347.2022.00086
dc.identifier.endpage: 714
dc.identifier.issn: 21607516
dc.identifier.issn: 21607508
dc.identifier.scopus: 2-s2.0-85137802424
dc.identifier.startpage: 705
dc.identifier.uri: https://doi.org/10.1109/CVPRW56347.2022.00086
dc.identifier.uri: https://hdl.handle.net/20.500.14719/9085
dc.identifier.volume: 2022-June
dc.language.iso: en
dc.publisher: IEEE Computer Society
dc.relation.source: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
dc.subject.authorkeywords: Computer Vision
dc.subject.authorkeywords: Digital Television
dc.subject.authorkeywords: Geometrical Optics
dc.subject.authorkeywords: Optical Correlation
dc.subject.authorkeywords: Stereo Image Processing
dc.subject.authorkeywords: High Resolution
dc.subject.authorkeywords: Image Super Resolutions
dc.subject.authorkeywords: Low Resolution Video
dc.subject.authorkeywords: Resolution Video
dc.subject.authorkeywords: Spatial Resolution
dc.subject.authorkeywords: Stereo Video
dc.subject.authorkeywords: Stereoscopic Image
dc.subject.authorkeywords: Stereoscopic Video
dc.subject.authorkeywords: Temporal Consistency
dc.subject.authorkeywords: Video Super-resolution
dc.subject.authorkeywords: Optical Resolving Power
dc.subject.indexkeywords: Computer vision
dc.subject.indexkeywords: Digital television
dc.subject.indexkeywords: Geometrical optics
dc.subject.indexkeywords: Optical correlation
dc.subject.indexkeywords: Stereo image processing
dc.subject.indexkeywords: High resolution
dc.subject.indexkeywords: Image super resolutions
dc.subject.indexkeywords: Low resolution video
dc.subject.indexkeywords: Resolution video
dc.subject.indexkeywords: Spatial resolution
dc.subject.indexkeywords: Stereo video
dc.subject.indexkeywords: Stereoscopic image
dc.subject.indexkeywords: Stereoscopic video
dc.subject.indexkeywords: Temporal consistency
dc.subject.indexkeywords: Video super-resolution
dc.subject.indexkeywords: Optical resolving power
dc.title: A New Dataset and Transformer for Stereoscopic Video Super-Resolution
dc.type: Conference Paper
dspace.entity.type: Publication
local.indexed.at: Scopus
person.identifier.scopus-author-id: 54796733900
person.identifier.scopus-author-id: 57204631897
person.identifier.scopus-author-id: 57210368825
