2024, Vol. 23, No. 3 (Serial No. 90): 10-22
Research on space-time video super-resolution based on a temporal feature refinement network
Foundation: the Fundamental Research Funds for the Central Universities (CUC2019A002, CUC2019B021)
Email: ygzhu@cuc.edu.cn
DOI: 10.12194/j.ntu.20230919001
Abstract:

Space-time video super-resolution (STVSR) enhances video quality along both the temporal and spatial dimensions, enabling real-time presentation of high-resolution, high-frame-rate video despite limitations in capture devices, transmission, or storage, and thus meeting the demand for ultra-high-definition picture quality. Compared with two-stage methods, one-stage approaches perform frame interpolation at the feature level rather than the pixel level and are markedly superior in inference speed and computational complexity. Some existing one-stage STVSR methods employ pixel-hallucination-based feature interpolation, which struggles to predict fast-moving objects between frames. To address this, a pyramid encoder-decoder network based on optical flow is proposed for temporal feature interpolation, achieving fast bidirectional optical flow estimation and more realistic, natural texture synthesis; it makes the network structure more efficient while compensating for the instability that large motions introduce into optical flow estimation. In addition, the spatial module adopts sliding-window-based local propagation and recurrent-network-based bidirectional propagation to strengthen frame alignment; the whole network is termed the temporal feature refinement network (TFRnet). To further exploit TFRnet's potential, spatial super-resolution is performed before temporal super-resolution (space-first). Experiments on several widely used benchmarks and evaluation metrics demonstrate the excellent performance of the proposed method, TFRnet-sf: while the overall peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) improve, the PSNR and SSIM of the interpolated intermediate frames also improve, alleviating to some extent the large gap in PSNR and SSIM between interpolated intermediate frames and the original frames.
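Flow-based temporal interpolation of the kind described in the abstract relies on warping neighboring frames (or their features) along estimated motion. As an illustration only, not the paper's implementation, the following is a minimal NumPy sketch of bilinear backward warping; `img` and `flow` are hypothetical inputs (a grayscale frame and a per-pixel displacement field):

```python
import numpy as np

def backward_warp(img, flow):
    # img: (H, W) grayscale frame; flow: (H, W, 2) per-pixel (dx, dy).
    # For each output pixel (x, y), sample img at (x + dx, y + dy)
    # with bilinear interpolation, clamping samples to the image border.
    H, W = img.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    x = np.clip(xs + flow[..., 0], 0, W - 1)
    y = np.clip(ys + flow[..., 1], 0, H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = x - x0
    wy = y - y0
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bot = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bot * wy
```

In a real STVSR pipeline this sampling is done on deep feature maps (and differentiably, e.g. via a grid-sample operator), but the geometry is the same: a bidirectional pair of flows lets both neighboring frames be warped toward the intermediate time step before fusion.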

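The evaluation metric used throughout, PSNR, follows directly from its definition as the log-ratio of the peak signal power to the mean squared error. A minimal sketch, assuming 8-bit images (the function name `psnr` is illustrative):

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    # PSNR = 10 * log10(MAX^2 / MSE); higher means closer to the reference.
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

SSIM, the second metric, additionally compares local luminance, contrast, and structure statistics (Wang et al. [38]); in practice both are typically computed on the luma channel of the output frames.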
References

[1]SONG Z Y, ZHAO X Q, HUI Y Y, et al. Inverted N-type lightweight network based on back projection for image super-resolution reconstruction[J]. Journal of Computer-Aided Design&Computer Graphics, 2022, 34(6):923-932.(in Chinese)

[2]JIN W, CHEN Y. Multi-scale residual channel attention network for face super-resolution[J]. Journal of Computer-Aided Design&Computer Graphics, 2020, 32(6):959-970.(in Chinese)

[3]LEE H, KIM T, CHUNG T Y, et al. AdaCoF:adaptive collaboration of flows for video frame interpolation[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), June 13-19, 2020,Seattle, WA, USA. New York:IEEE Xplore, 2020:5315-5324.

[4]NIKLAUS S, MAI L, LIU F. Video frame interpolation via adaptive separable convolution[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision(ICCV), October 22-29, 2017, Venice, Italy. New York:IEEE Xplore, 2017:261-270.

[5]HUANG Z W, ZHANG T Y, HENG W, et al. Real-time intermediate flow estimation for video frame interpolation[C]//European Conference on Computer Vision. Cham:Springer, 2022:624-642.

[6]KONG L T, JIANG B Y, LUO D H, et al. IFRNet:intermediate feature refine network for efficient frame interpolation[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), June 18-24, 2022, New Orleans, LA, USA. New York:IEEE Xplore, 2022:1959-1968.

[7]WANG X T, CHAN K C K, YU K, et al. EDVR:video restoration with enhanced deformable convolutional networks[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops(CVPRW), June 16-17, 2019, Long Beach, CA, USA.New York:IEEE Xplore, 2019:1954-1963.

[8]HARIS M, SHAKHNAROVICH G, UKITA N. Recurrent back-projection network for video super-resolution[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), June 15-20,2019, Long Beach, CA, USA. New York:IEEE Xplore,2019:3892-3901.

[9]XUE T F, CHEN B A, WU J J, et al. Video enhancement with task-oriented flow[J]. International Journal of Computer Vision, 2019, 127(8):1106-1125.

[10]CHAN K C K, WANG X T, YU K, et al. BasicVSR:the search for essential components in video super-resolution and beyond[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), June 20-25, 2021, Nashville, TN, USA. New York:IEEE Xplore, 2021:4945-4954.

[11]TAO X, GAO H Y, LIAO R J, et al. Detail-revealing deep video super-resolution[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision(ICCV), October 22-29, 2017, Venice, Italy. New York:IEEE Xplore, 2017:4482-4490.

[12]CABALLERO J, LEDIG C, AITKEN A, et al. Real-time video super-resolution with spatio-temporal networks and motion compensation[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition(CVPR), July 21-26, 2017, Honolulu, HI, USA. New York:IEEE Xplore, 2017:2848-2857.

[13]ISOBE T, ZHU F, JIA X, et al. Revisiting temporal modeling for video super-resolution[EB/OL].(2020-08-13)[2023-08-18]. https://arxiv.org/abs/2008.05765.

[14]ISOBE T, JIA X, GU S H, et al. Video super-resolution with recurrent structure-detail network[C]//Proceedings of the 16th European Conference on Computer Vision, August 23-28, 2020, Glasgow, UK. Cham:Springer, 2020:645-660.

[15]XIANG X Y, TIAN Y P, ZHANG Y L, et al. Zooming Slow-Mo:fast and accurate one-stage space-time video super-resolution[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), June 13-19, 2020, Seattle, WA, USA. New York:IEEE Xplore, 2020:3367-3376.

[16]XU G, XU J, LI Z, et al. Temporal modulation network for controllable space-time video super-resolution[C]//Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), June 20-25, 2021,Nashville, TN, USA. New York:IEEE Xplore, 2021:6384-6393.

[17]SHI Z H, LIU X H, LI C Q, et al. Learning for unconstrained space-time video super-resolution[J]. IEEE Transactions on Broadcasting, 2022, 68(2):345-358.

[18]MEYER S, WANG O, ZIMMER H, et al. Phase-based frame interpolation for video[C]//Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition(CVPR), June 7-12, 2015, Boston, MA, USA. New York:IEEE Xplore, 2015:1410-1418.

[19]CHENG X H, CHEN Z Z. Multiple video frame interpolation via enhanced deformable separable convolution[J].IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(10):7029-7045.

[20]CHOI M, KIM H, HAN B, et al. Channel attention is all you need for video frame interpolation[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7):10663-10671.

[21]KIM S Y, OH J, KIM M. FISR:deep joint frame interpolation and super-resolution with a multi-scale temporal loss[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7):11278-11286.

[22]NIKLAUS S, LIU F. Softmax splatting for video frame interpolation[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR),June 13-19, 2020, Seattle, WA, USA. New York:IEEE Xplore, 2020:5436-5445.

[23]TIAN Y P, ZHANG Y L, FU Y, et al. TDAN:temporally-deformable alignment network for video super-resolution[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), June 13-19, 2020, Seattle, WA, USA. New York:IEEE Xplore, 2020:3357-3366.

[24]DAI J F, QI H Z, XIONG Y W, et al. Deformable convolutional networks[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision(ICCV), October 22-29, 2017, Venice, Italy. New York:IEEE Xplore, 2017:764-773.

[25]CHAN K C K, WANG X T, YU K, et al. Understanding deformable alignment in video super-resolution[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2021, 35(2):973-981.

[26]SHECHTMAN E, CASPI Y, IRANI M. Space-time superresolution[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(4):531-545.

[27]MUDENAGUDI U, BANERJEE S, KALRA P K. Spacetime super-resolution using graph-cut optimization[J].IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(5):995-1008.

[28]ZHANG S, GONG Y H, WANG J J. The development of deep convolution neural network and its applications on computer vision[J]. Chinese Journal of Computers, 2019, 42(3):453-482.(in Chinese)

[29]HARIS M, SHAKHNAROVICH G, UKITA N. Space-time-aware multi-resolution video enhancement[C]//Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), June 13-19, 2020, Seattle, WA, USA. New York:IEEE Xplore, 2020:2856-2865.

[30]BAO W B, LAI W S, MA C, et al. Depth-aware video frame interpolation[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), June 15-20, 2019, Long Beach, CA, USA.New York:IEEE Xplore, 2019:3698-3707.

[31]HARIS M, SHAKHNAROVICH G, UKITA N. Deep backprojection networks for single image super-resolution[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), June 18-22,2018, Salt Lake City, UT, USA. New York:IEEE Xplore,2018:1664-1673.

[32]HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition(CVPR), June 27-30, 2016, Las Vegas, NV, USA. New York:IEEE Xplore, 2016:770-778.

[33]SHI X J, CHEN Z R, WANG H, et al. Convolutional LSTM network:a machine learning approach for precipitation nowcasting[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems,December 7-12, 2015, Montreal, Canada. New York:ACM, 2015:802-810.

[34]LAI W S, HUANG J B, AHUJA N, et al. Deep Laplacian pyramid networks for fast and accurate super-resolution[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition(CVPR), July 21-26, 2017,Honolulu, HI, USA. New York:IEEE Xplore, 2017:5835-5843.

[35]KINGMA D P, BA J. Adam:a method for stochastic optimization[EB/OL].(2014-12-22)[2023-08-18]. https://arxiv.org/abs/1412.6980.

[36]LOSHCHILOV I, HUTTER F. SGDR:stochastic gradient descent with warm restarts[EB/OL].(2016-08-13)[2023-08-18]. https://arxiv.org/abs/1608.03983.

[37]SU S C, DELBRACIO M, WANG J, et al. Deep video deblurring for hand-held cameras[C]//Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition(CVPR), July 21-26, 2017, Honolulu, HI, USA. New York:IEEE Xplore, 2017:237-246.

[38]WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment:from error visibility to structural similarity[J]. IEEE Transactions on Image Processing:a Publication of the IEEE Signal Processing Society, 2004, 13(4):600-612.

Basic Information:

CLC number: TP18; TP391.41

Citation:

[1]YAO X J, MU K, PAN P, et al. Research on space-time video super-resolution based on temporal feature refinement network[J]. Journal of Nantong University(Natural Science Edition), 2024, 23(3):10-22. DOI:10.12194/j.ntu.20230919001.(in Chinese)

