Liu Tianliang, Mo Yiming, Xu Gaobang, et al. Depth estimation of monocular video using non-parametric fusion of multiple cues [J]. Journal of Southeast University (Natural Science Edition), 2015, 45(5): 834-839. [doi:10.3969/j.issn.1001-0505.2015.05.004]

Depth estimation of monocular video using non-parametric fusion of multiple cues

Journal of Southeast University (Natural Science Edition) [ISSN: 1001-0505/CN: 32-1178/N]

Volume:
45
Issue:
2015, No. 5
Pages:
834-839
Section:
Computer Science and Engineering
Publication date:
2015-09-20

Article Info

Title:
Depth estimation of monocular video using non-parametric fusion of multiple cues
Author(s):
Liu Tianliang1, Mo Yiming1, Xu Gaobang1, Dai Xiubin1, Zhu Xiuchang1, Luo Jiebo2
1Jiangsu Provincial Key Laboratory of Image Processing and Image Communication, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
2Department of Computer Science, University of Rochester, Rochester 14627, USA
Keywords:
depth map; non-parametric fusion; multiple cues; linear perspective; spatial-temporal correlation
CLC number:
TP391
DOI:
10.3969/j.issn.1001-0505.2015.05.004
Abstract:
A depth estimation technique for monocular video based on non-parametric learning and the fusion of multiple cues is proposed to solve the conversion from two-dimensional (2D) video to three-dimensional (3D) video. First, according to the region boundary contour and geometric perspective structure cues of the monocular image, the depth map of each frame of the monocular video is estimated by fusing the related foreground and background maps. Then, the depth map sequence of the monocular video is estimated in a non-parametric learning framework that exploits the spatial-temporal correlation between frames. Finally, the depth sequence is enhanced by exploiting a global background depth piecewise constraint and de-jittering. The experimental results show that, compared with existing methods, the proposed technique obtains the depth map sequence of a monocular video with higher accuracy and better performance, both in subjective quality and in terms of the root mean square (RMS) error and the structural similarity measure (SSIM).
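The abstract reports results in terms of RMS error and SSIM between estimated and ground-truth depth maps. As a rough illustration only (not the paper's implementation), the two metrics can be sketched as below; the single-window "global" SSIM, the stabilizing constants, and the 8-bit dynamic range of 255 are simplifying assumptions, whereas standard SSIM averages over a sliding Gaussian window.

```python
import math

# Illustrative evaluation metrics for depth maps, sketched after the
# RMS and SSIM measures the abstract mentions. Depth maps are given
# as flat lists of floats; real code would operate on 2-D arrays.

def rms_error(est, gt):
    """Root mean square error between estimated and ground-truth depth."""
    n = len(est)
    return math.sqrt(sum((e - g) ** 2 for e, g in zip(est, gt)) / n)

def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM over the whole map (a simplification of the
    standard sliding-window SSIM of Wang et al.)."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((v - mu_x) ** 2 for v in x) / n
    var_y = sum((v - mu_y) ** 2 for v in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * dynamic_range) ** 2  # standard SSIM stabilizing constants
    c2 = (0.03 * dynamic_range) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

# Toy example: flattened 2x2 depth maps with values in [0, 255].
gt = [10.0, 80.0, 160.0, 240.0]
est = [12.0, 78.0, 165.0, 235.0]
print(round(rms_error(est, gt), 3))    # small error for a close estimate
print(round(global_ssim(est, gt), 4))  # close to 1 for similar maps
```

A lower RMS and an SSIM closer to 1 indicate a better depth estimate, which matches the comparison criteria used in the paper's experiments.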

References:

[1] Zhang L, Tam W J. Stereoscopic image generation based on depth images for 3D TV [J]. IEEE Transactions on Broadcasting, 2005, 51(2): 191-199.
[2] Mo Y, Liu T, Zhu X, et al. Segment based depth extraction approach for monocular image with linear perspective [C]//4th International Conference on Intelligence Science and Big Data Engineering(IScIDE). Beijing, China, 2013: 168-175.
[3] Liu Tianliang, Dai Xiubin, Zhu Xiuchang, et al. Hierarchical aggregation fast stereo image matching using Weber perception and guided filtering [J]. Journal of Electronics and Information Technology, 2012, 34(4): 992-996. (in Chinese)
[4] Nawrot E, Nawrot M. The role of eye movements in depth from motion parallax during infancy [J]. Journal of Vision, 2013, 13(14): 1-13.
[5] Zhang Z, Zhang J, Zhang X, et al. A distributed 2D-to-3D video conversion system [J]. China Communications, 2013, 10(5): 30-38.
[6] Saxena A, Sun M, Ng A Y. Make3D: learning 3D scene structure from a single still image[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(5): 824-840.
[7] Karsch K, Liu C, Kang S B. Depth transfer: depth extraction from video using non-parametric sampling [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(11): 2144-2158.
[8] Liu C, Yuen J, Torralba A. SIFT flow: dense correspondence across scenes and its applications[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(5): 978-994.
[9] von Gioi R G, Jakubowicz J, Morel J M, et al. LSD: a fast line segment detector with a false detection control [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(4): 722-732.
[10] Priese L, Schmitt F, Hering N. Grouping of semantically similar image positions [C]//14th Scandinavian Conference on Image Analysis. Oslo, Norway, 2009: 726-734.
[11] Xue W, Xing D, Lin M, et al. Depth-of-field rendering with saliency-based bilateral filtering [C]//2013 International Conference on Computer-Aided Design and Computer Graphics. Guangzhou, China, 2013: 399-400.
[12] Tamgade S N, Bora V R. Motion vector estimation of video image by pyramidal implementation of Lucas Kanade optical flow [C]//2nd IEEE International Conference on Emerging Trends in Engineering and Technology. Nagpur, India, 2009: 914-917.
[13] Applegate R A, Ballentine C, Gross H, et al. Visual acuity as a function of Zernike mode and level of root mean square error [J]. Optometry and Vision Science, 2003, 80(2): 97-105.
[14] Hore A, Ziou D. Image quality metrics: PSNR vs. SSIM [C]//20th IEEE International Conference on Pattern Recognition. Istanbul, Turkey, 2010: 2366-2369.
[15] Ndjiki-Nya P, Köppel M, Doshkov D, et al. Depth image-based rendering with advanced texture synthesis for 3-D video [J]. IEEE Transactions on Multimedia, 2011, 13(3): 453-465.

Memo:
Received: 2015-03-17.
Biography: Liu Tianliang (1980—), male, Ph.D., associate professor, liutl@njupt.edu.cn.
Foundation items: The National Natural Science Foundation of China for Young Scholars (Nos. 61001152, 31200747), the National Natural Science Foundation of China (Nos. 61071091, 61071166, 61172118), the Natural Science Foundation of Jiangsu Province (Nos. BK2010523, BK2012437), the Scientific Research Foundation of Nanjing University of Posts and Telecommunications (Nos. NY210069, NY214037), and the China Scholarship Council.
Cite this article: Liu Tianliang, Mo Yiming, Xu Gaobang, et al. Depth estimation of monocular video using non-parametric fusion of multiple cues [J]. Journal of Southeast University (Natural Science Edition), 2015, 45(5): 834-839. [doi:10.3969/j.issn.1001-0505.2015.05.004]
Last Update: 2015-09-20