Efficient and Stable Sparse-to-Dense Conversion for Automatic 2D-to-3D Conversion
Various important 3D depth cues, such as focus, motion, occlusion, and disparity, can only be estimated reliably at distinct sparse image locations such as edges and corners. Hence, 2D-to-3D video conversion requires a stable and smooth sparse-to-dense conversion to propagate these sparse estimates to the complete video. To this end, optimization-, segmentation-, and triangulation-based approaches have recently been proposed. While optimization-based approaches produce accurate dense maps, the resulting energy functions are very hard to minimize within the stringent requirements of real-time video processing. In addition, segmentation- and triangulation-based approaches can delineate object boundaries incorrectly. Finally, dense maps that are estimated independently for each video image suffer from temporal instabilities. To address the real-time issue, we propose an innovative low-latency, line-scanning-based sparse-to-dense conversion algorithm with low computational complexity. To mitigate the stability and smoothness issues, we additionally propose a recursive spatio-temporal post-processing step and an efficient joint bilateral up-sampling method. We illustrate the performance of the resulting sparse-to-dense converter on dense defocus maps. We also present a subjective assessment of 2D-to-3D conversion results using a paired comparison on a variety of challenging low-depth-of-field test sequences. The results demonstrate that the proposed approach achieves the same 3D depth and video quality as state-of-the-art sparse-to-dense converters with significantly reduced computational complexity and memory usage.
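For readers unfamiliar with joint bilateral up-sampling, the following is a minimal sketch of the generic textbook form of the technique, not the efficient variant proposed in this work. It up-samples a low-resolution depth map using a full-resolution guidance image, so that depth edges follow image edges. The function name, the parameters (`scale`, `radius`, `sigma_s`, `sigma_r`), and the assumption that guidance values lie in [0, 1] are illustrative choices, not taken from the paper.

```python
import math


def joint_bilateral_upsample(depth_lo, guide_hi, scale,
                             radius=1, sigma_s=1.0, sigma_r=0.1):
    """Generic joint bilateral up-sampling (illustrative sketch).

    depth_lo: low-resolution depth map, list of rows of floats.
    guide_hi: full-resolution guidance image, rows of floats in [0, 1].
    scale:    integer up-sampling factor (hypothetical parameter).
    """
    H, W = len(guide_hi), len(guide_hi[0])
    h, w = len(depth_lo), len(depth_lo[0])
    out = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            # Position of the high-res pixel in low-res coordinates.
            yl, xl = y / scale, x / scale
            cy, cx = min(int(yl), h - 1), min(int(xl), w - 1)
            num = den = 0.0
            for qy in range(max(cy - radius, 0), min(cy + radius, h - 1) + 1):
                for qx in range(max(cx - radius, 0), min(cx + radius, w - 1) + 1):
                    # Spatial weight, measured in low-res pixel units.
                    d2 = (qy - yl) ** 2 + (qx - xl) ** 2
                    ws = math.exp(-d2 / (2 * sigma_s ** 2))
                    # Range weight: compare guidance at the target pixel with
                    # guidance at the high-res location of the low-res sample.
                    gy, gx = min(qy * scale, H - 1), min(qx * scale, W - 1)
                    dr = guide_hi[y][x] - guide_hi[gy][gx]
                    wr = math.exp(-(dr * dr) / (2 * sigma_r ** 2))
                    num += ws * wr * depth_lo[qy][qx]
                    den += ws * wr
            # Fall back to the nearest low-res sample if all weights underflow.
            out[y][x] = num / den if den > 0 else depth_lo[cy][cx]
    return out
```

In this form the range weight suppresses contributions from low-resolution samples that lie across an intensity edge in the guidance image, which is why the up-sampled depth boundary tends to stay aligned with the object boundary. The per-pixel cost grows with the square of the window size, which is one reason a more efficient formulation is attractive for real-time video.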