Light-field technology heralds one of the biggest changes to imaging since 1826, when Joseph-Nicéphore Niépce made the first permanent photograph of a scene from nature. A single light-field snapshot can provide photos where focus, exposure, and even depth of field are adjustable after the picture is taken. Light-field imaging is far more ambitious than conventional photography. Instead of merely recording the sum of all the light rays falling on each photosite, a light-field camera aims to measure the intensity and direction of every incoming ray. With that information, you can generate not just one but every possible image of whatever is within the camera’s field of view at that moment. The information a light-field camera records is, mathematically speaking, part of something that optics specialists call the plenoptic function. This function describes the totality of light rays filling a given region of space at any one moment. It’s a function of five dimensions, because you need three (x, y, and z) to specify the position of each vantage point, plus two more (often denoted θ and φ) for the angle of every incoming ray.
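As a rough illustration of adjusting focus after capture, the sketch below applies shift-and-add refocusing to a toy 1D camera array. Everything in it is an assumption made for illustration: the function names `refocus` and `make_views`, the integer pixel shifts, and the single-point scene are hypothetical and are not taken from any particular camera's software.

```python
def refocus(views, positions, disparity):
    """Shift-and-add refocusing for a 1D camera array (illustrative only).

    views[i][x]  : intensity seen by camera i at pixel x
    positions[i] : offset of camera i from the array centre, in baseline units
    disparity    : pixel shift per unit baseline of the plane to bring into focus
    """
    width = len(views[0])
    image = []
    for x in range(width):
        total, count = 0.0, 0
        for view, s in zip(views, positions):
            xs = x + round(disparity * s)  # where camera s sees the chosen plane
            if 0 <= xs < width:
                total += view[xs]
                count += 1
        image.append(total / count if count else 0.0)
    return image


def make_views(positions, width, x0, disparity):
    """Toy scene: one bright point at pixel x0 in the centre view."""
    views = []
    for s in positions:
        row = [0.0] * width
        row[x0 + disparity * s] = 1.0  # the point shifts linearly with camera offset
        views.append(row)
    return views


positions = [-2, -1, 0, 1, 2]
views = make_views(positions, width=15, x0=7, disparity=1)

in_focus = refocus(views, positions, disparity=1)      # focal plane on the point
out_of_focus = refocus(views, positions, disparity=0)  # focal plane elsewhere
```

Refocusing at the point's own disparity concentrates all five views on a single pixel, while refocusing at disparity 0 spreads the same energy over five pixels, which is the blur a viewer would see.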
Accurate depth estimation is one of the key problems in 3D scene reconstruction and visualization, and it underpins many computer vision applications, such as object tracking, scene segmentation, and visual navigation. Depth estimation based on a single depth cue is still an open problem in computer vision. To exploit multiple depth cues in a dense camera array, we explore three key aspects: acquisition of multiple depth cues, depth estimation based on multiple cues, and accurate depth map optimization. To extract multiple depth cues from the light field, we built a camera array system to capture the target scene. The elemental cameras are accurately calibrated, so they can be used for synthetic imaging with a synthetic 2D or 3D focal plane. We then extracted the structure cue, the parallax cue, and the focus cue of the target scene, each of which carries scene depth information, from the light field EPI (epipolar plane image), the refocused image, and the confocal image, respectively. We further propose that the parallax cue and the focus cue are complementary. For depth estimation, we introduced a novel method based on ground control points (GCPs) to obtain a dense disparity map. Focusing on the parallax cue, we proposed a segmentation-tree-based cost aggregation that produces a more robust disparity estimate for each pixel. We also proposed a multi-occlusion model for the light field, which handles occluded areas in depth estimation. Finally, based on light field sampling analysis, we proposed a multi-cue fusion algorithm that estimates depth within a Markov random field framework and combines the advantages of shape from stereo and shape from focus. Our algorithm is more accurate than depth estimation algorithms based on a single cue.
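One way to picture the final fusion step is as an energy minimization on a Markov random field. The sketch below is a deliberately simplified stand-in, not the project's algorithm: it assumes a 1D chain of pixels, a weighted sum of per-pixel stereo and focus costs as the data term, a Potts smoothness penalty, and exact minimization by dynamic programming. The function name `fuse_cues_chain_mrf`, the `weight` and `smooth` parameters, and the toy cost values are all hypothetical.

```python
def fuse_cues_chain_mrf(stereo_cost, focus_cost, weight=0.5, smooth=1.0):
    """Pick one depth label per pixel on a 1D chain (illustrative only).

    stereo_cost[p][d], focus_cost[p][d] : cost of label d at pixel p for each cue
    weight : share of the stereo cue in the fused data term (assumed form)
    smooth : Potts penalty paid whenever neighbouring labels differ
    """
    n_pixels = len(stereo_cost)
    n_labels = len(stereo_cost[0])

    # Fused data term: a plain weighted sum of the two cue costs.
    data = [[weight * stereo_cost[p][d] + (1 - weight) * focus_cost[p][d]
             for d in range(n_labels)] for p in range(n_pixels)]

    # Forward pass: best[p][d] is the minimum energy of pixels 0..p ending in d.
    best = [data[0][:]]
    back = []
    for p in range(1, n_pixels):
        row, ptr = [], []
        for d in range(n_labels):
            prev = min(range(n_labels),
                       key=lambda k: best[-1][k] + (0.0 if k == d else smooth))
            step = 0.0 if prev == d else smooth
            row.append(data[p][d] + best[-1][prev] + step)
            ptr.append(prev)
        best.append(row)
        back.append(ptr)

    # Backward pass: follow the stored pointers from the cheapest final label.
    label = min(range(n_labels), key=lambda k: best[-1][k])
    labels = [label]
    for ptr in reversed(back):
        label = ptr[label]
        labels.append(label)
    labels.reverse()
    return labels


# Toy costs: the stereo cue is ambiguous at pixel 2, the focus cue at pixel 1.
stereo_cost = [[0.0, 1.0], [0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
focus_cost = [[0.2, 0.8], [0.5, 0.5], [0.9, 0.1], [0.6, 0.4]]
labels = fuse_cues_chain_mrf(stereo_cost, focus_cost, weight=0.5, smooth=0.1)
```

In this toy case each cue resolves the pixel where the other is ambiguous, which is the kind of complementarity the fusion is meant to exploit. A real depth map would need a 2D neighbourhood and an approximate solver such as graph cuts or belief propagation.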
To optimize the depth estimation result, we first proposed an outlier removal method based on penalized linear regression, which eliminates the influence of outliers. For occluded areas, we proposed a global optimization based on the surface camera and stereo matching, which achieves sub-pixel accuracy in depth estimation. To address aliasing artifacts in light field imaging, we proposed an angular aliasing detection algorithm that randomly shifts the aperture model, and we then introduced a multi-scale anti-aliasing rendering algorithm that stitches the alias-free image parts together. Our algorithm significantly improves confocal imaging quality. We also carried out research on other techniques and applications related to depth estimation, such as multi-view video synchronization, light field super-pixel segmentation, local feature extraction from light fields, and live face detection.
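The text above names penalized linear regression for outlier removal without giving its form. A minimal sketch of one common variant follows, under stated assumptions: each sample gets its own outlier offset with an L1 penalty, the offsets and the line are updated alternately, and samples with a nonzero offset are discarded before a final refit. The function names, the parameter `lam`, and the toy data are hypothetical, and this is not claimed to be the project's formulation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def soft_threshold(r, lam):
    """Shrink r towards zero by lam; values within [-lam, lam] become 0."""
    if r > lam:
        return r - lam
    if r < -lam:
        return r + lam
    return 0.0


def remove_outliers(xs, ys, lam=1.0, iterations=50):
    """Alternate between fitting the line and updating L1-penalized offsets.

    Model (an assumption): y_i = a + b * x_i + o_i, with penalty lam * |o_i|.
    Returns the line refitted on inliers and the indices flagged as outliers.
    """
    offsets = [0.0] * len(xs)
    for _ in range(iterations):
        a, b = fit_line(xs, [y - o for y, o in zip(ys, offsets)])
        offsets = [soft_threshold(y - (a + b * x), lam) for x, y in zip(xs, ys)]
    flagged = [i for i, o in enumerate(offsets) if o != 0.0]
    keep = [i for i, o in enumerate(offsets) if o == 0.0]
    # Final refit uses inliers only (needs at least two distinct x values).
    a, b = fit_line([xs[i] for i in keep], [ys[i] for i in keep])
    return a, b, flagged


# Toy depth samples along a scanline: a planar surface plus one gross error.
xs = list(range(10))
ys = [2.0 + 0.5 * x for x in xs]
ys[4] += 10.0
intercept, slope, flagged = remove_outliers(xs, ys, lam=1.0)
```

The threshold `lam` sets how large a residual must be before a sample is treated as an outlier, so it would need tuning to the noise level of real depth data.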
After four years of work, we have submitted 4 patent applications in China and published 20 papers, including 2 papers in the journals TIP and TCSVT and 2 papers at the CCF Rank A conferences ICCV and CVPR. With the support of this NSFC fund, the project has also led to 2 NSFC Young Scholar grants and trained 5 Ph.D. students and 10 master's students.
Key words: Depth estimation; Camera array; Multiple depth cues; Global optimization; Depth evaluation model
Depth Estimation from Light Field Analysis Based Multiple Cues Fusion
Degang Yang, Zhaolin Xiao, Heng Yang, Qing Wang
计算机学报 (Chinese Journal of Computers), 38(12):2437-2449