• DocumentCode
    38543
  • Title
    Efficient Hybrid Tree-Based Stereo Matching With Applications to Postcapture Image Refocusing

  • Author
    Vu, Dung T.; Chidester, Benjamin; Yang, Hongsheng; Do, Minh N.; Lu, Jiangbo
  • Author_Institution
    Advanced Digital Sciences Center, Singapore
  • Volume
    23
  • Issue
    8
  • fYear
    2014
  • fDate
    Aug. 2014
  • Firstpage
    3428
  • Lastpage
    3442
  • Abstract
    Estimating dense correspondence or depth information from a pair of stereoscopic images is a fundamental problem in computer vision with a range of important applications. Despite intensive past research efforts on this topic, it remains challenging to recover depth information both reliably and efficiently, especially when the input images contain weakly textured regions or are captured under uncontrolled, real-life conditions. To strike a desirable balance between computational efficiency and estimation quality, this paper proposes a hybrid minimum spanning tree-based stereo matching method. Our method performs efficient nonlocal cost aggregation at both the pixel level and the region level, and then adaptively fuses the resulting costs to leverage their respective strengths in handling large textureless regions and fine depth discontinuities. Experiments on the standard Middlebury stereo benchmark show that the proposed method outperforms all prior local and nonlocal aggregation-based methods, with particularly noticeable improvements in low-texture regions. To further demonstrate the effectiveness of the proposed stereo method, and motivated by the increasing demand for expressive depth-induced photo effects, this paper next addresses the emerging application of interactive depth-of-field rendering from a real-world stereo image pair. To this end, we propose an accurate thin-lens model for synthetic depth-of-field rendering that accounts for user-stroke placement and camera-specific parameters and performs pixel-adapted Gaussian blurring in a principled way. Taking ~1.5 s to process a pair of 640×360 images in the offline step, our system, named Scribble2focus, allows users to interactively select in-focus regions with simple strokes on a touch screen and returns the synthetically refocused images instantly.
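    To make the two techniques in the abstract concrete, here is a minimal pixel-level sketch of nonlocal cost aggregation over a minimum spanning tree (MST), in Python with NumPy/SciPy. It is an illustration under stated assumptions, not the paper's hybrid method: it covers only the pixel level (the paper additionally aggregates at the region level and fuses the two results), it assumes a grayscale image in [0, 1] and a precomputed (H, W, D) matching-cost volume, and the 4-connected grid weights and the `sigma` value are illustrative choices.

    ```python
    import numpy as np
    from scipy.sparse import coo_matrix
    from scipy.sparse.csgraph import minimum_spanning_tree, breadth_first_order

    def mst_aggregate(image, cost, sigma=0.1):
        # Nonlocal cost aggregation over an MST of the image grid (plain
        # Python loops for clarity, not speed). `image`: (H, W) grayscale in
        # [0, 1]; `cost`: (H, W, D) matching-cost volume; `sigma`: assumed
        # similarity falloff.
        h, w = image.shape
        n = h * w
        idx = np.arange(n).reshape(h, w)
        flat = image.ravel()

        # 4-connected grid edges weighted by intensity difference; the small
        # epsilon keeps zero-weight edges from vanishing in sparse storage.
        rows = np.concatenate([idx[:, :-1].ravel(), idx[:-1, :].ravel()])
        cols = np.concatenate([idx[:, 1:].ravel(), idx[1:, :].ravel()])
        wts = np.abs(flat[rows] - flat[cols]) + 1e-6
        mst = minimum_spanning_tree(coo_matrix((wts, (rows, cols)), shape=(n, n)))
        mst = (mst + mst.T).tocsr()  # symmetrize for undirected traversal

        order, parent = breadth_first_order(mst, i_start=0, directed=False)

        # Similarity between each node and its parent along the tree edge.
        sim = np.ones(n)
        for v in order[1:]:
            sim[v] = np.exp(-mst[parent[v], v] / sigma)

        # Pass 1 (leaves -> root): each node accumulates its subtree's costs.
        agg = cost.reshape(n, -1).astype(np.float64)
        for v in order[::-1]:
            if parent[v] >= 0:
                agg[parent[v]] += sim[v] * agg[v]

        # Pass 2 (root -> leaves): standard two-pass update folds in the
        # contribution of the rest of the tree.
        final = agg.copy()
        for v in order[1:]:
            final[v] = sim[v] * final[parent[v]] + (1.0 - sim[v] ** 2) * agg[v]
        return final.reshape(h, w, -1)
    ```

    Usage: `mst_aggregate(img, costs)` returns the aggregated (H, W, D) cost volume; a winner-take-all `argmin` over the last axis then yields a disparity map.

    For the refocusing side, the sketch below applies the standard thin-lens circle-of-confusion formula c = A*f*|d - d_f| / (d*(d_f - f)) per pixel and blends a few uniformly blurred layers. The metric depth map, the sensor-to-pixel scale, and the layered blending are assumptions standing in for the paper's pixel-adapted Gaussian blurring and stroke-driven focus selection.

    ```python
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def refocus(image, depth, focus_depth, focal_length=0.05, f_number=2.0,
                pixel_scale=1e4, n_layers=8):
        # Synthetic depth of field from a depth map. `image`: (H, W, 3);
        # `depth`: metric distances (> focal_length); `focus_depth` would come
        # from the user's stroke; `pixel_scale` converts the sensor-plane
        # circle of confusion to pixels and is an assumed, camera-specific
        # constant.
        image = np.asarray(image, dtype=np.float64)
        aperture = focal_length / f_number

        # Thin-lens circle-of-confusion diameter per pixel.
        coc = aperture * focal_length * np.abs(depth - focus_depth) / (
            depth * (focus_depth - focal_length))
        blur = coc * pixel_scale  # per-pixel Gaussian sigma, in pixels

        # Blend uniformly blurred layers with hat-function weights as a cheap
        # stand-in for truly per-pixel Gaussian blurring.
        levels = np.linspace(0.0, blur.max(), n_layers)
        step = max(levels[1] - levels[0], 1e-9) if n_layers > 1 else 1.0
        out = np.zeros_like(image)
        total = np.zeros(depth.shape)
        for s in levels:
            layer = image if s == 0 else gaussian_filter(image, sigma=(s, s, 0))
            w = np.maximum(0.0, 1.0 - np.abs(blur - s) / step)
            out += layer * w[..., None]
            total += w
        return out / np.maximum(total, 1e-9)[..., None]
    ```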
  • Keywords
    aggregation; computer vision; image texture; lenses; rendering (computer graphics); stereo image processing; touch sensitive screens; trees (mathematics); Middlebury stereo benchmark; Scribble2focus; camera-specific parameters; computational efficiency; computer vision; dense correspondence; depth information; estimation quality; expressive depth-induced photo effects; fine depth discontinuities; hybrid minimum spanning tree; interactive depth-of-field rendering; large textureless regions; nonlocal aggregation; nonlocal cost aggregation; pixel-adapted Gaussian blurring; postcapture image refocusing; real-life conditions; stereo matching; stereoscopic images; synthetic depth-of-field rendering; thin-lens model; touch screen; user-stroke placement; weakly textured regions; Cameras; Estimation; Image color analysis; Image edge detection; Image segmentation; Stereo image processing; Stereo matching; cost aggregation; depth estimation; depth of field; post-capture refocusing
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Image Processing
  • Publisher
    IEEE
  • ISSN
    1057-7149
  • Type
    jour
  • DOI
    10.1109/TIP.2014.2329389
  • Filename
    6826503