In this paper, we present a novel approach to stereo visual odometry with robust motion estimation that is faster and more accurate than standard RANSAC. Our method improves RANSAC in three respects: first, hypotheses are preferentially generated by sampling the input feature points in order of feature age and similarity; second, hypotheses are evaluated with the Sequential Probability Ratio Test (SPRT), which discards bad hypotheses quickly without verifying all data points; third, we aggregate the three best hypotheses to obtain the final estimate instead of selecting only the single best hypothesis. The first two aspects speed up RANSAC by generating good hypotheses and discarding bad ones early, respectively.
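The SPRT-based verification step can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, the inlier-probability parameters `eps` (inlier rate under a good hypothesis) and `delta` (inlier rate under a bad one), and the decision threshold `A` are illustrative assumptions in the style of Chum and Matas' randomized verification.

```python
def sprt_evaluate(points, hypothesis, is_inlier, eps=0.5, delta=0.1, A=100.0):
    """Sequential Probability Ratio Test for RANSAC hypothesis evaluation.

    eps:   assumed probability a point is an inlier under a GOOD hypothesis.
    delta: assumed probability a point is an inlier under a BAD hypothesis.
    A:     decision threshold; the hypothesis is rejected as soon as the
           likelihood ratio exceeds it, without checking remaining points.
    """
    lam = 1.0       # running likelihood ratio P(data | bad) / P(data | good)
    inliers = 0
    for p in points:
        if is_inlier(hypothesis, p):
            inliers += 1
            lam *= delta / eps              # inliers favor the "good" model
        else:
            lam *= (1 - delta) / (1 - eps)  # outliers favor the "bad" model
        if lam > A:
            return False, inliers           # rejected early
    return True, inliers                    # survived all points
```

A bad hypothesis is typically rejected after only a handful of points, which is where the speedup over exhaustive verification comes from.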
Accurate stereo matching remains challenging in weakly textured areas, at discontinuities, and under occlusions. Moreover, occlusion recovery is often treated as a subordinate problem and handled simplistically. To obtain dense, high-accuracy depth maps, this letter proposes an efficient multistep disparity refinement framework with occlusion handling. The framework classifies outliers into leftmost occlusions, non-border occlusions, and mismatches, and employs a different recovery strategy for each. To recover occlusions, a dedicated filling order is introduced to avoid error propagation, and a surface decision based on local image content is performed when more than one background surface exists. Evaluations on the Middlebury datasets and comparisons with other refinement algorithms show the superiority and robustness of our method.
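The core intuition behind occlusion filling can be sketched with a deliberately simplified baseline: occluded pixels belong to the background surface, so an invalid pixel takes the smaller (farther) of its nearest valid horizontal neighbors, while a leftmost occlusion, which has no valid left neighbor, takes the right one. This is only the common background-fill heuristic the letter improves upon (its actual filling order and surface decision are more elaborate); the function name and `invalid_val` convention are assumptions.

```python
import numpy as np

def fill_occlusions(disp, invalid_val=0):
    """Fill invalid disparities with the nearer background value.

    For each invalid pixel, take the smaller of the nearest valid
    disparities to its left and right (smaller disparity = farther
    surface = background). Leftmost occlusions fall back to the
    right neighbor alone.
    """
    out = disp.astype(float).copy()
    h, w = out.shape
    for y in range(h):
        for x in range(w):
            if out[y, x] != invalid_val:
                continue
            left = right = None
            for xl in range(x - 1, -1, -1):       # nearest valid to the left
                if disp[y, xl] != invalid_val:
                    left = disp[y, xl]
                    break
            for xr in range(x + 1, w):            # nearest valid to the right
                if disp[y, xr] != invalid_val:
                    right = disp[y, xr]
                    break
            cands = [v for v in (left, right) if v is not None]
            if cands:
                out[y, x] = min(cands)            # pick the background surface
    return out
```

Filling every pixel independently from its row neighbors like this is exactly what can propagate errors along a scanline, which motivates the content-aware filling order proposed in the letter.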
In this paper, a novel approach is proposed for stereo vision-based ground plane detection at the superpixel level, implemented by feeding a Disparity Texture Map into a convolutional neural network architecture. In particular, the Disparity Texture Map is computed with a new Local Disparity Texture Descriptor (LDTD). Experimental results demonstrate superior performance on the KITTI dataset.
Effective indoor localization is an essential part of VR (Virtual Reality) and AR (Augmented Reality) technologies. Tracking an RGB-D camera has become increasingly popular, since it captures relatively accurate color and depth information at the same time. With the recovered colored point cloud, the traditional ICP (Iterative Closest Point) algorithm can be used to estimate camera poses and reconstruct the scene. However, many works focus on improving ICP for general scenes and ignore the practical significance of effective initialization under specific conditions, such as indoor scenes for VR or AR.
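The pose-estimation step inside each ICP iteration can be sketched as the closed-form least-squares rigid alignment (the Kabsch/SVD solution) between a set of source points and their current correspondences. This is a generic textbook building block, not this paper's contribution; the function name is illustrative.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rigid transform (R, t) mapping src onto dst.

    src, dst: (N, 3) arrays of corresponding points. This is the
    closed-form Kabsch/SVD step used inside each ICP iteration.
    """
    cs, cd = src.mean(axis=0), dst.mean(axis=0)   # centroids
    H = (src - cs).T @ (dst - cd)                 # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                      # guard against reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cd - R @ cs
    return R, t
```

Full ICP alternates this step with re-matching nearest neighbors; a poor initial pose makes those correspondences wrong, which is precisely the initialization problem the paper targets for indoor scenes.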
Stereo matching is a challenging problem in the presence of weak texture, discontinuities, illumination differences, and occlusions. Therefore, a deep learning framework is presented in this paper that focuses on the first and last stages of typical stereo methods: matching cost computation and disparity refinement. For matching cost computation, two patch-based network architectures are exploited to allow a trade-off between speed and accuracy, both of which leverage a multi-size, multi-layer pooling unit with no stride to learn cross-scale feature representations.
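The idea of stride-free multi-size pooling can be illustrated numerically: average-pool a feature map at several window sizes with stride 1, so spatial resolution is preserved, then concatenate the results along the channel axis to give every pixel features aggregated at multiple scales. This NumPy sketch (using an integral image for the box sums) only mirrors that idea; the window sizes and function name are assumptions, not the paper's architecture.

```python
import numpy as np

def multi_size_pool(feat, sizes=(2, 4, 8)):
    """Stride-1 average pooling at several window sizes, concatenated
    along the channel axis. feat: (H, W, C) feature map; windows are
    clamped at the borders so the spatial size never shrinks."""
    h, w, c = feat.shape
    ii = np.zeros((h + 1, w + 1, c))
    ii[1:, 1:] = feat.cumsum(axis=0).cumsum(axis=1)  # integral image
    outs = [feat.astype(float)]
    for k in sizes:
        pooled = np.zeros((h, w, c))
        for y in range(h):
            for x in range(w):
                # k x k window centered on (y, x), clamped to the image
                y1 = min(h, max(0, y - k // 2) + k)
                x1 = min(w, max(0, x - k // 2) + k)
                y0, x0 = max(0, y1 - k), max(0, x1 - k)
                area = (y1 - y0) * (x1 - x0)
                pooled[y, x] = (ii[y1, x1] - ii[y0, x1]
                                - ii[y1, x0] + ii[y0, x0]) / area
        outs.append(pooled)
    return np.concatenate(outs, axis=-1)
```

In the learned setting the same effect comes from parallel stride-1 pooling layers inside the network; the key property shown here is that the output keeps the input's H x W resolution while widening the channels with cross-scale context.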