Unsupervised Learning of Depth and Ego-Motion from Video
Top 1% of 2017 papers
Abstract
We present an unsupervised learning framework for the task of monocular depth and camera motion estimation from unstructured video sequences. In common with recent work [10, 14, 16], we use an end-to-end learning approach with view synthesis as the supervisory signal. In contrast to the previous work, our method is completely unsupervised, requiring only monocular video sequences for training. Our method uses single-view depth and multi-view pose networks, with a loss based on warping nearby views to the target using the computed depth and pose. The networks are thus coupled by the loss during training, but can be applied independently at test time. Empirical evaluation on the KITTI dataset demonstrates the effectiveness of our approach: 1) monocular depth performs comparably with supervised methods that use either ground-truth pose or depth for training, and 2) pose estimation performs favorably compared to established SLAM systems under comparable input settings.
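The view-synthesis supervision described above can be sketched as follows: each target pixel is back-projected to 3-D using the predicted depth, transformed by the predicted relative pose, and re-projected into a nearby source view; the photometric difference between the target image and the warped source image serves as the training loss. This is a minimal NumPy sketch under assumed conventions (grayscale images, a `T` matrix mapping target to source coordinates, nearest-neighbour sampling in place of the differentiable bilinear sampling the paper uses); function and variable names are illustrative, not from the paper's code.

```python
import numpy as np

def warp_source_to_target(source, depth, K, T):
    """Inverse-warp a source view into the target frame.

    source: (H, W) grayscale source image
    depth:  (H, W) predicted depth map for the *target* view
    K:      (3, 3) camera intrinsics
    T:      (4, 4) relative pose, target camera -> source camera
    Returns the warped image and a validity mask.
    """
    H, W = depth.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Homogeneous pixel coordinates of every target pixel.
    pix = np.stack([xs, ys, np.ones_like(xs)]).reshape(3, -1).astype(np.float64)
    # Back-project to 3-D target-camera coordinates using predicted depth.
    cam = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])
    # Rigidly transform into the source frame and project with K.
    proj = K @ (T @ cam_h)[:3]
    u, v = proj[0] / proj[2], proj[1] / proj[2]
    valid = (proj[2] > 0) & (u >= 0) & (u <= W - 1) & (v >= 0) & (v <= H - 1)
    # Nearest-neighbour sampling; the paper instead uses differentiable
    # bilinear sampling so gradients flow to depth and pose.
    ui = np.clip(np.rint(u).astype(int), 0, W - 1)
    vi = np.clip(np.rint(v).astype(int), 0, H - 1)
    return source[vi, ui].reshape(H, W), valid.reshape(H, W)

def photometric_loss(target, warped, mask):
    # Mean absolute photometric error over pixels that land inside the source view.
    return np.abs(target - warped)[mask].mean()
```

With an identity relative pose the warp is the identity mapping, so the loss between a frame and itself is zero; during training, gradients of this loss (with bilinear sampling) drive both the depth and pose networks.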