One-Shot Video Object Segmentation
Abstract
This paper tackles the task of semi-supervised video object segmentation, i.e., the separation of an object from the background in a video, given the mask of the first frame. We present One-Shot Video Object Segmentation (OSVOS), based on a fully-convolutional neural network architecture that is able to successively transfer generic semantic information, learned on ImageNet, to the task of foreground segmentation, and finally to learning the appearance of a single annotated object of the test sequence (hence one-shot). Although all frames are processed independently, the results are temporally coherent and stable. We perform experiments on two annotated video segmentation databases, which show that OSVOS is fast and improves the state of the art by a significant margin (79.8% vs 68.0%).
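The one-shot idea above can be sketched in miniature: adapt a model to the single annotated first frame, then apply it to every other frame independently. This is a toy stand-in, not the paper's method — a per-pixel logistic regression on raw intensity replaces the VGG-based fully-convolutional network, and the random "frames" are hypothetical; it only illustrates the fine-tune-once, segment-per-frame structure.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def one_shot_finetune(features, mask, steps=500, lr=0.5):
    """Fit per-pixel weights on the single annotated first frame
    (stand-in for fine-tuning the pretrained FCN on the test object)."""
    n_pix, n_feat = features.shape
    w = np.zeros(n_feat)
    b = 0.0
    y = mask.ravel().astype(float)
    for _ in range(steps):
        p = sigmoid(features @ w + b)
        grad = p - y                       # gradient of the logistic loss
        w -= lr * (features.T @ grad) / n_pix
        b -= lr * grad.mean()
    return w, b

def segment(features, w, b, thr=0.5):
    """Each frame is processed independently with the adapted model."""
    return sigmoid(features @ w + b) > thr

# Toy data: one scalar feature per pixel (its intensity); the "object"
# is simply the set of bright pixels, so the first-frame mask thresholds it.
rng = np.random.default_rng(0)
first_frame = rng.uniform(size=(8, 8, 1))
first_mask = first_frame[..., 0] > 0.5

w, b = one_shot_finetune(first_frame.reshape(-1, 1), first_mask)

next_frame = rng.uniform(size=(8, 8, 1))
pred = segment(next_frame.reshape(-1, 1), w, b).reshape(8, 8)
```

Because adaptation happens once on the first frame, every subsequent frame costs only a forward pass — the same property that lets OSVOS process frames independently yet remain fast.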