Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models
Abstract
[Figure: running the trained policy on unseen instructions, e.g. "Lift up the green chip bag from the bowl and drop it at the bottom left corner of the table".]

We introduce Data-driven Instruction Augmentation for Language-conditioned control (DIAL): we utilize semi-supervised language labels to propagate CLIP's semantic knowledge onto large datasets of unlabeled demonstration data, from which we then train language-conditioned policies. This method enables far cheaper acquisition of useful language descriptions than expensive human labeling, allowing for broader label coverage of large-scale datasets. We apply DIAL to a challenging real-world robotic manipulation domain where only 3.5% of the 80,000 demonstrations contain crowd-sourced language annotations. Through a large-scale study of over 1,300 real-world evaluations, we find that DIAL enables imitation-learning policies to acquire new capabilities and generalize to 60 novel instructions unseen in the original dataset.
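The relabeling step the abstract describes can be sketched as follows: embed an episode's frames and a pool of candidate instructions in a shared image-text space, then assign each unlabeled episode the instruction(s) that score highest by cosine similarity. This is a minimal sketch, not the paper's implementation; in DIAL the embedder is a CLIP model fine-tuned on the small annotated subset, whereas here `toy_embed` is a hypothetical deterministic stand-in so the example is self-contained.

```python
import zlib

import numpy as np


def toy_embed(token: str) -> np.ndarray:
    """Hypothetical stand-in for a CLIP image/text encoder: maps a token
    to a fixed, unit-normalized vector in a shared embedding space."""
    rng = np.random.default_rng(zlib.crc32(token.encode("utf-8")))
    v = rng.normal(size=16)
    return v / np.linalg.norm(v)


def relabel_episode(frame_tokens, candidate_instructions,
                    embed_fn=toy_embed, top_k=1):
    """Score every candidate instruction against an episode's frames and
    return the top-k instructions by mean cosine similarity."""
    frame_embs = np.stack([embed_fn(t) for t in frame_tokens])            # (F, D)
    text_embs = np.stack([embed_fn(c) for c in candidate_instructions])   # (C, D)
    # Embeddings are unit-normalized, so the dot product is cosine similarity.
    scores = (text_embs @ frame_embs.T).mean(axis=1)                      # (C,)
    order = np.argsort(-scores)[:top_k]
    return [candidate_instructions[i] for i in order]
```

The augmented (episode, instruction) pairs would then be mixed with the 3.5% of human-annotated demonstrations to train the language-conditioned policy; keeping `top_k > 1` yields several paraphrased labels per episode, which is what lets the policy generalize to instructions absent from the original annotations.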