Improvisation through Physical Understanding: Using Novel Objects As Tools with Visual Foresight
Abstract
Machine learning has enabled robots to perform complex tasks in narrowly-scoped settings, and to perform simple tasks with high generalization. However, learning a model that can both perform complex tasks and generalize to previously unseen objects and goals remains a significant challenge. We study this challenge in the context of "improvisational" tool use: a robot is presented with novel objects and a user-specified goal (e.g., sweep some clutter into the dustpan), and must figure out, using only raw image observations, how to accomplish the goal using the available objects as tools. We approach this problem by training a model with both a visual and physical understanding of multi-object interactions, and develop a sampling-based optimizer that can leverage these interactions to accomplish tasks. We do so by combining diverse demonstration data with self-supervised interaction data, aiming to leverage the interaction data to build generalizable models and the demonstration data to guide the model-based RL planner to solve complex tasks. Our experiments show that our approach can solve a variety of complex tool use tasks from raw pixel inputs, outperforming both imitation learning and self-supervised learning individually. Furthermore, we show that the robot can perceive and use novel objects as tools, including objects that are not conventional tools, while also choosing dynamically to use or not use tools depending on whether or not they are required. Videos of the results are available online.
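The abstract mentions a sampling-based optimizer that plans over a learned model of multi-object interactions. As an illustration only (the paper does not specify this exact procedure here), a common choice for such optimizers is the cross-entropy method (CEM): sample candidate action sequences, score each with a cost derived from the model's predictions, and refit a Gaussian to the best samples. The sketch below uses a hypothetical `cost_fn` stand-in for the learned visual-foresight cost; all names and parameters are illustrative assumptions.

```python
import numpy as np

def cem_plan(cost_fn, horizon=5, action_dim=4, n_samples=100,
             n_elites=10, n_iters=3, seed=0):
    """Cross-entropy method: sample action sequences, score them with a
    (here hypothetical, normally learned) cost, and refit a Gaussian to
    the elite samples each iteration."""
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(n_iters):
        # Sample candidate action sequences around the current distribution.
        noise = rng.standard_normal((n_samples, horizon, action_dim))
        samples = mean + std * noise
        # Score every candidate; in visual foresight this cost would come
        # from rolling out a learned video-prediction model.
        costs = np.array([cost_fn(s) for s in samples])
        # Keep the lowest-cost samples and refit mean/std to them.
        elites = samples[np.argsort(costs)[:n_elites]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean  # best action sequence found

# Toy stand-in cost: distance of the summed actions from a target state.
goal = np.ones(4)
plan = cem_plan(lambda s: np.linalg.norm(s.sum(axis=0) - goal))
```

In the actual system, the cost would compare predicted future images (or designated pixel positions) against the user-specified goal, and the first action of the returned plan would be executed before replanning.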