DQN-TAMER: Human-in-the-Loop Reinforcement Learning with Intractable Feedback
Abstract
Exploration has been one of the greatest challenges in reinforcement learning (RL) and is a major obstacle to applying RL to robotics. Even with state-of-the-art RL algorithms, training a competent agent often requires too many trials, mainly because it is difficult to match the agent's actions with rewards that arrive in the distant future. One remedy is to train the agent with real-time feedback from a human observer who immediately rewards some of its actions. This study tackles a series of challenges in introducing such a human-in-the-loop RL scheme. The first contribution of this work is a set of experiments with a precisely modeled human observer, covering five properties of human feedback: binary values, delay, stochasticity, unsustainability, and natural reaction. We also propose an RL method called DQN-TAMER, which efficiently uses both human feedback and distant rewards. We find that DQN-TAMER agents outperform their baselines in the simulated Maze and Taxi environments. Furthermore, we demonstrate a real-world human-in-the-loop RL application in which a camera automatically recognizes a user's facial expressions and passes them to the agent as feedback while the agent explores a maze.
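To make the two ideas in the abstract concrete, here is a minimal sketch under stated assumptions; it is an illustration, not the authors' implementation. `SimulatedHumanObserver` is a hypothetical observer model covering four of the five properties (binary values, delay, stochasticity, and unsustainability, modeled here as a feedback probability that decays each step); natural reaction is not modeled. `select_action` assumes the agent combines an environment value estimate `Q` with a learned human-feedback estimate `H` as a weighted sum `alpha_q * Q + alpha_h * H` and acts greedily on it. All class, function, and parameter names and default values are hypothetical.

```python
import random
from collections import deque


class SimulatedHumanObserver:
    """Hypothetical observer model (all names and defaults are assumptions).

    binary:           feedback is +1 or -1
    delay:            feedback refers to the action taken `delay_steps` earlier
    stochasticity:    the sign is correct only with probability `accuracy`
    unsustainability: the chance of giving feedback decays every step
    """

    def __init__(self, delay_steps=2, accuracy=0.9, feedback_prob=0.5,
                 decay=0.999, seed=0):
        self.delay_steps = delay_steps
        self.accuracy = accuracy
        self.feedback_prob = feedback_prob
        self.decay = decay
        self.rng = random.Random(seed)
        self.pending = deque()  # judgments of actions not yet reacted to

    def observe(self, action_was_good):
        """Record one step; return +1, -1, or None when no feedback is given."""
        self.pending.append(action_was_good)
        self.feedback_prob *= self.decay  # attention fades over time
        if len(self.pending) <= self.delay_steps:
            return None  # still within the reaction delay
        judged = self.pending.popleft()
        if self.rng.random() >= self.feedback_prob:
            return None  # observer stays silent this step
        correct = self.rng.random() < self.accuracy
        good = judged if correct else not judged
        return 1 if good else -1


def select_action(q_values, h_values, alpha_q=1.0, alpha_h=1.0):
    """Greedy action over an assumed weighted sum of Q and H estimates."""
    scores = [alpha_q * q + alpha_h * h for q, h in zip(q_values, h_values)]
    return max(range(len(scores)), key=scores.__getitem__)


def human_weight(step, alpha_h0=1.0, decay=0.99):
    """Assumed schedule: rely less on human feedback as training goes on."""
    return alpha_h0 * decay ** step
```

For example, a noise-free observer with a two-step delay stays silent for two steps and then judges the first action: feeding it `True, False, False, True` yields `None, None, 1, -1`. Likewise, `select_action([1.0, 0.5], [0.0, 1.0])` prefers action 1 when the human term carries full weight, but falls back to action 0 once `alpha_h` has shrunk to 0.1. Decaying `alpha_h` is one plausible design choice: early on, human feedback can guide exploration, while later the agent leans on the environment's distant rewards.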