DQN-TAMER: Human-in-the-Loop Reinforcement Learning with Intractable Feedback

arXiv (Cornell University), 2018

Riku Arakawa, Sosuke Kobayashi, Yuya Unno, Yuta Tsuboi, Shin‐ichi Maeda

Abstract

Exploration has been one of the greatest challenges in reinforcement learning (RL) and is a major obstacle to applying RL to robotics. Even with state-of-the-art RL algorithms, building a well-learned agent often requires too many trials, mainly because it is difficult to match the agent's actions with rewards in the distant future. A remedy for this is to train an agent with real-time feedback from a human observer who immediately gives rewards for some actions. This study tackles a series of challenges in introducing such a human-in-the-loop RL scheme. The first contribution of this work is our experiments with a precisely modeled human observer whose feedback has five characteristics: it is binary, delayed, stochastic, unsustainable, and given as a natural reaction. We also propose an RL method called DQN-TAMER, which efficiently uses both human feedback and distant rewards. We find that DQN-TAMER agents outperform their baselines in Maze and Taxi simulated environments. Furthermore, we demonstrate a real-world human-in-the-loop RL application in which a camera automatically recognizes a user's facial expressions as feedback to the agent while the agent explores a maze.
