LLMs achieve adult human performance on higher-order theory of mind tasks

arXiv (Cornell University), 2024

Winnie Street, John Oliver Siy, Geoff Keeling, Adrien Baranès, Benjamin Barnett, Michael McKibben, Tatenda Kanyere, Alison Lentz, Blaise Agüera y Arcas, Robin Dunbar

Abstract

This paper examines the extent to which large language models (LLMs) have developed higher-order theory of mind (ToM): the human ability to reason about multiple mental and emotional states in a recursive manner (e.g., "I think that you believe that she knows"). This paper builds on prior work by introducing a handwritten test suite, Multi-Order Theory of Mind Q&A, and using it to compare the performance of five LLMs with a newly gathered adult human benchmark. We find that GPT-4 and Flan-PaLM reach adult-level and near adult-level performance, respectively, on ToM tasks overall, and that GPT-4 exceeds adult performance on 6th-order inferences. Our results suggest that there is an interplay between model size and finetuning for the realisation of ToM abilities, and that the best-performing LLMs have developed a generalised capacity for ToM. Given the role that higher-order ToM plays in a wide range of cooperative and competitive human behaviours, these findings have significant implications for user-facing LLM applications.
