Benchmarks for Automated Commonsense Reasoning: A Survey
Abstract
More than one hundred benchmarks have been developed to test the commonsense knowledge and commonsense reasoning abilities of artificial intelligence (AI) systems. However, these benchmarks are often flawed, and many aspects of common sense remain untested. Consequently, there is currently no reliable way of measuring to what extent existing AI systems have achieved these abilities. This article surveys the development and uses of AI commonsense benchmarks. It enumerates 139 commonsense benchmarks that have been developed: 102 text-based, 18 image-based, 12 video-based, and 7 based in simulated physical environments. It gives more detailed descriptions of twelve of these, three from each category. It surveys the various methods used to construct commonsense benchmarks. It discusses the nature of common sense, the role of common sense in AI, the goals served by constructing commonsense benchmarks, desirable features of commonsense benchmarks, and flaws and gaps in existing benchmarks. It concludes with a number of recommendations for future development of commonsense AI benchmarks; most importantly, that the creators of benchmarks invest the work needed to ensure that benchmark examples are consistently high quality.
Related Papers
- Beating Common Sense into Interactive Applications (2004), 148 citations
- UFO: Unified Fact Obtaining for Commonsense Question Answering (2023), 1 citation
- CapableOf Reasoning: A Step Towards Commonsense Oracle (2020), 1 citation