TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data
Abstract
Recent years have witnessed the burgeoning of pretrained language models (LMs) for text-based natural language (NL) understanding tasks. Such models are typically trained on free-form NL text, and hence may not be suitable for tasks like semantic parsing over structured data, which require reasoning over both free-form NL questions and structured tabular data (e.g., database tables). In this paper we present TABERT, a pretrained LM that jointly learns representations for NL sentences and (semi-)structured tables. TABERT is trained on a large corpus of 26 million tables and their English contexts. In experiments, neural semantic parsers using TABERT as feature representation layers achieve new best results on the challenging weakly-supervised semantic parsing benchmark WIKITABLEQUESTIONS, while performing competitively on the text-to-SQL dataset SPIDER.
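To make the "feature representation layer" idea concrete, the sketch below shows, under stated assumptions, how an utterance and one table row might be linearized and jointly encoded with a generic BERT encoder from the Hugging Face `transformers` library. This is an illustrative approximation in the spirit of TABERT's per-row linearization, not the released TABERT implementation; the question, table row, and separator format are hypothetical.

```python
# Illustrative sketch only: jointly encode an NL question and one linearized
# table row with a generic BERT encoder (NOT the actual TaBERT model/code).
from transformers import BertTokenizer, BertModel
import torch

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")

question = "In which city did the last 1st place finish occur?"
# One row of a (semi-)structured table, as (column name, cell value) pairs.
row = [("Year", "2007"), ("Venue", "Osaka, Japan"), ("Position", "1st")]

# Linearize the row as "column | value" segments; the separator format here
# is a hypothetical stand-in for TaBERT's row linearization scheme.
table_text = " ; ".join(f"{col} | {val}" for col, val in row)
inputs = tokenizer(question, table_text, return_tensors="pt", truncation=True)

with torch.no_grad():
    outputs = encoder(**inputs)

# Token-level representations spanning both the question and the row;
# a downstream semantic parser would pool these into utterance and
# column vectors before predicting a logical form or SQL query.
joint_encoding = outputs.last_hidden_state
print(joint_encoding.shape)  # (1, sequence_length, 768)
```

A semantic parser built on such an encoder would consume the pooled utterance and column vectors in place of (or in addition to) its own embedding layer, which is the sense in which TABERT serves as a feature representation layer in the reported experiments.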