Distant supervision for relation extraction without labeled data
Top 1% of 2009 papers by citations
Abstract
Modern models of relation extraction for tasks like ACE are based on supervised learning of relations from small hand-labeled corpora. We investigate an alternative paradigm that does not require labeled corpora, avoiding the domain dependence of ACE-style algorithms, and allowing the use of corpora of any size. Our experiments use Freebase, a large semantic database of several thousand relations, to provide distant supervision. For each pair of entities that appears in some Freebase relation, we find all sentences containing those entities in a large unlabeled corpus and extract textual features to train a relation classifier. Our algorithm combines the advantages of supervised IE (combining 400,000 noisy pattern features in a probabilistic classifier) and unsupervised IE (extracting large numbers of relations from large corpora of any domain). Our model is able to extract 10,000 instances of 102 relations at a precision of 67.6%. We also analyze feature performance, showing that syntactic parse features are particularly helpful for relations that are ambiguous or lexically distant in their expression.
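The core labeling heuristic described in the abstract can be sketched in a few lines: for each entity pair in a known knowledge-base relation, every unlabeled sentence containing both entities is treated as a (noisy) training example for that relation. The snippet below is a minimal toy illustration, not the paper's Freebase pipeline; the knowledge base, sentences, and `BETWEEN:` feature naming are all assumptions for demonstration, and only simple lexical features are extracted (the paper also uses syntactic parse features).

```python
# Toy stand-in for Freebase relation instances: (entity1, entity2) -> relation.
KB = {
    ("Barack Obama", "Honolulu"): "/people/person/place_of_birth",
    ("Google", "Mountain View"): "/business/company/headquarters",
}

def lexical_features(sentence, e1, e2):
    """Return words between the two entity mentions as simple pattern
    features, or None if either entity is absent from the sentence."""
    i, j = sentence.find(e1), sentence.find(e2)
    if i < 0 or j < 0:
        return None
    start, end = (i + len(e1), j) if i < j else (j + len(e2), i)
    between = sentence[start:end].strip()
    return ["BETWEEN:" + w for w in between.split()]

def label_corpus(sentences, kb):
    """Distant-supervision labeling: pair every sentence with every KB
    entity pair it mentions, yielding (features, relation) examples."""
    examples = []
    for sent in sentences:
        for (e1, e2), rel in kb.items():
            feats = lexical_features(sent, e1, e2)
            if feats is not None:
                examples.append((feats, rel))
    return examples

corpus = [
    "Barack Obama was born in Honolulu in 1961.",
    "Google is headquartered in Mountain View, California.",
    "Honolulu hosted the conference.",  # only one entity: produces no example
]
data = label_corpus(corpus, KB)
# data[0] -> (['BETWEEN:was', 'BETWEEN:born', 'BETWEEN:in'],
#             '/people/person/place_of_birth')
```

The resulting feature vectors would then be aggregated per entity pair and fed to a multiclass classifier; the noise from sentences that mention both entities without expressing the relation is what the paper's large feature count and probabilistic classifier are meant to absorb.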
Related Papers
- Minimally Supervised Novel Relation Extraction Using a Latent Relational Mapping (2011), 11 citations
- Document-level relation extraction with multi-semantic knowledge interaction (2024), 1 citation
- Using unlabeled data to handle domain-transfer problem of semantic detection (2008), 8 citations
- Classification Method Utilizing Reliably Labeled Data (2008)
- Weak Adaptation Learning -- Addressing Cross-domain Data Insufficiency with Weak Annotator (2021)