Web-Scale Information Extraction in KnowItAll
Top 1% of 2004 papers
Abstract
Manually querying search engines in order to accumulate a large body of factual information is a tedious, error-prone process of piecemeal search. Search engines retrieve and rank potentially relevant documents for human perusal, but they do not extract facts, assess confidence, or fuse information from multiple documents. This paper introduces KnowItAll, a system that aims to automate the tedious process of extracting large collections of facts from the web in an autonomous, domain-independent, and scalable manner. The paper describes preliminary experiments in which an instance of KnowItAll, running for four days on a single machine, automatically extracted 54,753 facts. KnowItAll associates a probability with each fact, which enables it to trade off precision against recall. The paper analyzes KnowItAll's architecture and reports on lessons learned for the design of large-scale information extraction systems.
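Because each extracted fact carries a probability, a user can choose a cutoff: keeping only high-probability facts favors precision, while lowering the cutoff admits more facts and favors recall. The following is a minimal sketch of that thresholding idea, not KnowItAll's actual assessment method. The function name `precision_recall_at`, the example facts, their probabilities, and their correctness labels are all hypothetical, and recall is measured only relative to the correct facts in this small sample.

```python
from typing import List, Tuple

# Hypothetical extracted fact: (fact, probability assigned by an extractor, is_correct).
# All values below are invented for illustration; they are not KnowItAll output.
Fact = Tuple[str, float, bool]


def precision_recall_at(facts: List[Fact], threshold: float) -> Tuple[float, float]:
    """Keep facts whose probability is >= threshold; return (precision, recall).

    Recall is relative to the correct facts present in `facts`, because the
    true number of correct facts on the web is unknown.
    """
    kept = [f for f in facts if f[1] >= threshold]
    total_correct = sum(1 for f in facts if f[2])
    kept_correct = sum(1 for f in kept if f[2])
    # By convention, precision is 1.0 when no facts are kept.
    precision = kept_correct / len(kept) if kept else 1.0
    recall = kept_correct / total_correct if total_correct else 0.0
    return precision, recall


facts: List[Fact] = [
    ("city(Paris)", 0.98, True),
    ("city(Seattle)", 0.95, True),
    ("city(Boston)", 0.90, True),
    ("city(Georgia)", 0.60, False),
    ("city(Lima)", 0.55, True),
    ("city(Jordan)", 0.30, False),
]

if __name__ == "__main__":
    # Sweep a few cutoffs to show precision falling as recall rises.
    for t in (0.9, 0.5, 0.0):
        p, r = precision_recall_at(facts, t)
        print(f"threshold={t:.1f} precision={p:.2f} recall={r:.2f}")
```

With these made-up values, a cutoff of 0.9 keeps only correct facts (precision 1.00, recall 0.75), while a cutoff of 0.5 recovers every correct fact at precision 0.80.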