Improved Pattern Learning for Bootstrapped Entity Extraction
Abstract
Bootstrapped pattern learning for entity extraction usually starts with seed entities and iteratively learns patterns and entities from unlabeled text. Patterns are scored by their ability to extract more positive entities and fewer negative entities. A problem is that, due to the lack of labeled data, unlabeled entities are either assumed to be negative or are ignored by the existing pattern scoring measures. In this paper, we improve pattern scoring by predicting the labels of unlabeled entities. We use various unsupervised features based on contrasting domain-specific and general text, and exploiting distributional similarity and edit distances to learned entities. Our system outperforms existing pattern scoring algorithms for extracting drug-and-treatment entities from four medical forums.
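To make the contrast in the abstract concrete, the following is a minimal sketch of three ways a pattern score could treat unlabeled entities: counting them as negative, ignoring them, or weighting them by a predicted probability of being positive. It is an illustration only, not the paper's scoring measure. The function name `score_pattern`, the precision-like formula, the mode names, and the example entities and probabilities are all assumptions made for this sketch.

```python
from typing import Dict, Iterable, Optional, Set


def score_pattern(
    extracted: Iterable[str],
    positives: Set[str],
    negatives: Set[str],
    unlabeled_mode: str = "negative",
    predicted: Optional[Dict[str, float]] = None,
) -> float:
    """Precision-like score for one pattern (hypothetical, for illustration).

    unlabeled_mode controls how entities outside both label sets are handled:
      "negative": count them as negative
      "ignore":   leave them out of the score entirely
      "predict":  credit them with a predicted probability of being positive
    """
    predicted = predicted or {}
    positive_mass = 0.0  # (possibly fractional) count of positive extractions
    total = 0.0          # number of extractions that enter the score

    for entity in extracted:
        if entity in positives:
            positive_mass += 1.0
            total += 1.0
        elif entity in negatives:
            total += 1.0
        elif unlabeled_mode == "negative":
            total += 1.0
        elif unlabeled_mode == "ignore":
            continue
        elif unlabeled_mode == "predict":
            # Assumed input: a probability in [0, 1] from some unsupervised
            # scorer; entities without a prediction default to 0.0.
            positive_mass += predicted.get(entity, 0.0)
            total += 1.0
        else:
            raise ValueError(f"unknown unlabeled_mode: {unlabeled_mode}")

    return positive_mass / total if total else 0.0


# Made-up example: one pattern extracts four entities, one of them unlabeled.
extracted = ["aspirin", "ibuprofen", "headache", "tylenol"]
positives = {"aspirin", "ibuprofen"}
negatives = {"headache"}
predicted = {"tylenol": 0.8}  # hypothetical predicted probability

for mode in ("negative", "ignore", "predict"):
    print(mode, score_pattern(extracted, positives, negatives, mode, predicted))
```

In this toy setting, a pattern that extracts a plausible but unlabeled entity is penalized under the "negative" mode and gets no credit under "ignore", whereas the "predict" mode lets the score reflect how likely the unlabeled entity is to be correct.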