Self-Training with Selection-by-Rejection
Abstract
Practical machine learning and data mining problems often face a shortage of labeled training data. Self-training algorithms are among the earliest attempts to use unlabeled data to enhance learning. A traditional self-training algorithm labels the unlabeled instances on which a classifier trained on the limited labeled data is most confident. In this paper, a self-training algorithm that shrinks the disagreement region of the hypotheses is presented. The algorithm supplements the training set with self-labeled instances, but only instances that greatly reduce the disagreement region of the hypotheses are labeled and added to the training set. Empirical results demonstrate that the proposed self-training algorithm effectively improves classification performance.
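As a point of reference, the traditional confidence-based loop the abstract contrasts against can be sketched as follows. This is a minimal illustration, not the paper's selection-by-rejection criterion: the nearest-centroid classifier, the margin-based confidence score, and all function names (`fit_centroids`, `predict`, `self_train`) are illustrative assumptions chosen to keep the sketch self-contained.

```python
import math

def fit_centroids(X, y):
    """Train a nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        s = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def predict(centroids, x):
    """Return (label, confidence); confidence is the distance margin
    between the two nearest centroids (an illustrative proxy)."""
    dists = sorted((math.dist(x, c), label) for label, c in centroids.items())
    best_d, best_label = dists[0]
    margin = (dists[1][0] - best_d) if len(dists) > 1 else float("inf")
    return best_label, margin

def self_train(X_lab, y_lab, X_unlab, threshold=1.0, max_rounds=5):
    """Traditional self-training: repeatedly self-label the unlabeled
    points on which the current classifier is most confident."""
    X_lab, y_lab, X_unlab = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        centroids = fit_centroids(X_lab, y_lab)
        deferred, added = [], 0
        for x in X_unlab:
            label, conf = predict(centroids, x)
            if conf >= threshold:      # high confidence: add to training set
                X_lab.append(x)
                y_lab.append(label)
                added += 1
            else:                      # low confidence: try again next round
                deferred.append(x)
        X_unlab = deferred
        if added == 0:                 # nothing passed the threshold; stop
            break
    return fit_centroids(X_lab, y_lab), X_lab, y_lab
```

The proposed algorithm replaces the confidence threshold in the loop above with a different selection criterion: an instance is accepted only if labeling it would greatly reduce the region on which the candidate hypotheses disagree.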
Related Papers
- Email Classification with Co-Training (2011)
- Integrating Co-Training and Recognition for Text Detection (2005)
- Reinforced Co-Training (2018)
- A New Cross-Training Approach by Using Labeled Data (2009)