Semi-supervised Learning with Explicit Misclassification Modeling

Massih-Reza Amini, Patrick Gallinari
Laboratoire d'Informatique Paris 6
8, rue du capitaine scott
75015 Paris

This paper investigates a new approach for training discriminant classifiers when only a small set of labeled data is available together with a large set of unlabeled data. This algorithm optimizes the classification maximum likelihood of a set of labeled-unlabeled data, using a variant form of the Classification Expectation Maximum (CEM) algorithm. Its originality is that it makes use of both unlabeled data and of a probabilistic misclassification model for these data. The parameters of the label-error model are learned together with the classifier parameters. We demonstrate the effectiveness of the approach on four data-sets and show the advantages of this method over a previously developed semi-supervised algorithm which does not consider imperfections in the labeling process.