A Transductive Bound for the Voted Classifier with an Application to Semi-Supervised Learning


Massih-Reza Amini(1), François Laviolette(2), Nicolas Usunier (1)
(1) Laboratoire d'Informatique Paris 6          (2) Université Laval
    104, avenue du Président Kennedy                Pavillon Adrien-Pouliot
    75016 Paris                                     Canada, G1K 7P4


We propose two transductive bounds on the risk of majority votes that are estimated over partially labeled training sets. The first one involves the margin distribution of the classifier and a risk bound on its associated Gibbs classifier. The bound is tight when the Gibbs bound is tight and when the errors of the majority vote classifier are concentrated in a region of low margin. In semi-supervised learning, treating the margin as an indicator of confidence is the working hypothesis of algorithms that search for the decision boundary in low-density regions. Following this assumption, we propose to bound the error probability of the voted classifier on the examples whose margins are above a fixed threshold. As an application, we propose a self-learning algorithm that iteratively assigns pseudo-labels to those unlabeled training examples whose margin is above a threshold obtained from this bound. Empirical results on different datasets show the effectiveness of our approach compared to the same algorithm with a manually fixed threshold and to the TSVM.
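To make the self-learning scheme concrete, the following is a minimal sketch (not the authors' implementation) of margin-based self-training: at each iteration, unlabeled examples whose margin exceeds a threshold are pseudo-labeled and moved to the labeled set. The base learner `LinearSVC` and the threshold parameter `theta` are illustrative assumptions; in the paper the threshold is obtained from the transductive bound rather than fixed by hand.

    import numpy as np
    from sklearn.svm import LinearSVC

    def self_train(X_lab, y_lab, X_unl, theta=1.0, max_iter=10):
        """Iteratively pseudo-label high-margin unlabeled examples (binary labels in {-1, +1})."""
        X_lab, y_lab, X_unl = X_lab.copy(), y_lab.copy(), X_unl.copy()
        clf = LinearSVC().fit(X_lab, y_lab)
        for _ in range(max_iter):
            if len(X_unl) == 0:
                break
            margins = clf.decision_function(X_unl)        # signed margins on the unlabeled set
            confident = np.abs(margins) >= theta          # keep only examples above the threshold
            if not confident.any():
                break
            pseudo = np.sign(margins[confident]).astype(int)
            X_lab = np.vstack([X_lab, X_unl[confident]])  # move pseudo-labeled examples to the labeled set
            y_lab = np.concatenate([y_lab, pseudo])
            X_unl = X_unl[~confident]
            clf = LinearSVC().fit(X_lab, y_lab)           # retrain on the augmented labeled set
        return clf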