A Transductive Bound for the Voted Classifier with an Application to Semi-Supervised Learning

Massih-Reza Amini(1), François Laviolette(2), Nicolas Usunier (1)
(1) Laboratoire d'Informatique Paris 6, 104, avenue du président Kennedy, 75016 Paris
(2) Université Laval, Pavillon Adrien-Pouliot, Canada, G1K 7P4

We propose two transductive bounds on the risk of majority votes estimated over partially labeled training sets. The first involves the margin distribution of the classifier and a risk bound on its associated Gibbs classifier. The bound is tight when the Gibbs bound is tight and when the errors of the majority vote classifier are concentrated in a zone of low margin. In semi-supervised learning, treating the margin as an indicator of confidence is the working hypothesis of algorithms that search for the decision boundary in low-density regions. Following this assumption, we bound the error probability of the voted classifier on the examples whose margins are above a fixed threshold. As an application, we propose a self-learning algorithm which iteratively assigns pseudo-labels to the unlabeled training examples whose margins are above a threshold obtained from this bound. Empirical results on different datasets show the effectiveness of our approach compared to the same algorithm with a manually fixed threshold, and to TSVM.
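The self-learning loop described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the helper name `self_train`, the centroid-based linear scorer standing in for the base classifier, and the fixed threshold `theta` are all assumptions (the paper derives the threshold from the transductive bound at each iteration rather than fixing it manually).

```python
import numpy as np

def self_train(X_l, y_l, X_u, theta=0.5, max_iter=10):
    """Margin-based self-training sketch for binary labels in {-1, +1}.

    Hypothetical stand-in classifier: a linear scorer w built from class
    centroids; |score| plays the role of the (unsigned) margin.
    """
    X_l, y_l = list(X_l), list(y_l)
    X_u = list(X_u)
    w = None
    for _ in range(max_iter):
        Xa, ya = np.array(X_l), np.array(y_l)
        # Centroid difference as a simple linear decision direction.
        w = Xa[ya == 1].mean(axis=0) - Xa[ya == -1].mean(axis=0)
        if not X_u:
            break
        scores = np.array(X_u) @ w
        confident = np.abs(scores) >= theta  # margin above threshold
        if not confident.any():
            break
        # Pseudo-label confident unlabeled examples and move them to
        # the labeled set; iterate until none remain above threshold.
        for i in np.where(confident)[0]:
            X_l.append(X_u[i])
            y_l.append(1 if scores[i] > 0 else -1)
        X_u = [x for i, x in enumerate(X_u) if not confident[i]]
    return w, np.array(X_l), np.array(y_l)
```

On two well-separated points per class, a single pass pseudo-labels all confident unlabeled examples and refits the scorer on the enlarged labeled set; in the paper's version, `theta` would be recomputed from the bound before each pass.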