A Self-Training Method for Learning to Rank with Unlabeled Data

Vinh Truong(1), Massih-Reza Amini(2), Patrick Gallinari
(1) Laboratoire d'Informatique Paris 6, 104, avenue du président Kennedy, 75016 Paris
(2) National Research Council Canada, 123, boulevard Alexandre Taché, Gatineau, Canada

This paper presents a new algorithm for learning bipartite ranking functions from partially labeled data. The algorithm is an extension of the self-training paradigm developed under the classification framework. We further propose an efficient and scalable optimization method for training linear models, though the approach is general in the sense that it can be applied to any class of scoring functions. Empirical results on several common image and text corpora, measured by the Area Under the ROC Curve (AUC) and Average Precision, show that using unlabeled data in the training process improves the performance of baseline supervised ranking functions.
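
For readers unfamiliar with the two evaluation measures named above, the following minimal Python sketch shows their standard textbook definitions for a bipartite problem (label 1 for relevant, 0 for irrelevant). It is illustrative only and is not the evaluation code used in the paper; the function names `auc` and `average_precision` and the toy scores are assumptions introduced here.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs in which the positive example
    receives the higher score; tied pairs count as half correct."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of precision@k taken at the rank k of every positive example,
    with examples ranked by decreasing score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("average precision needs at least one positive example")
    return sum(precisions) / len(precisions)


# Hypothetical toy data: four examples, two of them relevant.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
print(auc(toy_scores, toy_labels))                # 0.75
print(average_precision(toy_scores, toy_labels))  # (1 + 2/3) / 2
```

In the toy data, three of the four positive-negative pairs are ordered correctly, which gives an AUC of 0.75. The pairwise loop is quadratic and is written for clarity rather than speed.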