A Boosting Algorithm for Learning Bipartite Ranking Functions with Partially Labeled Data

Massih-Reza Amini(1), Tuong-Ving Truong(1), Cyril Goutte(2)
(1) Laboratoire d'Informatique Paris 6, 104, avenue du président Kennedy, 75016 Paris
(2) National Research Council Canada, 123, boulevard Alexandre Taché, Gatineau, Canada

This paper presents a boosting-based algorithm for learning a bipartite ranking function (BRF) with partially labeled data. Until now, different attempts have been made to build a BRF in a transductive setting, in which the test points are given to the method in advance as unlabeled data. The proposed approach is a semi-supervised inductive ranking algorithm which, as opposed to transductive algorithms, is able to infer an ordering on new examples that were not used for its training. We evaluate our approach on the TREC-9 Ohsumed and Reuters-21578 data collections, comparing it against two semi-supervised classification algorithms in terms of area under the ROC curve (AUC), uninterpolated average precision (AUP), mean precision@50 (TP) and Precision-Recall (PR) curves. In the most interesting cases, where irrelevant examples outnumber relevant ones, we show that our method produces statistically significant improvements with respect to these ranking measures.
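To make the ranking measures named above concrete, the following is a minimal Python sketch of their standard textbook definitions for a bipartite problem (labels 1 for relevant, 0 for irrelevant). It is an illustration under stated assumptions and not the evaluation code used in the paper: the function names, the tie handling, and the toy scores are all hypothetical.

```python
from typing import List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (relevant, irrelevant) pairs ranked in the right order.

    Ties between a relevant and an irrelevant score count as half a
    correctly ordered pair (an assumption of this sketch).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ranked_labels(scores: Sequence[float], labels: Sequence[int]) -> List[int]:
    """Labels sorted by decreasing score (stable for equal scores)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return [labels[i] for i in order]


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Uninterpolated average precision: mean of precision at each relevant rank."""
    hits = 0
    total = 0.0
    for rank, y in enumerate(ranked_labels(scores, labels), start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of relevant examples among the k highest-scored ones."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(ranked_labels(scores, labels)[:k]) / k


# Hypothetical toy ranking: 2 relevant and 3 irrelevant examples.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
toy_labels = [1, 0, 1, 0, 0]
print(auc(toy_scores, toy_labels))
print(average_precision(toy_scores, toy_labels))
print(precision_at_k(toy_scores, toy_labels, k=2))
```

On the toy data, five of the six relevant/irrelevant pairs are correctly ordered, so the AUC is 5/6; the two relevant examples sit at ranks 1 and 3, so the average precision is (1/1 + 2/3)/2. All three measures depend only on the ordering induced by the scores, which is why they suit the evaluation of a ranking function rather than a classifier.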