SIGIR'08 31st Annual International ACM SIGIR Conference on Research & Development on Information Retrieval

Massih-Reza Amini(1), Tuong-Vinh Truong(1), Cyril Goutte(2)
(1) Laboratoire d'Informatique Paris 6    (2) National Research Council Canada
      104, avenue du président Kennedy            123, boulevard Alexandre Taché
              75016 Paris                           Gatineau, Canada

This paper presents a boosting-based algorithm for learning a bipartite ranking function (BRF) with partially labeled data. Previous attempts to build a BRF from partially labeled data have used a transductive setting, in which the test points are given to the method in advance as unlabeled data. The proposed approach is a semi-supervised inductive ranking algorithm which, unlike transductive algorithms, can infer an ordering on new examples that were not used for its training. We evaluate our approach on the TREC-9 Ohsumed and Reuters-21578 data collections, comparing it against two semi-supervised classification algorithms in terms of area under the ROC curve (AUC), uninterpolated average precision (AUP), mean precision@50 (TP) and Precision-Recall (PR) curves. In the most interesting cases, where irrelevant examples heavily outnumber relevant ones, we show that our method produces statistically significant improvements with respect to these ranking measures.
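To make the scalar ranking measures named in the abstract concrete, here is a minimal Python sketch of AUC, uninterpolated average precision and precision at a cutoff k for a single bipartite ranking. It is an illustration, not the authors' evaluation code. It assumes binary labels (1 for relevant, 0 for irrelevant), counts ties as half-correct in the AUC, and uses hypothetical function names (`auc`, `average_precision`, `precision_at_k`). The "mean" in mean precision@50 would be an average of `precision_at_k(..., k=50)` over several rankings, which is left out here.

```python
from typing import List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (relevant, irrelevant) pairs ordered correctly.

    Ties between a relevant and an irrelevant score count as 0.5
    (an assumption of this sketch).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one relevant and one irrelevant example")
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return correct / (len(pos) * len(neg))


def ranked_labels(scores: Sequence[float], labels: Sequence[int]) -> List[int]:
    """Labels reordered by decreasing score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Uninterpolated average precision: mean of precision at each relevant rank."""
    hits, total = 0, 0.0
    for rank, y in enumerate(ranked_labels(scores, labels), start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one relevant example")
    return total / hits


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Proportion of relevant examples among the k highest-scored ones."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(ranked_labels(scores, labels)[:k]) / k


if __name__ == "__main__":
    # Toy ranking: two relevant examples among five.
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
    toy_labels = [1, 0, 1, 0, 0]
    print(auc(toy_scores, toy_labels))
    print(average_precision(toy_scores, toy_labels))
    print(precision_at_k(toy_scores, toy_labels, k=2))
```

In the toy example, five of the six relevant/irrelevant pairs are ordered correctly, so the AUC is 5/6. With heavily unbalanced data, average precision and precision at a cutoff focus on the top of the ranking, whereas AUC weights all pairs equally.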