European Conference on Machine Learning 2006

Massih-Reza Amini(1), Nicolas Usunier(1), François Laviolette(2), Alexandre Lacasse(2), Patrick Gallinari(1)
(1) Laboratoire d'Informatique Paris 6    (2)Université Laval
      8, rue du capitaine scott                      Pavillon Adrien-Pouliot
                75015 Paris                         G1K7P4 Canada

We propose a novel active learning strategy based on the compression framework for label ranking functions which, given an input instance, predict a total order over a predefined set of alternatives. Our approach is theoretically motived by an extension to ranking and active learning of Kääriänen's generalisation bounds using unlabeled data, initially developed in the context of classification. The bounds we obtain suggest a selective sampling strategy provided that a sufficiently, yet reasonably large initial labeled dataset is provided. Experiments on Information Retrieval corpora from automatic text summarization and question/answering show that the proposed approach allows to substantially reduce the labeling effort in comparison to random and heuristic-based sampling strategies.