Transferring Knowledge with Source Selection to Learn IR Functions on Unlabeled Collections

Parantapa Goswami, Massih-Reza Amini and Eric Gaussier
Laboratoire d'Informatique de Grenoble
Université Joseph Fourier, Grenoble, France

We investigate the problem of learning an IR function on a collection without relevance judgements (called target collection) by transferring knowledge from a selected source collection with relevance judgements. To do so, we first construct, for each query in the target collection, relative relevance judgment pairs using information from the source collection closest to the query (selection and transfer steps), and then learn an IR function from the obtained pairs in the target collection (self-learning step). For the transfer step, the relevance information in the source collection is summarized as a grid that provides, for each term frequency and document frequency values of a word in a document, an empirical estimate of the relevance of the document. The self-learning step iteratively assigns pairwise preferences to documents in the target collection using the scores of the former learned function. We show the effectiveness of our approach through a series of extensive experiments on CLEF-3 and several collections from TREC used either as target or source datasets. Our experiments show the importance of selecting the source collection prior to transfer information to the target collection, and demonstrate that the proposed approach yields results consistently and significantly above state-of-the-art IR functions.