Active, Semi-Supervised Learning for Textual Information Access

Anastasia Krithara(2), Cyril Goutte(2), Massih-Reza Amini(1), Jean-Michel Renders(2)
(1) Laboratoire d'Informatique Paris 6, 8, rue du capitaine scott, 75015 Paris
(2) Xerox Research Center Europe, 6, Chemin de Maupertuis, 38240 Meylan

Machine learning techniques have been used for various document management and textual information access tasks, such as categorisation, information extraction, or automatic organization of large document collections. Acquiring the annotated data necessary to apply supervised learning techniques is a major challenge for text applications, especially in very large collections. Annotating textual data usually requires humans who can read and understand the texts, and is therefore very costly, particularly in technical domains. In this contribution, we address the problem of reducing this annotation burden.