Active, Semi-Supervised Learning for Textual Information Access
Anastasia Krithara(2), Cyril Goutte(2), Massih-Reza Amini(1), Jean-Michel Renders(2)
(1) Laboratoire d'Informatique Paris 6, 8, rue du Capitaine Scott, 75015 Paris
(2) Xerox Research Center Europe, 6, Chemin de Maupertuis, 38240 Meylan
Machine learning techniques have been used for various document management and textual information access tasks, such as categorisation, information extraction, or automatic organization of large document collections. Acquiring the annotated data necessary to apply supervised learning techniques is a major challenge for text applications, especially in very large collections. Annotating textual data usually requires humans who can read and understand the texts, and is therefore very costly, particularly in technical domains. In this contribution, we address the problem of reducing this annotation burden.