Learning for Sequence Extraction Tasks

Massih-Reza Amini, Hugo Zaragoza, Patrick Gallinari
Laboratoire d'Informatique Paris 6
case 169
4, place de Jussieu
75252 Paris cedex 05

We consider the application of Machine Learning (ML) techniques for sequence modeling to Information Retrieval (IR) and surface Information Extraction (IE) tasks. We introduce a generic sequence model and show how it can be used for dealing with different tasks. Taking into account the sequential nature of texts allows for a finer analysis than what is usually done in IR with static text representations. The task we focus on is the retrieval and labeling of text passages, also known as highlighting and surface information extraction. We describe different implementations of our model based on Hidden Markov Models and Neural Networks. Experiments are performed using the MUC6 corpus from the information extraction community.