A Contextual Query Expansion Approach by Term Clustering for Robust Text Summarization

Massih-Reza Amini and Nicolas Usunier
Laboratoire d'Informatique Paris 6
104, avenue du Président Kennedy
75016 Paris

This paper describes the steps that lead to the construction of the LIP6 extractive summarizer. The basic idea behind this system is to expand the question and title keywords of each topic with the terms of their respective clusters. Term clusters are found by unsupervised learning, using a classification variant of the well-known EM algorithm. Each sentence is then characterized by 4 features, each of which uses bag-of-words similarities between the expanded topic title or questions and the current sentence. The final score of a sentence is a linear combination of these features, whose weights are tuned manually to maximize the ROUGE-2 AvF measure on the DUC 2006 corpus.
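The scoring pipeline summarized above can be illustrated with a minimal Python sketch. It is not the LIP6 implementation. The term clusters are assumed to be given (the EM-based clustering step is not shown), the four features are one plausible reading of "similarities between the expanded topic title or questions and the current sentence", cosine similarity stands in for the unspecified bag-of-words similarity, and the names `expand`, `cosine`, `score` and the weight values are hypothetical.

```python
from collections import Counter
import math


def expand(keywords, clusters):
    """Return the keywords plus every term sharing a cluster with one of them."""
    keywords = set(keywords)
    expanded = set(keywords)
    for cluster in clusters:
        if keywords & cluster:  # the cluster contains at least one keyword
            expanded |= cluster
    return expanded


def cosine(a, b):
    """Cosine similarity between two bags of words (any iterables of terms)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def score(sentence, title, question, clusters, weights):
    """Linear combination of four illustrative (assumed) similarity features."""
    features = [
        cosine(sentence, title),                       # raw title
        cosine(sentence, expand(title, clusters)),     # expanded title
        cosine(sentence, question),                    # raw question
        cosine(sentence, expand(question, clusters)),  # expanded question
    ]
    return sum(w * f for w, f in zip(weights, features))


# Toy data, invented for illustration only.
clusters = [{"flood", "storm", "hurricane"}, {"cost", "damage"}]
title = ["hurricane", "damage"]
question = ["what", "damage", "did", "the", "hurricane", "cause"]
sentence = ["the", "storm", "caused", "a", "flood"]
weights = [0.1, 0.4, 0.1, 0.4]  # hypothetical, hand-set values

print(score(sentence, title, question, clusters, weights))
```

In this toy case the sentence shares no word with the raw title, so only the expanded features can link it to the topic, which is the motivation for expanding queries with cluster terms.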