Self-Supervised Learning for Automatic Text Summarization by Text-span Extraction

Massih-Reza Amini, Patrick Gallinari
Laboratoire d'Informatique Paris 6
case 169
4, place de Jussieu
75252 Paris cedex 05

We describe a system for automatic text summarization that extracts the sentences of a document that are most relevant to a query. The lack of labeled corpora makes it difficult to develop automatic summarization techniques. We propose a self-supervised method that learns to rank sentences for the summary without relying on labeled corpora. The method operates in two steps: first, a statistical, similarity-based system that requires no training is developed; second, a classifier is trained with self-supervised learning to improve on this baseline. The approach is evaluated on the Reuters news-wire corpus and compared with other strategies.
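The two-step idea can be illustrated with a small sketch. It rests on several assumptions that the abstract does not state. The baseline is taken to be cosine similarity between bag-of-words term counts of each sentence and the query. The self-supervised step labels the baseline's top-k sentences as pseudo-positive, then trains a logistic-regression classifier on hypothetical features (query similarity, relative position, length) and re-ranks with it. The function names, the feature set, `k` and the number of iterations are all illustrative. This is not the authors' exact system.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased alphabetic tokens only (illustrative assumption).
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-count vectors (Counters).
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def baseline_scores(sentences, query):
    # Step 1: training-free similarity between each sentence and the query.
    q = Counter(tokenize(query))
    return [cosine(Counter(tokenize(s)), q) for s in sentences]


def features(sentence, query, position, n):
    # Hypothetical features: query similarity, earliness, capped length.
    toks = tokenize(sentence)
    sim = cosine(Counter(toks), Counter(tokenize(query)))
    return [sim, 1.0 - position / n, min(len(toks), 30) / 30.0]


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, epochs=200, lr=0.5):
    # Plain stochastic gradient descent on the logistic loss.
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def self_supervised_rank(sentences, query, k=2, iterations=3):
    # Step 2: pseudo-label the current top-k, train, re-score, repeat.
    n = len(sentences)
    X = [features(s, query, i, n) for i, s in enumerate(sentences)]
    scores = baseline_scores(sentences, query)
    for _ in range(iterations):
        top = set(sorted(range(n), key=lambda i: -scores[i])[:k])
        y = [1 if i in top else 0 for i in range(n)]
        w, b = train_logistic(X, y)
        scores = [b + sum(wj * xj for wj, xj in zip(w, x)) for x in X]
    # Indices of the k sentences selected for the summary.
    return sorted(range(n), key=lambda i: -scores[i])[:k]


if __name__ == "__main__":
    doc = [
        "Oil prices rose sharply on Monday.",
        "The weather in Paris was mild.",
        "Analysts expect oil prices to keep rising.",
        "A local team won the football match.",
    ]
    print(self_supervised_rank(doc, "oil prices", k=2))
```

In this sketch the classifier only learns from labels that the baseline produced. The point it illustrates is that no human-labeled sentences are needed. Whether the classifier actually improves on the baseline depends on the features it is given, which are assumed here.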