On using a Quantum Physics formalism for Multi-document Summarisation
Benjamin Piwowarski(1), Massih-Reza Amini(1), Mounia Lalmas(2)
(1) Laboratoire d'Informatique Paris 6 (2) Yahoo! Research Barcelona
Multi-document summarisation (MDS) aims, for each given query, to extract compressed and relevant information with respect to the different query-related themes present in a set of documents. Many approaches operate in two steps: themes are first identified from the set, and a summary is then formed by extracting, for each identified theme, salient sentences from the documents. Among these approaches, those based on Latent Semantic Analysis (LSA) rely on spectral decomposition techniques to identify the themes. In this paper, we propose a major extension of these techniques that relies on the Quantum Information Access (QIA) framework, a framework for modelling information access based on the probabilistic formalism of quantum physics. The QIA framework not only exposes the limitations of current LSA-based approaches, but also motivates a new, principled criterion for multi-document summarisation that addresses these limitations. As a by-product, it also provides a way to enhance the LSA-based approaches. Extensive experiments on the DUC 2005, 2006 and 2007 datasets show that the proposed approach consistently improves over both the LSA-based approaches and the systems that competed in the DUC competitions. These results demonstrate the usefulness and potential impact of quantum-inspired approaches to Information Access in general, and of the QIA framework in particular.
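To make the two-step LSA-based scheme mentioned above concrete, the following is a minimal sketch of a generic LSA-style extractive summariser. It is an illustration under stated assumptions, not the method proposed in this paper and not the QIA criterion: the function name `lsa_summary`, the whitespace tokenisation, the raw term counts and the "one sentence per singular vector" selection rule are all hypothetical choices made for the example, and the query is ignored.

```python
import numpy as np


def lsa_summary(sentences, n_themes=2):
    """Pick one salient sentence per latent theme (generic LSA-style sketch)."""
    # Assumption: naive lowercase whitespace tokenisation, no stop-word removal.
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}

    # Term-by-sentence count matrix A (rows: terms, columns: sentences).
    A = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(sentences):
        for w in s.lower().split():
            A[index[w], j] += 1.0

    # Step 1 (themes): spectral decomposition. Each row of Vt gives the
    # weight of every sentence on one latent theme, strongest theme first.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)

    # Step 2 (extraction): for each theme, keep the not-yet-selected
    # sentence with the largest absolute weight on that theme.
    chosen = []
    for k in range(min(n_themes, Vt.shape[0])):
        for j in np.argsort(-np.abs(Vt[k])):
            if int(j) not in chosen:
                chosen.append(int(j))
                break
    return [sentences[j] for j in chosen]


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "the cat chased the mouse",
        "stocks fell on monday",
        "stocks rose on friday",
    ]
    for sentence in lsa_summary(docs, n_themes=2):
        print(sentence)
```

Selecting one sentence per singular vector keeps the sketch short; a practical system would typically add term weighting, query conditioning and a summary length budget.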