Learning to Summarize XML Documents by Combining Content and Structure Features
Massih-Reza Amini(1), Anastasios Tombros(2), Nicolas Usunier(1), Mounia Lalmas(2), Patrick Gallinari(1)
(1) Laboratoire d'Informatique Paris 6, 8, rue du capitaine Scott, 75015 Paris
(2) Queen Mary, University of London, Mile End Road, London E1 4NS
Documents formatted in eXtensible Markup Language (XML) are
becoming increasingly available in collections of various document
types. In this paper, we present an approach for the summarisation
of XML documents. The novelty of this approach is that it draws
features not only from the content of documents, but also
from their logical structure. We follow a machine learning approach
to sentence extraction-based summarisation. To find which
features are most effective for producing summaries, this approach
views sentence extraction as an ordering task. We evaluated our
summarisation model using the INEX dataset. The results demonstrate
that the inclusion of features from the logical structure of
documents increases the effectiveness of the summariser, and that
the learnable system is also effective and well-suited to the task of
summarisation in the context of XML documents.
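
To make the idea of sentence extraction as an ordering task concrete, the following is a minimal sketch and not the model evaluated in this paper. It assumes two illustrative features per sentence (overlap with a set of key terms as a content feature, and inverse element depth as a structure feature) and a simple pairwise perceptron update as the learner. The function names, the features, and the toy document are all hypothetical.

```python
import xml.etree.ElementTree as ET

def extract_rows(xml_text, key_terms):
    """Return (sentence, [content_feature, structure_feature]) pairs.

    Both features are assumptions made for illustration:
    content   = fraction of the sentence's words found in key_terms
    structure = 1 / (1 + depth of the enclosing element)
    """
    rows = []

    def walk(node, depth):
        text = (node.text or "").strip()
        for sent in (s.strip() for s in text.split(".")):
            if not sent:
                continue
            words = sent.lower().split()
            overlap = sum(w in key_terms for w in words) / len(words)
            rows.append((sent, [overlap, 1.0 / (1 + depth)]))
        for child in node:
            walk(child, depth + 1)

    walk(ET.fromstring(xml_text), 0)
    return rows

def score(weights, feats):
    """Linear score of one sentence."""
    return sum(w * f for w, f in zip(weights, feats))

def train_pairwise(rows, summary_sentences, epochs=20, lr=0.1):
    """Pairwise perceptron (an assumed learner): push every summary
    sentence to score above every non-summary sentence."""
    weights = [0.0, 0.0]
    pos = [f for s, f in rows if s in summary_sentences]
    neg = [f for s, f in rows if s not in summary_sentences]
    for _ in range(epochs):
        for p in pos:
            for n in neg:
                if score(weights, p) <= score(weights, n):
                    weights = [w + lr * (a - b)
                               for w, a, b in zip(weights, p, n)]
    return weights

def rank(rows, weights):
    """Order sentences by decreasing score; the top ones form the summary."""
    ordered = sorted(rows, key=lambda r: score(weights, r[1]), reverse=True)
    return [s for s, _ in ordered]

# Toy document, invented for this sketch.
DOC = ("<article><sec><p>We rank sentences for XML summarisation. "
       "The weather was nice.</p>"
       "<note><p>Funding details are omitted.</p></note></sec></article>")

rows = extract_rows(DOC, {"xml", "summarisation"})
weights = train_pairwise(rows, {"We rank sentences for XML summarisation"})
ranking = rank(rows, weights)
```

The learned weights indicate which features help the ordering, which is the sense in which a ranking formulation can be used to compare content and structure features.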