A Data-dependent Generalisation Error Bound for the AUC

Nicolas Usunier, Massih-Reza Amini, Patrick Gallinari
Laboratoire d'Informatique Paris 6
8, rue du Capitaine Scott
75015 Paris

In this paper, we are interested in the generalisation properties of the Area Under the ROC Curve (AUC). Optimising the AUC has recently been proposed as a way to learn ranking functions. However, estimating the true AUC of a function, which depends on the unknown distribution of examples, from its empirical value computed on a training set is still an open problem. In this paper, we present the first \textit{data-dependent} generalisation error bound for the AUC. This bound has the advantage of being tight, and it also allows one to draw practical conclusions about learning algorithms that optimise the AUC. In particular, we show that, in the case of the AUC, kernel function classes have strong generalisation guarantees provided that the weights of the functions are small. This suggests that regularisation procedures which limit the norm of the weight vector may lead to better generalisation performance for algorithms that optimise the AUC.
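For concreteness, the empirical AUC of a scoring function is commonly computed as the Wilcoxon-Mann-Whitney statistic. This is the fraction of (positive, negative) pairs of training examples that the function orders correctly. The sketch below is illustrative only and is not taken from the paper. It assumes that label 1 marks a positive example and any other label a negative one, and it counts ties as one half, which is a common convention rather than something fixed here. The function name `empirical_auc` is hypothetical.

```python
from itertools import product
from typing import Sequence


def empirical_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Empirical AUC as the fraction of correctly ordered (positive, negative) pairs.

    Assumption: label 1 is positive, every other label is negative.
    Ties between a positive and a negative score count as 1/2.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    total = 0.0
    # Every positive example is compared with every negative example.
    for sp, sn in product(pos, neg):
        if sp > sn:
            total += 1.0
        elif sp == sn:
            total += 0.5
    return total / (len(pos) * len(neg))


if __name__ == "__main__":
    # Positives score 0.9 and 0.3, negatives score 0.8 and 0.1:
    # 3 of the 4 pairs are ordered correctly.
    print(empirical_auc([0.9, 0.8, 0.3, 0.1], [1, -1, 1, -1]))  # 0.75
```

Each example takes part in many pairs, so the terms of this average are not independent of one another. The empirical AUC is therefore an average over pairs rather than an average of independent per-example losses.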