Maximum-margin Framework for Training Data Synchronization in Large-scale Hierarchical Classification

Rohit Babbar, Ioannis Partalas, Eric Gaussier and Massih-Reza Amini
Laboratoire d'Informatique de Grenoble
Université Joseph Fourier, Grenoble, France

In the context of supervised learning, the training data for large-scale hierarchical classification consist of (i) a set of input-output pairs, and (ii) a hierarchy defining parent-child relations among class labels. It is often the case that the hierarchy given a priori is not optimal for achieving high classification accuracy. This is especially true for large web taxonomies such as the Yahoo! directory, which consist of tens of thousands of classes and are designed primarily to facilitate navigation and browsing rather than classification. In this work, we propose a maximum-margin framework for automatically adapting the given hierarchy, based on the set of input-output pairs, to yield a new hierarchy. The proposed method is not only theoretically justified but also provides a more principled alternative to the hierarchy flattening techniques proposed earlier, which are ad-hoc and empirical in nature. Empirical results on large-scale public datasets demonstrate that classification with the new hierarchy leads to better or comparable generalization performance than the flattening techniques. Moreover, since the proposed method largely preserves the overall hierarchical structure, it leads to faster prediction and lower space complexity.
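To make the notion of hierarchy flattening concrete, the sketch below shows the basic operation such techniques perform: removing an inner node and promoting its children to the node's parent, thereby shortening root-to-leaf paths. This is an illustrative toy example, not the paper's algorithm; the dictionary-based tree representation and the node names are assumptions made here, and real flattening methods additionally include a criterion for selecting which nodes to remove.

```python
def flatten_node(tree, node):
    """Remove `node` from the hierarchy, promoting its children to its parent.

    `tree` maps each node to the list of its children; `node` is assumed
    to be an inner node (i.e. it appears as someone's child and has its
    own entry in `tree`).
    """
    # Locate the parent of `node`.
    parent = next(p for p, kids in tree.items() if node in kids)
    # Detach `node` and recover its children.
    children = tree.pop(node, [])
    # Splice the children into the parent's child list where `node` was.
    idx = tree[parent].index(node)
    tree[parent] = tree[parent][:idx] + children + tree[parent][idx + 1:]
    return tree

# Toy taxonomy: root -> {science, arts}; science -> {physics, chemistry}.
taxonomy = {
    "root": ["science", "arts"],
    "science": ["physics", "chemistry"],
}
flatten_node(taxonomy, "science")
print(taxonomy)  # {'root': ['physics', 'chemistry', 'arts']}
```

After the flattening step, "physics" and "chemistry" sit directly under the root, so a top-down classifier makes one fewer decision on that path; the maximum-margin framework described above instead decides such restructurings from the training data rather than by hand.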