Comparative Classifier Evaluation for Web-scale Taxonomies using Power Law

Rohit Babbar, Ioannis Partalas, Cornélia Metzig, Eric Gaussier and Massih-Reza Amini
Laboratoire d'Informatique de Grenoble
Université Joseph Fourier, Grenoble, France

In the context of web-scale taxonomies such as the Mozilla and Yahoo! directories, previous work has shown that category sizes follow a power-law distribution at every level of the taxonomy. In this work, we analyse how such high-level semantics can be leveraged to evaluate the accuracy of hierarchical classifiers that automatically assign unseen documents to leaf-level categories in the taxonomy. The commonly used evaluation method, which relies on k-fold cross-validation, suffers from computational challenges on such large-scale taxonomies. The proposed technique provides a necessary condition for acceptable performance of a hierarchical classifier based on power-law behavior. Applying this technique to classifier evaluation on publicly available data supports our claim empirically.