May 22, 14:30. Yury Maximov. Convergence rates in semi-supervised multi-class learning under cluster assumption.
Venue: Laboratoire d'Informatique de Grenoble, Center Equation 4 building, Claude Shannon auditorium.
Abstract: In this talk we consider the multi-class semi-supervised learning (SSL) problem with a prescribed number of classes. In typical SSL problems, a significant part of the available data is unlabeled. Unlabeled data can be useful for classification when one makes an additional assumption relating the behavior of the optimal classifier to that of the marginal distribution. Here we propose a two-stage method based on the cluster assumption: the data can be separated into clusters such that the Bayes classifier assigns the same label to all objects within a given cluster. In the first stage, we identify such clusters (under additional assumptions on the data) and pseudo-label the objects within each cluster by majority vote. In the second stage, we use the pseudo-labeled data to build a classifier from a given set of functions. In the paper, we give theoretical and empirical support for the method. Finally, we discuss the method and report the results of a series of experiments on several data sets, including UCI and text classification data.
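The two-stage procedure can be illustrated with a minimal sketch. This is not the speaker's method. It assumes that plain k-means stands in for the cluster-identification step, which the abstract leaves unspecified and which would normally rely on the extra data assumptions. It also uses a nearest-centroid rule as a hypothetical stand-in for "the given set of functions". All function names (`kmeans`, `pseudo_label`, `fit_nearest_centroid`) are illustrative only.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (a stand-in for the cluster-identification step)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        assign = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return assign


def pseudo_label(points, labels, k):
    """Stage 1: cluster all points, then give each point its cluster's majority label.

    labels[i] is None for unlabeled points. A cluster with no labeled point
    stays unlabeled (None) in this sketch.
    """
    assign = kmeans(points, k)
    votes = {}
    for a, y in zip(assign, labels):
        if y is not None:
            votes.setdefault(a, Counter())[y] += 1
    majority = {c: cnt.most_common(1)[0][0] for c, cnt in votes.items()}
    return [majority.get(a) for a in assign]


def fit_nearest_centroid(points, labels):
    """Stage 2 (assumed function class): one centroid per class, predict the nearest."""
    groups = {}
    for p, y in zip(points, labels):
        if y is not None:
            groups.setdefault(y, []).append(p)
    centroids = {y: tuple(sum(xs) / len(ps) for xs in zip(*ps)) for y, ps in groups.items()}

    def predict(x):
        return min(centroids, key=lambda y: math.dist(x, centroids[y]))

    return predict


# Toy data: two well-separated blobs with only one labeled point in each.
rng = random.Random(1)
blob_a = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(20)]
blob_b = [(rng.gauss(5, 0.3), rng.gauss(5, 0.3)) for _ in range(20)]
points = blob_a + blob_b
labels = [None] * 40
labels[0] = "a"
labels[20] = "b"

pseudo = pseudo_label(points, labels, k=2)
clf = fit_nearest_centroid(points, pseudo)
print(clf((0.2, -0.1)), clf((4.8, 5.1)))
```

Here two labeled examples are enough to label all 40 points, because every point in a cluster inherits that cluster's majority vote. This is the situation the cluster assumption describes. With overlapping clusters or a poor choice of `k`, the pseudo-labels can be wrong. The talk's convergence analysis addresses exactly this kind of trade-off, and the sketch does not model it.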