Knowledge and Information Systems Journal

Real-life applications may involve huge datasets whose training data are misclassified or only partially classified. Semi-supervised learning and learning in the presence of label noise have recently emerged as new paradigms in the machine learning community to cope with such problems. This paper describes a new discriminant algorithm for semi-supervised learning. The algorithm optimizes the classification maximum likelihood of a set of labeled and unlabeled data, using a discriminant extension of the Classification Expectation Maximization (CEM) algorithm. We further propose to extend this algorithm by modeling imperfections in the class labels estimated for the unlabeled data. The parameters of this label-error model are learned together with the semi-supervised classifier parameters. We demonstrate the effectiveness of the approach through extensive experiments on different datasets.
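The discriminant CEM loop summarized above can be illustrated with a minimal sketch. It is not the implementation used in the paper: the binary logistic model, the gradient-ascent trainer, the synthetic two-cluster data, and all names (`fit_logistic`, `discriminant_cem`, `sample`) are assumptions made for illustration, and the label-error model is left out.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w):
    # w holds one weight per feature followed by a bias term.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])

def fit_logistic(X, y, w, lr=0.5, steps=200):
    # Gradient ascent on the mean conditional log-likelihood (assumed trainer).
    n, d = len(X), len(X[0])
    for _ in range(steps):
        grad = [0.0] * (d + 1)
        for x, t in zip(X, y):
            err = t - predict_proba(x, w)
            for j in range(d):
                grad[j] += err * x[j]
            grad[d] += err
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w

def discriminant_cem(X_lab, y_lab, X_unl, max_iter=10):
    # Initialise the classifier on the labeled examples only.
    w = fit_logistic(X_lab, y_lab, [0.0] * (len(X_lab[0]) + 1))
    pseudo = None
    for _ in range(max_iter):
        # E-step: class posteriors for the unlabeled examples.
        post = [predict_proba(x, w) for x in X_unl]
        # C-step: hard-assign each unlabeled example to its most probable class.
        new_pseudo = [1 if p >= 0.5 else 0 for p in post]
        if new_pseudo == pseudo:  # stop once the partition is stable
            break
        pseudo = new_pseudo
        # M-step: refit on labeled plus pseudo-labeled data.
        w = fit_logistic(X_lab + X_unl, y_lab + pseudo, w)
    return w, pseudo

# Synthetic data (hypothetical): two Gaussian clusters, 5 labeled points each.
random.seed(0)

def sample(center, n):
    return [[random.gauss(center, 1.0), random.gauss(center, 1.0)] for _ in range(n)]

X_neg, X_pos = sample(-2.0, 100), sample(2.0, 100)
X_lab, y_lab = X_neg[:5] + X_pos[:5], [0] * 5 + [1] * 5
X_unl, y_unl_true = X_neg[5:] + X_pos[5:], [0] * 95 + [1] * 95

w, pseudo = discriminant_cem(X_lab, y_lab, X_unl)
accuracy = sum(p == t for p, t in zip(pseudo, y_unl_true)) / len(y_unl_true)
```

In this sketch the classifier directly supplies the class posteriors, so no class-conditional density is estimated; that is the sense in which the loop is discriminant rather than generative. Extending it with a label-error model would mean learning additional parameters that relate the pseudo-labels to the true classes, which is not attempted here.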