From Wiki du projet-action Khronos

Work in progress (May 2014)

The project will formally start after the events to be held in June 2014.

Two funded PhD theses starting in September 2014 will cover the two main aspects of the project: learning with non-stationary data, and prediction and recovery of time-varying signals. The objective of the latter is to implement adaptive filters through iterative saddle-point optimization, which should make it possible to process large-scale data efficiently, and to study the properties of the proposed algorithms under various observation and signal scenarios. The aim of the former is to develop a new framework for learning with interdependent and non-identically distributed data, together with algorithms able to learn from large volumes of non-stationary data under that framework. In particular, we are interested in applying the proposed algorithms to large-scale applications within the fields of expertise of the Khronos member laboratories, where data are naturally non-stationary and sequential. Dimitry Ostrovsky (MIPT, Moscow, Russia) and Bikash Moji (Madrass-MIT, Abu-Dhabi) are the selected candidates for these theses.

PhD theses financed by Khronos

  • 23 October 2013 - Scalable learning algorithms for distributed collaborative filtering and large-scale multi-class classification
    Keywords: Large scale learning, Collaborative Filtering, Multi-class classification
    Summary: The deluge of data seen in recent years, known as the big data phenomenon, has overturned the traditional view in information science and technology, notably in statistical machine learning. In many real-world problems, particularly but not only those associated with the Web, massive data streams are produced continuously. In this thesis we study learning algorithms that can scale up, with a particular focus on multi-class classification and collaborative filtering. The latter is widely used by online vendors such as Amazon, Netflix, Yahoo! and others. However, as the number of users and items keeps growing, attaining high prediction accuracy becomes a computationally challenging problem. For this application we therefore introduce an asynchronous distributed framework to cope with large-scale datasets. In addition, we propose a novel regularization of the objective function that takes into account the interactions between similar users and items when estimating the predicted ratings.
  • 15 November 2013 - thesis proposal: Distributed Recovery of Time-varying Signals with Unknown Local Structure
    Keywords: adaptive estimation, nonparametric estimation, convex optimization, nonlinear filtering
    Summary: This thesis explores the properties of adaptive nonlinear filtering for signal and image recovery from indirect (incomplete and blurry) observations. The objective of this work is twofold. First, the statistical properties of the proposed algorithms should be studied under various assumptions on the types of observations and the classes of signals. Second, we aim to devise a fast implementation of the adaptive filter through iterative saddle-point optimization, which allows very large-scale data to be processed efficiently.
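To make the idea of a similarity-aware regularization in collaborative filtering (first thesis above) more concrete, here is a minimal single-machine sketch. It is not the method developed in the thesis, which is not specified on this page: the function names, the list of similar user pairs, the weight `beta` and all hyperparameters are assumptions chosen for illustration, and the asynchronous distributed aspect is left out entirely. The sketch fits a plain matrix factorization by stochastic gradient descent and adds a penalty that pulls the latent factors of similar users towards each other.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def train_mf(ratings, n_users, n_items, similar_users, k=2, lr=0.02,
             lam=0.01, beta=0.05, epochs=500, seed=0):
    """Matrix factorization by SGD with a hypothetical similarity penalty.

    Up to constant factors, the objective is
        sum over (u, i, r) of (r - p_u . q_i)^2
        + lam * (|p_u|^2 + |q_i|^2)
        + beta * sum over (u, v, s) of s * |p_u - p_v|^2
    where (u, v, s) lists pairs of similar users with similarity s.
    """
    rng = random.Random(seed)
    # Small positive initial factors (an arbitrary choice for this sketch).
    P = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_items)]
    ratings = list(ratings)
    for _ in range(epochs):
        rng.shuffle(ratings)
        # Data-fitting step: one SGD update per observed rating.
        for u, i, r in ratings:
            err = r - dot(P[u], Q[i])
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - lam * pu)
                Q[i][f] += lr * (err * pu - lam * qi)
        # Similarity step: pull the factors of similar users together.
        for u, v, s in similar_users:
            for f in range(k):
                diff = P[u][f] - P[v][f]
                P[u][f] -= lr * beta * s * diff
                P[v][f] += lr * beta * s * diff
    return P, Q


def mse(ratings, P, Q):
    """Mean squared error of the predicted ratings on the given triples."""
    return sum((r - dot(P[u], Q[i])) ** 2 for u, i, r in ratings) / len(ratings)


# Toy data: (user, item, rating) triples; users 0 and 1 are declared similar.
toy_ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 1), (2, 2, 5)]
toy_similar = [(0, 1, 1.0)]
P, Q = train_mf(toy_ratings, n_users=3, n_items=3, similar_users=toy_similar)
print("training MSE:", mse(toy_ratings, P, Q))
```

Setting `beta=0` recovers ordinary regularized matrix factorization, so the effect of the similarity term can be checked by comparing the two fits on held-out ratings.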