Theoretical Models for Complex Data Processing
The research pursued on this topic belongs to the domains of machine learning, information modeling and data analysis. We are interested in defining the formal properties that information systems must satisfy, formalizing the concept of “relevance”, and designing appropriate mechanisms for information selection. We also work on defining and learning proximity indexes that quantify the similarity between data items, in particular relational data (graphs and hypergraphs) and time series. This work, whose goals include a unified formalism for existing learning and data analysis metrics, has already led us to propose new algorithms for learning distances and similarities: generalized cosine similarities for co-clustering, and distances between time series.
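As an illustration, below is a minimal Python sketch of two such proximity indexes: a cosine similarity generalized by a positive semi-definite matrix M, the kind of object a metric-learning algorithm would optimize, and the classical dynamic-time-warping (DTW) distance between time series. The names and the exact parameterization are ours for illustration, not the specific algorithms developed in this work.

    import numpy as np

    def generalized_cosine(x, y, M):
        """Generalized cosine similarity s_M(x, y) = x^T M y / sqrt((x^T M x)(y^T M y)),
        where M is positive semi-definite; M = I recovers the standard cosine."""
        num = x @ M @ y
        den = np.sqrt((x @ M @ x) * (y @ M @ y))
        return num / den

    def dtw_distance(a, b):
        """Dynamic-time-warping distance between two 1-D time series, a classical
        elastic measure that aligns sequences of possibly different lengths."""
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(a[i - 1] - b[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]

    # M = A^T A is positive semi-definite by construction.
    rng = np.random.default_rng(0)
    A = rng.normal(size=(3, 3))
    M = A.T @ A
    x, y = rng.normal(size=3), rng.normal(size=3)
    print(generalized_cosine(x, y, M))
    print(dtw_distance(np.array([0., 1., 2., 1.]), np.array([0., 2., 1.])))

In a metric-learning setting, the entries of M are the parameters being learned from data; setting M to the identity recovers the ordinary cosine similarity.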
Another important topic is large-scale learning. Current learning systems must be able to handle large amounts of data: some category systems, such as DMOZ, contain hundreds of thousands of categories organized hierarchically. In order to select the best categorization strategy in such systems, it is important to revisit existing results (e.g., bounds on the generalization error) in a context where several category systems are in competition.
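To make the competing strategies concrete, here is a minimal sketch of top-down hierarchical categorization, assuming a hypothetical two-level toy hierarchy and scikit-learn's LogisticRegression as the per-node base learner; a DMOZ-scale system would require far more care. A flat strategy would instead train a single classifier over all leaf categories, and comparing such regimes is exactly where generalization-error bounds come into play.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical toy hierarchy: each internal node routes examples to its children.
    tree = {"root": ["arts", "science"], "arts": [], "science": []}

    def train_top_down(X, paths, tree):
        """Fit one multiclass classifier per internal node, each predicting
        which child an example should be routed to."""
        models = {}
        for node, children in tree.items():
            if not children:
                continue  # leaves need no router
            # Keep examples whose label path passes through this node.
            idx = [i for i, p in enumerate(paths)
                   if node in p and p.index(node) + 1 < len(p)]
            y = [paths[i][paths[i].index(node) + 1] for i in idx]
            models[node] = LogisticRegression(max_iter=1000).fit(X[idx], y)
        return models

    def predict_top_down(x, tree, models):
        """Route a single example greedily from the root down to a leaf."""
        node = "root"
        while tree[node]:
            node = models[node].predict(x.reshape(1, -1))[0]
        return node

    # Tiny synthetic example: two features, two leaf categories.
    X = np.array([[0., 1.], [0., 2.], [1., 0.], [2., 0.]])
    paths = [["root", "arts"], ["root", "arts"],
             ["root", "science"], ["root", "science"]]
    models = train_top_down(X, paths, tree)
    print(predict_top_down(np.array([1.5, 0.]), tree, models))

The top-down scheme trades one hard flat decision over all leaves for a sequence of easier local decisions, which is precisely the kind of trade-off the generalization bounds mentioned above are meant to quantify.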