Internships 2016

Learning representations from spatio-temporal data

Funding: LOCUST ANR Project (Internship followed by both a funded thesis and a post-doc)
Supervisors: Ahlame Douzal, Eric Gaussier
Team: AMA-LIG
Partners: MLIA-LIP6 UPMC, DEEZER
Description
Human interactions, whether conducted via web and mobile services or with artifacts, moving objects, and intelligent sensors, generate large flows of complex dynamic data. These user traces correspond to sequences of observations: events, measurements, semantic content, etc. They may have spatial (e.g. geolocation) and temporal components, which are often composed of multiple types of information.
The internship fits within the LOCUST project, whose objective is to build formal models and algorithmic tools for understanding, modeling and analyzing complex dynamic traces (spatio-temporal data) for a set of generic machine learning tasks and for target applications. Two use cases, concerning semantic information diffusion and urban computing respectively, will support the theoretical contributions and serve to evaluate the models and algorithms. The project is research oriented, with two academic partners and one industrial partner (DEEZER).
More specifically, the candidate will study recurrent neural networks (RNNs) for classification, cokriging and forecasting of spatio-temporal data. Spatio-temporal data correspond to spatially localized multivariate time series arising from a multi-source diffusion process. The aim is to investigate RNN approaches (LSTM, GRU,…) that account for both long- and short-range dependencies between time and space variables in a tractable and scalable way. This internship will be followed by both a funded thesis and a post-doc.

Representation Learning for Textual and Temporal Data Alignment

Funding: FUI-SSC Project
Supervisors: Eric Gaussier, Hamid Mirisaee, Parantapa Goswami
Partners: Coservit
Description

Information retrieval (IR) and recommendation systems have been of interest to many researchers and companies over the past decade. Classical IR and recommendation systems operate on data of only one type, i.e. data belonging to the same feature space. These systems lack a personalization mechanism that can ‘understand’ the query or reflect the information needs of a user at a particular instant in time and return customized results. In the Internet age, users’ online activities have increased substantially. This presents an opportunity to exploit additional data about the user in order to predict the user’s profile or context and to use this profile for personalized retrieval or recommendations. Often, these additional data are heterogeneous in nature: for example, quantitative geospatial location data, browsed images or viewed videos. Moreover, in most cases the data evolve over time, which in turn highlights the importance of time series analysis. As a result, designing a better retrieval and recommendation system requires exploiting all these heterogeneous pieces of contextual information. In this internship, we will study contextual IR using heterogeneous data in the context of smart support centers (SSCs).

SSCs aim to resolve clients’ issues with their computing systems automatically. The hardware components of the devices in a system (e.g. CPU, RAM, disk) have sensors within them. These sensors capture quantitative and qualitative values about system usage (e.g. disk usage, RAM usage) at certain intervals, which generates multivariate, structured temporal monitoring data. Additionally, tickets are generated from users’ complaints; each ticket details the problem and the time of reporting. The solutions to previously resolved tickets are also archived. These are semi-structured textual data. All of these are very large-scale data supplied by industrial partners. Typical support center operations can be described as a contextual IR problem in which temporal system monitoring data are used as contextual information to retrieve and recommend textual solutions in response to user tickets. The difficulty is that the ticket generation time may not correspond to the actual time at which the problem occurred, as users may notice the problem with some delay, or may report it with a delay. There can therefore be an unknown delay between the unusual behavior in the device’s data (i.e. the monitoring time series) and the ticket issue time. Without aligning tickets with the relevant portions of the monitoring data, it is not possible to exploit the monitoring data as contextual information. This internship aims to address the challenge of aligning textual information with temporal monitoring data when the delay between them is unknown. A similar study has been done on temporal stock price data and textual stock market news; in that case, however, the delay between the news and the actual change is more or less bounded, as news is published within a short time after a major change in stock prices, whereas in our case the delay is unbounded.

Through this internship you will have the opportunity to:
1. learn about various concepts involved in the state of the art on the topic (such as text representation methods and clustering of temporal data), and implement them to see them in action on SSC data.
2. explore different representation learning methods, including deep learning, to represent heterogeneous data (quantitative and textual in this case).
3. find a solution for aligning the tickets to the proper point/period of the time series.
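
To make the alignment task in item 3 concrete, here is a minimal sketch of a naive baseline, not the method the internship is expected to produce. It assumes a single monitoring variable sampled at regular steps and a ticket whose reporting time is known as an index into that series; the function name `align_ticket`, the `window` parameter and the sample values are hypothetical. The sketch scans every window ending at or before the ticket time and returns the one whose values deviate most from the series mean.

```python
from statistics import mean, pstdev


def align_ticket(series, ticket_index, window=3):
    """Return the start index of the most anomalous window of length
    `window` that lies at or before `ticket_index` (naive baseline)."""
    # Only data observed up to the ticket time can explain the ticket.
    history = series[: ticket_index + 1]
    mu = mean(history)
    # Fall back to 1.0 on a flat series to avoid dividing by zero.
    sigma = pstdev(history) or 1.0

    best_start, best_score = 0, float("-inf")
    for start in range(len(history) - window + 1):
        chunk = history[start : start + window]
        # Mean absolute z-score of the window (hypothetical anomaly score).
        score = mean(abs(x - mu) / sigma for x in chunk)
        if score > best_score:
            best_start, best_score = start, score
    return best_start


# Hypothetical disk-usage readings (percent); the ticket is filed at step 10,
# well after the spike that starts at step 4.
disk_usage = [40, 41, 39, 40, 95, 97, 96, 42, 41, 40, 41, 40]
start = align_ticket(disk_usage, ticket_index=10)
print(f"Ticket at step 10 aligned to window starting at step {start}")
```

A baseline like this ignores the ticket text entirely and handles only one variable; the internship targets learned representations of both the textual and the multivariate temporal data.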

PhD Positions

PhD position in Data Analysis and Machine Learning for spatio-temporal data

Title: Learning representations for classification, cokriging and forecasting spatio-temporal data
Duration: 3 years
Starting date: September 2016
Funding: Gross salary about 2000 euros/month (LOCUST ANR Project)
Partners: MLIA-LIP6 UPMC, DEEZER
Supervisors: Ahlame Douzal, Eric Gaussier
Application procedure and other details here.

PhD position in Machine Learning algorithms for learning representations from temporal data

Title: Learning representations from multivariate temporal data
Duration: 3 years
Starting date: September 2015
Funding: Gross salary about 2000 euros/month, Projet Investissement d’Avenir (partners: CS, AIRBUS and EDF R&D)
Supervisors: Ahlame Douzal, Eric Gaussier
Application procedure and other details here.