My research deals with Data Science, i.e. with models and systems for extracting information, insights and knowledge from data in various forms. The data I have primarily worked on are large-scale, multilingual collections. I have been particularly interested in information access (comprising information retrieval, categorization and clustering) and in the extraction of lexical knowledge from such collections. I am also interested in modeling how textual information is shared in social (content) networks, and how such networks evolve over time. More recently, I have also been working on improving job scheduling techniques through machine learning, and on learning representations (through metric learning) for different types of data, including time series.
- Here's a video of the LIX colloquium on Data Science, held in November 2014
- And here's a short video to explain what textual information access is!
- Information-based models for information retrieval (Clinchant & Gaussier, SIGIR 2010; Li & Gaussier, ECIR 2012), which bridge word burstiness and the heuristic constraints of IR; these models are integrated into the IR platforms Lucene and Terrier
- Generic framework for large-scale hierarchical classification (Babbar et al., NIPS 2013; Babbar et al., SIGKDD Explorations 2014)
- Metric and representation learning (Qamar & Gaussier, ICDM 2009), including (ε,γ,τ)-good metric learning (Nicolae et al., ECML/PKDD 2015)
I was program co-chair of EMNLP 2006, workshop co-chair for EMNLP 2014, chair of CORIA 2015 and co-chair of IEEE DSAA 2015. I have also served as area chair for SIGIR and ECIR for several years, and have co-chaired and co-organized several international workshops (e.g. LSHC1 at ECIR 2010, LSHC2 at ECML 2011 and LSHC3 at ECML 2013).
I am a member of the editorial boards of the journals: International Journal of Data Science and Analytics, Information Retrieval, Information and Document Numérique, and a past member of the editorial boards of the journals: Traitement automatique des langues, Computational Linguistics, International Journal of Corpus Linguistics.
Projects
I am currently involved in the following projects:
- Smart Support Centers (FUI) (March 2015-Sept. 2018)
- Graphical models for modeling the dynamics of content networks (regional project) (Sept. 2014-Sept. 2017)
- New theoretical frameworks in metric learning (regional project) (Sept. 2013-Sept. 2016)
- Khronos Persyval (labex) project on data mining of temporal data (Sept. 2013-Sept. 2017)
- ARESOS CNRS Mastodons project (started in 2012)
I was recently involved in the following projects:
- CLASS-Y (ANR project) (February 2011-February 2015)
- BioASQ (European project) (Oct. 2012-Oct. 2014)
- PASCAL2 European network of excellence (2009-2013)
- MeTRICC (ANR project) (December 2008-December 2011)
- FRAGRANCES (ANR project) (December 2008-December 2011)
- LASCAR (LArge Scale CAtegoRization - UJF project) (January 2008-December 2009)
- INFOM@GIC (French project) (my participation: 2005-2006)
- PASCAL European Network of Excellence (2004-2006)
- REVEAL THIS (European project) (2004-2007)
- KerMIT (European project) (2001-2004)
- Outiller les Alliances (French project) (2001-2003)
- MuchMore (European project) (1999-2002)
PhD students
- Yagmur Cinar, FUI funding; Information retrieval, machine learning; (2015-)
- Hesam Amoualian, co-supervised with M.-R. Amini and M. Clausel, French national funding MESR; Machine learning; (2014-)
- Adrien Dulac, co-supervised with C. Largeron, regional funding; Social network, machine learning; (2014-)
- Diana Popa, co-supervised with J. Henderson, CIFRE XRCE; Computational linguistics, machine learning; (2014-)
- Théo Trouillon, co-supervised with G. Bouchard, CIFRE XRCE; Machine learning; (2014-)
- Irina Nicolae, co-supervised with M. Sebban, regional funding; Machine learning; (2013-)
- Saeid Soheily Khah, co-supervised with A. Douzal, industrial funding; Data analysis; (2013-)
- Abdelkader El Mahdaouy, co-supervised with S. Ouatik, co-tutelle Univ. de Fès, Maroc; Computational linguistics, information retrieval; (2013-)
- Hamid Mirisaee, co-supervised with A. Termier, French national funding MESR; Data mining, social network analysis and mining; (2012-2015)
- François Kawala, co-supervised with A. Douzal, CIFRE Best of Media; Social network analysis and mining; (2011-2015)
- Rohit Babbar, co-supervised with M.-R. Amini, ANR funding; Machine learning; (2011-2014)
- Parantapa Goswami, co-supervised with M.-R. Amini, French national funding MESR; Information retrieval, machine learning; (2011-2014)
- Cédric Lagnier, French national funding MESR; Social network analysis and mining; (2009-2013)
- Clément Grimal, co-supervised with G. Bisson, ANR funding; Machine learning; (2009-2012)
- Bo Li, ANR funding; Computational linguistics; (2009-2012)
- Franck Meyer, Orange Labs; Machine learning; (2007-2012)
- Stéphane Clinchant, CIFRE XRCE; Information retrieval; (2008-2011)
- Ali Mustafa Qamar, French national funding MESR; Machine learning; (2007-2010)
- Leila Kefi, co-supervised with C. Berrut, French national funding MNRT; Information retrieval; (2002-2006)
- François Trouilleux, co-supervised with G. Bes and A. Zaenen, CIFRE XRCE; Linguistics; (1998-2001)
Selected publications
- S. Clinchant, E. Gaussier. Information-based models for ad-hoc information retrieval. Proc. of SIGIR 2010.
- B. Li, E. Gaussier. An information-based cross-language information retrieval model. Proc. of ECIR 2012.
- R. Babbar, C. Metzig, I. Partalas, E. Gaussier, M.-R. Amini. On power-law distribution in large-scale taxonomies. SIGKDD Explorations, 16(1), 2014.
- R. Babbar, I. Partalas, E. Gaussier, M.-R. Amini. On flat versus hierarchical classification in large-scale taxonomies. Proc. of NIPS 2013.
- A. Qamar, E. Gaussier. Online and batch learning of generalized cosine similarities. Proc. of ICDM 2009.
- M.-I. Nicolae, E. Gaussier, A. Habrard, M. Sebban. Joint semi-supervised similarity learning for linear classification. Proc. of ECML/PKDD 2015.