Research

My research deals with Data Science, i.e. with models and systems to extract information, insights and knowledge from data in various forms. The data on which I have primarily worked are large-scale, multilingual collections. I have been particularly interested in information access (comprising information retrieval, categorization and clustering) and in the extraction of lexical knowledge from such collections. I am also interested in modeling how textual information is shared in social (content) networks, and how such networks evolve over time. More recently, I have also been working on improving job scheduling techniques through machine learning, and on learning representations (through metric learning) for different types of data, including time series.

  • Here's a video of the LIX colloquium on Data Science, held in Nov. 2014
  • And here's a short video explaining what textual information access is!
My research is both fundamental, through the development of new models that explain different characteristics of large-scale collections/networks, and applied, through the development of algorithms and tools to mine and extract information from large collections. Prior to joining the university, the models I developed with colleagues at XRCE (Xerox Research Centre Europe) on text clustering and categorization earned Xerox the Basex award in 2004 and a Wall Street Journal Technology Innovation award in 2005. Here are some of the most recent theoretical developments I have participated in:
  • Information-based models ([1],[2]) for information retrieval, which bridge burstiness and the heuristic constraints of IR; these models are integrated in the two IR platforms Lucene and Terrier
  • Generic framework for large-scale hierarchical classification ([3],[4])
  • Metric and representation learning ([5]), including (ε,γ,τ)-good metric learning ([6])
Scientific service
I was a member of the Executive Board of the European Association for Computational Linguistics from 2007 to 2010 and a member of the Computer Science panel of the European Research Council for Starting/Consolidator Grants from 2007 to 2013, and I have been a member of the Advisory Board of SIGDAT since 2005.
I was program co-chair of EMNLP 2006, workshop co-chair for EMNLP 2014, chair of CORIA 2015 and co-chair of IEEE DSAA 2015. I have also served as area chair for several years for SIGIR and ECIR, and co-chaired and co-organized several international workshops (such as LSHC1 at ECIR 2010, LSHC2 at ECML 2011 and LSHC3 at ECML 2013).
I am a member of the editorial boards of the journals: International Journal of Data Science and Analytics, Information Retrieval, Information and Document Numérique, and a past member of the editorial boards of the journals: Traitement automatique des langues, Computational Linguistics, International Journal of Corpus Linguistics.

Projects
I am, or have been, involved in the following projects:
  • Smart Support Centers (FUI) (March 2015-Sept. 2018)
  • Graphical models for modeling the dynamics of content networks (regional project) (Sept. 2014-Sept. 2017)
  • New theoretical frameworks in metric learning (regional project) (Sept. 2013-Sept. 2016)
  • Khronos Persyval (labex) project on data mining of temporal data (Sept. 2013-Sept. 2017)
  • ARESOS CNRS Mastodons project (started in 2012)
I was previously involved in the following projects:
  • CLASS-Y (ANR project) (February 2011-February 2015)
  • BioASQ (European project) (Oct. 2012-Oct. 2014)
  • PASCAL2 European network of excellence (2009-2013)
  • MeTRICC (ANR project) (December 2008-December 2011)
  • FRAGRANCES (ANR project) (December 2008-December 2011)
  • LASCAR (LArge Scale CAtegoRization - UJF project) (January 2008-December 2009)
  • INFOM@GIC (French project) (my participation: 2005-2006)
  • PASCAL European Network of Excellence (2004-2006)
  • REVEAL THIS (European project) (2004-2007)
  • KerMIT (European project) (2001-2004)
  • Outiller les Alliances (French project) (2001-2003)
  • MuchMore (European project) (1999-2002)
Current and past PhD students (each entry gives the co-supervisors, if any, the funding source, the research domains and the period)
  • Yagmur Cinar, FUI funding; Information retrieval, machine learning; (2015-)
  • Hesam Amoualian, co-supervised with M.-R. Amini and M. Clausel, French national funding MESR; Machine learning; (2014-)
  • Adrien Dulac, co-supervised with C. Largeron, regional funding; Social network, machine learning; (2014-)
  • Diana Popa, co-supervised with J. Henderson, CIFRE XRCE; Computational linguistics, machine learning; (2014-)
  • Théo Trouillon, co-supervised with G. Bouchard, CIFRE XRCE; Machine learning; (2014-)
  • Irina Nicolae, co-supervised with M. Sebban, regional funding; Machine learning; (2013-)
  • Saeid Soheily Khah, co-supervised with A. Douzal, industrial funding; Data analysis; (2013-)
  • Abdelkader El Mahdaouy, co-supervised with S. Ouatik, joint PhD (cotutelle) with Univ. de Fès, Morocco; Computational linguistics, information retrieval; (2013-)
  • Hamid Mirisaee, co-supervised with A. Termier, French national funding MESR; Data mining, social network analysis and mining; (2012-2015)
  • François Kawala, co-supervised with A. Douzal, CIFRE Best of Media; Social network analysis and mining; (2011-2015)
  • Rohit Babbar, co-supervised with M.-R. Amini, ANR funding; Machine learning; (2011-2014)
  • Parantapa Goswami, co-supervised with M.-R. Amini, French national funding MESR; Information retrieval, machine learning; (2011-2014)
  • Cédric Lagnier, French national funding MESR; Social network analysis and mining; (2009-2013)
  • Clément Grimal, co-supervised with G. Bisson, ANR funding; Machine learning; (2009-2012)
  • Bo Li, ANR funding; Computational linguistics; (2009-2012)
  • Franck Meyer, Orange Labs; Machine learning; (2007-2012)
  • Stéphane Clinchant, CIFRE XRCE; Information retrieval; (2008-2011)
  • Ali Mustafa Qamar, French national funding MESR; Machine learning; (2007-2010)
  • Leila Kefi, co-supervised with C. Berrut, French national funding MNRT; Information retrieval; (2002-2006)
  • François Trouilleux, co-supervised with G. Bes and A. Zaenen, CIFRE XRCE; Linguistics; (1998-2001)
Some references (see Publications)
[1] S. Clinchant, E. Gaussier. Information-based models for ad-hoc information retrieval. Proc. of SIGIR 2010.
[2] B. Li, E. Gaussier. An information-based cross-language information retrieval model. Proc. of ECIR 2012.
[3] R. Babbar, C. Metzig, I. Partalas, E. Gaussier, M.-R. Amini. On power-law distribution in large-scale taxonomies. SIGKDD Explorations, 16(1), 2014.
[4] R. Babbar, I. Partalas, E. Gaussier, M.-R. Amini. On flat versus hierarchical classification in large-scale taxonomies. Proc. of NIPS 2013.
[5] A. Qamar, E. Gaussier. Online and batch learning of generalized cosine similarities. Proc. of ICDM 2009.
[6] M.-I. Nicolae, E. Gaussier, A. Habrard, M. Sebban. Joint semi-supervised similarity learning for linear classification. Proc. of ECML/PKDD 2015.