Deep learning in information retrieval (IR) and natural language processing (NLP)

My research focuses on designing new deep learning models for information retrieval and natural language processing.
The common objective of these models is to process and access textual data. (Large) language models are at the core of my various projects, which have applications related to semantic learning, human-machine interaction (in information retrieval and robotics), and continual learning. Since 2018, I have been involved in two major research projects focusing on data-to-text generation and conversational information retrieval.

Data-to-text generation

The objective is to generate textual descriptions of structured inputs (tables/graphs/…). This task is particularly interesting in finance, sports journalism, and health, since it makes it possible to summarize and reason over large sets of structured data that are hard for humans to read.



Conversational search

This work is supported by the ANR JCJC SESAMS.
The objective is to support users’ search through interactive and proactive systems that anticipate their needs and guide them in solving their tasks. Information need refinement and understanding, belief tracking, dialog systems, and language generation are examples of tasks addressed within this topic.



Language models for robotics

One emerging hypothesis in robotics is that reinforcement learning (RL) algorithms that predict robot actions might be enhanced by the semantics captured in language models. Our objective is to design hybrid models combining RL and large language models (LLMs) that generate natural language instructions to guide action prediction.



Continual learning and domain adaptation

Language models and, more generally, neural models might suffer from catastrophic forgetting when fine-tuned on additional data. Our objective is to overcome this limitation so that models learn new tasks while retaining the knowledge learned on previous ones.

 

Invited talks

  • BNF (June 202″) – Panelist « Quelles articulations entre les différentes formes de recommandation, algorithmiques et humaines ? »
  • NormaSTIC – Université de Caen (June 2022) « Data-to-text generation: let your data speak fluently »
  • LISN – Séminaire TLC (February 2022) « Data-to-text generation: let your data speak fluently »
  • DGA (September 2021) « Recherche d’information neuronale: Enjeux et perspectives »
  • Naver Labs (July 2021) « Data-to-text generation: let your data speak fluently »
  • Summer school ETAL (June 2021): Lecturer – « Information retrieval models » and practical activities
  • « THL et multimodalité » Days – THL/AFIA (October 2020): « From multimodal representation learning to multimodal information access »
  • LIS seminar – Marseille (December 2020): « From multimodal representation learning to multimodal information access »
  • PhisIA seminar – Univ Paris Diderot (November 2019): « Le symbolique au service du connexionnisme et vice-versa : apprentissage de représentation augmenté, extraction d’information et bases de connaissances »
  • ERIC lab seminar (October 2019): « Apprentissage de représentations textuelles augmentées bases de connaissances: application à la Recherche d’information »
  • GDR IA – 2019: « Ancrage visuel et conceptuel du texte pour l’apprentissage de représentation »
  • Panelist for the Pré-GDR TAL (March 2019)
  • Laboratoire ERIC « De la Recherche d’information collaborative à la recherche d’information socio-collaborative : fondements, modèles et perspectives »

 

Projects

  • 2022-2026: ANR PRCE ACDC. Data-to-text generation
    Consortium: MLIA@Sorbonne, LAMSADE@ParisDauphine/PSL, MHNH@Sorbonne, Recital
  • 2019-2024: ANR JCJC SESAMS. Search-oriented conversational systems (Coordinator)
    Consortium: Vincent Guigue (MLIA-LIP6), Ludovic Denoyer (FAIR Paris), Jian-Yun Nie (Univ. Montréal Canada), Philippe Preux (Univ. Lille)
  • 2014-2019: CHIST-ERA MUSTER. Ground language in perception (visual inputs) and extract representations of meaning tied to the physical world.
    Consortium: KU Leuven, Belgium; ETH Zurich, Switzerland; LIP6 UPMC, France; University of the Basque Country, Spain
  • 2014-2015: PEPS CNRS. EXPloration sur l’usage des médias sociaux pour un Accès Collaboratif à l’information.
    Consortium: IRIT-SIG; CNRS Marc Bloch, Berlin; Maths, SMMA, Université Paris Sorbonne