DOING – Project APR-IA

DOING – INTELLIGENT DATA

  • APR-IA (PROJET RECHERCHE D’INITIATIVE ACADEMIQUE)
  • REGION CENTRE VAL DE LOIRE

The DOING project aims to develop methods and tools that first extract information from textual data and structure it in a graph database, and then manipulate the resulting knowledge graph intelligently. The chosen application domain is health, starting with freely available data (such as clinical cases). DOING aims to design data science queries, i.e., a new form of declarative queries that can integrate analyses, to guide healthcare specialists in their decision making. DOING is based on a genuine interdisciplinary collaboration (Natural Language Processing, Databases and Artificial Intelligence) to transform data into information and knowledge. The goal of the project is to put into practice proposals from the DOING working group of the RTR-DIAMS and the DOING action of the GDR-MADICS.

The objective of the DOING project is to propose and build methods, algorithms and tools for transforming data into information and then into knowledge. The idea is to bring together the expertise of researchers in Natural Language Processing (NLP), Databases (DB) and Artificial Intelligence (AI) to:
1) extract information from textual data and represent it to populate graph databases;
2) propose intelligent methods for the manipulation and maintenance of these databases with new forms of queries.

These objectives are broken down into three tasks that are developed in parallel; their relationship is the main thread of the project.
T1: Task 1 – Extraction of information from textual data
T2: Task 2 – Data science queries: language and algorithms
T3: Task 3 – Analysis and prediction of physician needs (not financed by the project)
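
To make the link between T1 and T2 concrete, here is a minimal Python sketch of the general idea: relations are extracted from text, stored as a small property graph, and then queried. Everything in it is an assumption made for illustration: the sample sentences are invented, the regular expression stands in for a real NLP extraction step, and the `PropertyGraph` class, the relation names and the `match` helper are hypothetical. It does not reproduce the project's tools.

```python
import re
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """Toy in-memory property graph (hypothetical, for illustration only)."""
    nodes: dict = field(default_factory=dict)  # node id -> {"label": ...}
    edges: list = field(default_factory=list)  # (source id, relation, target id)

    def add_node(self, node_id, label):
        # Keep the first definition of a node; later duplicates are ignored.
        self.nodes.setdefault(node_id, {"label": label})

    def add_edge(self, source, relation, target):
        edge = (source, relation, target)
        if edge not in self.edges:
            self.edges.append(edge)

    def match(self, relation, target=None):
        """Return the sorted source ids of edges with the given relation."""
        return sorted(
            s for s, r, t in self.edges
            if r == relation and (target is None or t == target)
        )


# Hypothetical extraction rule standing in for an NLP pipeline.
PATTERN = re.compile(
    r"(?P<patient>Patient \w+) was treated with (?P<drug>\w+) "
    r"for (?P<disease>[\w ]+?)\."
)


def extract(text, graph):
    """Populate the graph with one patient, drug and disease per match."""
    for m in PATTERN.finditer(text):
        patient = m["patient"]
        drug = m["drug"].lower()
        disease = m["disease"].lower()
        graph.add_node(patient, "Patient")
        graph.add_node(drug, "Drug")
        graph.add_node(disease, "Disease")
        graph.add_edge(patient, "TREATED_WITH", drug)
        graph.add_edge(patient, "HAS_CONDITION", disease)
    return graph


# Invented sentences, not taken from any corpus.
TEXT = (
    "Patient A was treated with Metformin for type 2 diabetes. "
    "Patient B was treated with Metformin for type 2 diabetes. "
    "Patient C was treated with Amoxicillin for pneumonia."
)

graph = extract(TEXT, PropertyGraph())
metformin_patients = graph.match("TREATED_WITH", target="metformin")
print(metformin_patients)
```

In a real setting the graph would live in a graph database and the query would be written declaratively rather than as a Python method; the sketch only shows how the two steps connect.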

MEMBERS

  • Mirian Halfeld Ferrari Alves (45%), project leader, LIFO
  • Anne-Lyse Minard-Forst (35%), LLL
  • Donatello Conte (25%), LIFAT
  • Jacques Chabin (30%), LIFO
  • Genoveva Vargas-Solar (20%), LIRIS (external participant)
  • Jean-Yves Antoine (20%), LIFAT
  • Jean-Yves Ramel (15%), LIFAT
  • Agata Savary (15%), LISN (external participant)
  • Anaïs Lefeuvre-Halftermeyer (15%), LIFO
  • Flora Badin (10%), LLL
  • Lotfi Abouda (10%), LLL
  • Emmanuel Schang (10%), LLL
  • Thi-Bich-Hanh Dao (8%), LIFO

POSTDOC

  • Placido A. Souza Neto (1 March 2023 – 28 February 2024): Data science queries (T2)
  • Silvia Federzoni: Extraction of information from textual data (T1)

PhD STUDENTS (collaborating on the project)

  • Lingchen Wang, LIFO
  • Nicolas Hiot, LIFO

RESULTS

SOFTWARE (prototypes)

EASI-GDS:

A user-friendly interface that helps users build declarative analytical queries on property graphs. These queries are then implemented as Neo4j pipelines.

  • Demonstration: https://youtu.be/pd1s7hOVMx8
  • Installation: https://gitlab.com/mirian/easi-gds_install
  • Developers: Valentin Bouvresse and Virgile Crvenka (master students, Université d’Orléans)

ArchiTXT:

  • Nicolas Hiot, Jacques Chabin and Mirian Halfeld Ferrari (Ed.). 04.2024.
  • Link: https://hal.science/hal-04732336
  • Software built in connection with Hiot’s thesis

PUBLICATIONS

  • Jacques Chabin, Mírian Halfeld Ferrari, Lingchen Wang: A Preliminary Investigation: Strategies for Incorporating Logical Rules Into Knowledge Graph Embeddings. ADBIS (Short Papers) 2024: 104-116
  • Mírian Halfeld Ferrari, Anne-Lyse Minard, Genoveva Vargas-Solar: Transforming Text Into Knowledge with Graphs: Report of the GDR MADICS DOING Action. ADBIS (Short Papers) 2024: 145-159
  • Valentin Bouvresse, Jacques Chabin, Virgile Crvenka, Mírian Halfeld Ferrari, Genoveva Vargas-Solar, Lingchen Wang: Vers des requêtes déclaratives en science des données : EASI-GDS pour Neo4J. EGC 2024: 433-440.
  • Jacques Chabin, Mírian Halfeld Ferrari, Nicolas Hiot, Dominique Laurent: Managing Linked Nulls in Property Graphs: Tools to Ensure Consistency and Reduce Redundancy. ADBIS 2023: 180-194
  • Agata Savary, Alena Silvanovich, Anne-Lyse Minard, Nicolas Hiot, Mirian Halfeld Ferrari Alves: Relation Extraction from Clinical Cases for a Knowledge Graph. ADBIS (Short Papers) 2022: 353-365
  • Jacques Chabin, Mírian Halfeld Ferrari, Nicolas Hiot: From Text to Databases: attribute grammar as database meta-model. CoRR abs/2410.09441 (2024)
  • Plácido A. Souza Neto: Predictive Query-based Pipeline for Graph Data. CoRR abs/2412.09940 (2024)
  • Silvia Federzoni, Anaïs Halftermeyer, Anne-Lyse Minard, Jean-Yves Antoine, Agata Savary: Annotation de relations de coréférence dans des cas cliniques : annotation manuelle et automatique de documents des corpus CAS et E3C, typologie, évaluation de systèmes existants. [to appear]