DOING@2ndMADICS-SYMPOSIUM

OUR PROGRAM

14h00 Introduction
14h10 Automatically understand and process the meaning of information at scale, André Freitas

In this talk, we will provide a synthesis of some of the detectable trends in NLP beyond deep learning. We will focus on the current transition from deep learning to deep semantics, describing the rich interface between neural and explicit knowledge representations and the emerging architectural patterns that will deliver AI systems capable of deeper inference and better generalisation, and that are natively explainable.
 
André Freitas is a lecturer (assistant professor) at the School of Computer Science at the University of Manchester (United Kingdom), where he leads the AI Systems Lab. His main research areas include Question Answering Systems, Applied Machine Learning, Natural Language Inference, Neuro-Symbolic Representations, Knowledge-based AI Systems, Explainable AI, Open Information Extraction, and Knowledge Graphs. 
 
15h00 Break
15h20 Data Cleaning and Preparation for ML and Data Analytics: Toward a Principled Approach, Laure Berti-Équille

This talk will present previous and recent contributions to data curation, one of the most critical tasks affecting the quality and robustness of results from machine learning and data analytics pipelines. First, discovering patterns of errors is important because it may change the data pre-processing strategy: we need solutions for handling anomalies in isolation as well as for handling intricate glitches in a principled and declarative way. As a first step in this direction, we present MeSQuaL, a system for declarative data quality profiling. Second, different orderings of the tasks for cleaning and pre-processing the data may lead to dramatically different pre-processed data sets, and ultimately to different ML or analytics results. It is therefore essential to keep track of and evaluate the candidate data transformation pipelines, to provide comparative analyses and explanations to users, and to recommend the optimal data pre-processing strategy. Along this line, we have used reinforcement learning to develop Learn2Clean, a system that selects, for a given data set, ML model, and quality performance metric, the optimal sequence of pre-processing tasks, i.e. the one that maximizes the quality metric. Finally, the talk will conclude by discussing some challenging research directions at the intersection of machine learning and data management for seamlessly orchestrating automated and Human-in-the-Loop (HIL) tasks for optimal data cleaning.

Laure Berti-Équille is a Research Director at IRD, the French research institute for sustainable development. Previously, she was a full professor at Aix-Marseille University (AMU), a senior scientist at Qatar Computing Research Institute (Hamad Bin Khalifa University, Qatar), an Associate Professor at the University of Rennes 1 (France), and a visiting researcher at AT&T Labs Research (USA) as a recipient of the prestigious European Marie Curie Outgoing Fellowship. Her research interests lie at the intersection of large-scale data analytics and statistical machine learning, with a focus on data quality, anomaly detection, and truth discovery; she has more than 80 publications and three monographs. She initiated the very first editions of the workshops on information and data quality in information systems (IQIS 2005) and on quality in databases (QDB 2009 and 2016), held in conjunction with SIGMOD and VLDB, respectively, and co-organized the first French workshops on Data and Knowledge Quality in conjunction with EGC (Extraction et Gestion de Connaissances) in 2005, 2006, 2010, and 2011. She has received various grants from the French National Research Agency (ANR), the French National Centre for Scientific Research (CNRS), and the European Union.
 
16h10 Discussions: DOING has organised two webinars. The presentations addressed problems and solutions in information extraction from texts, as well as the evolution of declarative database querying and analysis for ensuring consistency and data quality. We are interested in listing the challenges and innovations in these two fields.
17h00 Closure
