AIDAQ

Franco-Brazilian Joint Seminars

Adaptive, Intelligent Data Analysis & Querying

First AIDAQ Seminar

Seminar Program – 1 July 2026

Time zone: French Time / Brazilian Time
Format: Hybrid meeting

13:00 / 08:00

Welcome and Introduction (5 min)

The First AIDAQ Seminar brings together researchers from the French and Brazilian teams to share an overview of their current work and the problems they are addressing. The goal is to better understand each other’s research directions, identify common interests, and open the way to new scientific collaborations within the project.

Beyond individual presentations, the seminar also aims to foster a broader discussion on the relevance of the proposed methods to biological applications. In particular, we would like to explore how ideas developed in data analysis, querying, and learning may connect with problems in biology, and to what extent this application domain could guide or enrich future developments within AIDAQ. This discussion is also intended to initiate the preparation of a dedicated meeting with biologists planned for the end of the year, with the aim of identifying concrete directions for interdisciplinary collaboration.

Scientific Presentations

Format: 20-minute presentation + 15-minute discussion

13:05 / 08:05

Beyond Complexity Invariance: An Adaptive Multi-Scale Distance for Time Series Classification

Presenter: Fabricio Alves de Almeida

Work with Ana Lorena and Marcilio de Souto

Time series classification under the 1-nearest neighbor (1-NN) paradigm is largely determined by the choice of distance measure. Complexity-based approaches adjust shape-only metrics by correcting for differences in structural complexity. One established strategy of this kind is the Complexity Invariant Distance (CID), which corrects the Euclidean distance (ED) by a scalar factor derived from first-order differences. This correction is necessary when the behavior of the series differs; however, it may be insufficient when the complexity is scale-specific and appears at different time scales. Based on this, we propose the Adaptive Multi-Scale Complexity Distance (AMCD), which measures structural divergence across Fibonacci-spaced lags. Fibonacci spacing yields a near-logarithmic, non-redundant set of scales. The lags are drawn from an adaptive subset K∗, selected according to the per-lag variance of their complexity estimates on the training set. To evaluate the method, we used 80 univariate datasets from the UCR Time Series Archive. AMCD achieved an average rank of 1.43, compared to 2.06 for CID and 2.52 for ED. The Friedman test, Nemenyi post-hoc test, and Wilcoxon test confirm statistically significant differences in all pairwise comparisons. The accuracy gain of the proposed method over ED increases with complexity, indicating the type of structural heterogeneity at which AMCD excels. These results provide empirical support for the claim that time series complexity is inherently multi-scale, and position AMCD as an extension of CID for structurally heterogeneous time series classification problems.
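
For readers unfamiliar with CID, the following minimal Python sketch may help. It shows the standard CID correction (ED multiplied by the ratio of the larger to the smaller complexity estimate, where the estimate is built from first-order differences) and a Fibonacci lag generator. The function names, the `lag` parameter, and the handling of zero complexity are assumptions made for illustration. The sketch does not implement AMCD, whose exact definition is not given in this abstract.

```python
import math


def euclidean(q, c):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))


def complexity_estimate(series, lag=1):
    """Square root of the sum of squared lag-`lag` differences.

    lag=1 gives the first-order estimate used by CID; other lags are an
    illustrative generalisation, not the AMCD definition.
    """
    return math.sqrt(
        sum((series[i + lag] - series[i]) ** 2 for i in range(len(series) - lag))
    )


def cid(q, c):
    """Complexity Invariant Distance: ED scaled by a complexity ratio >= 1."""
    ce_q, ce_c = complexity_estimate(q), complexity_estimate(c)
    lo, hi = min(ce_q, ce_c), max(ce_q, ce_c)
    if hi == 0:
        factor = 1.0       # both series are flat: no correction (assumption)
    elif lo == 0:
        factor = math.inf  # only one series is flat (assumption)
    else:
        factor = hi / lo
    return euclidean(q, c) * factor


def fibonacci_lags(max_lag):
    """Fibonacci-spaced lags 1, 2, 3, 5, 8, ... not exceeding max_lag."""
    lags, a, b = [], 1, 2
    while a <= max_lag:
        lags.append(a)
        a, b = b, a + b
    return lags
```

A per-lag complexity profile of a series `s` could then be written as `[complexity_estimate(s, k) for k in fibonacci_lags(len(s) - 1)]`, which is the kind of quantity a multi-scale distance might compare across series.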

13:40 / 08:40

Extending Drop-DTW for clustering multivariate timed sequences

Presenter: Patrick Marcel

Work with Sophie Robert, Yousif Elias and Mostafa Bamha

This work focuses on grouping timed sequences, i.e., sequences that are irregular, with non-uniform time gaps between consecutive observations, and multivariate, where observations are described along multiple dimensions. A similarity measure between timed sequences that accounts for these particularities is essential to achieve these groupings. Several alignment algorithms exist for this purpose, such as the classical Dynamic Time Warping (DTW) distance and its outlier-robust extension, Drop-DTW. Recently, Drop-DTW has been successfully extended to temporal sequences and applied to the grouping of care pathways. Our contribution extends Drop-DTW to multivariate sequences, with the goal of matching observations on some subset of the dimensions. One drawback of such a similarity measure lies in its number of parameters. Considering the downstream task, namely grouping sequences using clustering, we propose an approach to tune the parameters by optimizing the classical silhouette score with Bayesian optimization. Our tests on real datasets show the effectiveness of using this extension of Drop-DTW to cluster multivariate timed sequences.
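
As background, the sketch below shows classical DTW only, written in Python, with a local cost restricted to a chosen subset of dimensions. It is not Drop-DTW and not the extension presented in this talk: the drop mechanism, the handling of time gaps, and the parameter tuning are all omitted. The names `dtw` and `subset_cost` are hypothetical, and the absolute-difference cost is an assumption chosen for simplicity.

```python
import math


def dtw(a, b, cost):
    """Classical DTW between sequences a and b under a local cost function."""
    n, m = len(a), len(b)
    # D[i][j] = cost of the best alignment of a[:i] with b[:j]
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = cost(a[i - 1], b[j - 1])
            D[i][j] = step + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def subset_cost(dims):
    """Local cost comparing observations only on the dimensions in `dims`."""
    def cost(x, y):
        return sum(abs(x[d] - y[d]) for d in dims)
    return cost
```

For example, with `a = [(0, 5), (1, 5), (2, 5)]` and `b = [(0, 9), (2, 9)]`, calling `dtw(a, b, subset_cost((0,)))` aligns the sequences on the first dimension only and ignores the mismatch on the second.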

14:15 / 09:15

An adaptive evolutionary multi-objective clustering approach based on the properties of the base partitions

Presenter: Marcilio de Souto

Work with Cristina Y. Morimoto and Aurora Pozo

Evolutionary multi-objective clustering (EMOC) is a modern clustering technique in which the general concepts of evolutionary multi-objective optimization are applied to the clustering problem. Designing and defining clustering approaches remains a challenging task, particularly regarding the selection of objective functions and algorithm parameter settings. To better understand this field, we mapped and analyzed existing approaches and evaluated their main characteristics. Our analysis showed that a wide variety of objective functions and initialization strategies have been employed in EMOC approaches. Furthermore, we identified issues in the design of established algorithms that do not consider the impact of using high-quality base partitions during the search process when specific objective functions are applied. This limitation may restrict the clustering process or even degrade the quality of solutions obtained during initialization. To address this issue, we propose AEMOC (Adaptive Evolutionary Multi-Objective Clustering based on Data Properties). The proposed approach considers the characteristics of the base partitions to determine whether optimization is required. For this purpose, we introduce a metric to measure the degree of data separation, which estimates the relative quality of the initial population generated by minimum spanning tree clustering. In addition, this evaluation enables the offline selection of objective functions and parameter settings for the multi-objective algorithm. AEMOC achieved promising results on a diverse set of artificial and real-world datasets. Specifically, it successfully identified the relative quality of the base partitions and produced better clustering results than reference EMOC approaches.

14:50 / 09:50

Coffee Break (10 min)

15:00 / 10:00

Toward a Flexible Multi-Platform Pipeline Enabling GPU Acceleration for Biological Sequence Alignment

Presenter: Sébastien Limet

Work with Sophie Robert and Maxime Pheulpin

In biology, sequencing is a technique used to determine the base sequences that make up RNA or DNA molecules. Many biological studies rely on the analysis of these sequences, which are generally grouped under the term OMICS. In recent years, the rapid evolution of technologies has led to significant improvements in sequencing methods, particularly with the advent of Next-Generation Sequencing (NGS). In this context, the main challenge is no longer to generate sequence data, but rather to process them efficiently. OMICS analysis pipelines share very similar structures. The objective of our research is to provide a configurable framework that enables biologists to easily build their processing pipelines while making optimal use of the available computational resources. This presentation describes the foundations of this framework through a real-world transcriptomics application.

15:40 / 10:40

Parallel Computing on GPUs: Parallel Building Blocks and Applied Algorithms

Presenter: Wagner M. Nunan Zola

In this presentation, we will showcase our work in Parallel Computing on GPUs at UFPR, Brazil. The main focus of our research has been on general-purpose parallel algorithms and techniques that can be applied as building blocks for the efficient parallel processing of higher-level applications on GPUs. In this talk, we will briefly show how warp-centric GPU techniques have been used to accelerate three selected applications that are potentially related to the AIDAQ research project: (1) SWW-TSNE: A high-performance GPU t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm for dimensionality reduction of large datasets using Simulated Wide Warps (SWW); (2) Warp-Centric K-Nearest Neighbor Graphs Construction on GPUs; (3) A Fast Parallel K-Means Algorithm on GPUs Using Warp-Centric Strategies.

16:10 / 11:10

Lazy Prediction in Querying Graph Databases

Presenter: Mirian Halfeld Ferrari

Work with Lingchen Wang, Jacques Chabin, Martin Musicante and Cristina Dutra de Aguiar

This work, conducted as part of Wang's PhD thesis, addresses querying and reasoning over incomplete graph databases. We investigate how symbolic reasoning and machine learning can be combined to infer, assess, and validate missing information in graph data. The first contribution proposes a query-driven completion-on-demand approach for evaluating regular path queries over incomplete graphs. By integrating link prediction into query processing, missing links are inferred dynamically according to the query context, improving query answers without requiring full graph completion. The second contribution introduces a hybrid framework that combines symbolic reasoning with machine-learning-based prediction to jointly address data incompleteness and consistency enforcement. We consider integrity constraints expressed as tuple-generating dependencies (TGDs). Since missing information may lead to apparent constraint violations, the proposed approach relaxes the enforcement of constraints by adjusting their strength and leverages link prediction to infer plausible missing information, thereby supporting constraint satisfaction in incomplete graphs. Together, these contributions aim to improve both query answering and data quality in incomplete graph databases.
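
The idea of completion on demand can be illustrated with a small Python sketch. It evaluates a simple path query (a concatenation of edge labels, a special case of a regular path query) and calls a link predictor only at nodes the query actually reaches and only when no stored edge matches. Everything here is hypothetical: the function `evaluate_path_query`, the `predict` callback, the edge representation, and the fallback policy are assumptions for illustration and are not the approach presented in this talk.

```python
def evaluate_path_query(edges, start, labels, predict):
    """Follow `labels` from `start`, predicting links only where needed.

    edges:   dict mapping (node, label) to a set of target nodes
    predict: hypothetical link predictor, called as predict(node, label)
             and returning a set of plausible target nodes
    """
    frontier = {start}
    for label in labels:
        next_frontier = set()
        for node in frontier:
            targets = edges.get((node, label), set())
            if not targets:
                # Assumed policy: consult the predictor only when the stored
                # graph has no matching edge at a node the query reached.
                targets = predict(node, label)
            next_frontier |= set(targets)
        frontier = next_frontier
    return frontier
```

Because the predictor is consulted only along the paths explored by the query, the rest of the graph is never completed, which is the intuition behind improving answers without full graph completion.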

Closing

16:50 / 11:50

General Discussion, Project Perspectives and Future Collaborations

17:30 / 12:30

End of Seminar
