SENDUP – SEmantic Network of Data: Utility and Privacy

General Context

The amount of data produced by individuals and corporations has increased dramatically over the last decades. This generalized gathering of data brings opportunities (e.g., building new knowledge from this "Big Data") but also new privacy challenges. The general public expresses a growing distrust of personal data exploitation, which has been met with successively strengthened regulations (e.g., the EU General Data Protection Regulation, GDPR). In the meantime, open data is taking a crucial place within many administrations. Open data policy is a powerful move by public institutions that aims to publish data collected by public agents. The objective is to manage this data as an asset and make it available, discoverable, and usable by anyone. Both the US and the European Community have foundations to promote this policy. This leads to an important new societal challenge at the crossroads of these social evolutions: how can privacy be preserved while publishing useful data?

Objectives: Respecting privacy while querying and publishing graphs with underlying semantics

Nowadays, data is often represented as graphs with underlying semantics to allow efficient querying and to support inference engines. This is the case, for example, in the fields of linked data and the semantic web, which typically rely on the RDF representation. While the anonymization of tabular databases and of untyped homogeneous graphs is well researched, in 2018 the anonymization of typed graphs with underlying semantics had not been studied much. It remains a challenge, both in terms of theoretical models that take semantics into account and in terms of their practical implementation.

The SENDUP project has focused on such data representations, and on RDF in particular. It aims to produce practical approaches, supported by formal theoretical models and implemented in a software suite, that guarantee privacy while providing useful information when databases in the form of typed graphs with underlying semantics are published or queried. To this end, the project has introduced new formalisms and techniques for the sanitization and update of semantic data graphs.

Approach: Differential privacy and formal update management in semantic data graphs

SENDUP’s first aim is to enrich the state of the art on privacy-preserving techniques for the databases under consideration. To offer formal privacy guarantees, we have proposed models based on differential privacy. These models take into account the heterogeneity of the vertices and the semantics of the relationships within the database. They make it possible to disregard non-sensitive information and therefore reduce the data degradation inherent to anonymization.
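To make the idea of disregarding non-sensitive information concrete, here is a minimal Python sketch. It is not the SENDUP model: it assumes a typed graph stored as (subject, predicate, object) triples, a hypothetical set of sensitive predicates, and a hypothetical list of candidate sensitive triples. It applies classical randomized response to the presence bit of each candidate, so that each reported bit satisfies epsilon-local differential privacy, while non-sensitive triples are published unchanged.

```python
import math
import random


def randomized_response(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else not bit


def sanitize(triples, sensitive_predicates, candidates, epsilon, rng):
    """Publish non-sensitive triples as-is; perturb sensitive ones.

    `candidates` (an assumption of this sketch) lists every sensitive triple
    that could exist; the presence bit of each one is perturbed independently.
    """
    triples = set(triples)
    # Non-sensitive information is left untouched, so it suffers no degradation.
    published = {t for t in triples if t[1] not in sensitive_predicates}
    for candidate in candidates:
        if randomized_response(candidate in triples, epsilon, rng):
            published.add(candidate)
    return published


# Hypothetical example data: only "hasCondition" edges are treated as sensitive.
graph = {("alice", "livesIn", "Orleans"), ("alice", "hasCondition", "flu")}
candidates = [("alice", "hasCondition", "flu"), ("alice", "hasCondition", "asthma")]
released = sanitize(graph, {"hasCondition"}, candidates, 1.0, random.Random(42))
```

Only the sensitive part of the graph pays the utility cost of the noise. The overall guarantee for one individual depends on how many perturbed bits relate to that individual, which this sketch does not account for.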

Anonymizing a database, especially for publication, involves transforming it. To enable the development of formal proofs and guarantees, we have adopted an approach that formalises these transformations as graph rewriting rules, which support the definition and implementation of anonymization procedures.
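As a rough illustration of a transformation expressed as a rule, the following Python sketch applies a rewrite to every triple that matches a pattern. It is a deliberately simplified assumption: real graph rewriting matches subgraph patterns rather than single triples, and the names `apply_rule`, `is_age` and `generalize_age` are hypothetical, not part of any SENDUP software.

```python
def apply_rule(triples, matches, rewrite):
    """Replace every triple accepted by `matches` with the triples produced by
    `rewrite`; all other triples are copied unchanged."""
    result = set()
    for triple in triples:
        if matches(triple):
            result.update(rewrite(triple))
        else:
            result.add(triple)
    return result


def is_age(triple):
    """Pattern: any triple whose predicate is the (hypothetical) 'age'."""
    return triple[1] == "age"


def generalize_age(triple):
    """Rewrite: replace an exact age with its ten-year range."""
    subject, predicate, value = triple
    low = (int(value) // 10) * 10
    return {(subject, predicate, f"{low}-{low + 9}")}


graph = {("alice", "age", "34"), ("alice", "livesIn", "Orleans")}
anonymized = apply_rule(graph, is_age, generalize_age)
```

Keeping the pattern and the rewrite as separate, explicit functions is what makes such rules amenable to reasoning: each rule states exactly which part of the graph it may change.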

In addition, these semantic graph databases may be subject to structural or integrity constraints and may be associated with inference rules. Updating them must therefore preserve these constraints and take these rules into account. We have coupled the formalisation of transformations with an approach for generating compensatory updates, where, for example, the deletion of a fact is accompanied by the deletion of facts that allow it to be inferred again.
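The compensatory update described above can be sketched in Python for one familiar inference rule, RDFS subclass entailment: if x has type C and C is a subclass of D, then x has type D. This is a minimal sketch under stated assumptions, not the algorithm implemented in the project: it handles only this rule, and it chooses to keep the schema and delete instance-level facts, which is only one of several possible compensations. The function names are hypothetical.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def subclasses_of(triples, cls):
    """Return every class that is a direct or indirect subclass of `cls`."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == SUBCLASS and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def delete_type_fact(triples, instance, cls):
    """Delete (instance, rdf:type, cls) together with the typing facts that
    would allow it to be inferred again through the subclass hierarchy."""
    doomed = {cls} | subclasses_of(triples, cls)
    return {
        t for t in triples
        if not (t[0] == instance and t[1] == TYPE and t[2] in doomed)
    }


def entails_type(triples, instance, cls):
    """Check whether (instance, rdf:type, cls) is stated or inferable."""
    accepted = {cls} | subclasses_of(triples, cls)
    return any(s == instance and p == TYPE and o in accepted for s, p, o in triples)


db = {
    ("alice", TYPE, "Student"),
    ("alice", TYPE, "Person"),
    ("bob", TYPE, "Student"),
    ("Student", SUBCLASS, "Person"),
}
updated = delete_type_fact(db, "alice", "Person")
```

In this example, deleting only the fact that alice is a Person would have no lasting effect, because the fact that she is a Student would let an inference engine derive it again; the compensatory deletion removes that fact as well while leaving bob and the schema untouched.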

Main results

The main results of SENDUP are:

A projection-based approach to reduce the data degradation of differentially private mechanisms on typed graphs.
GrAnon, an open-source engine for anonymization procedures on typed graphs. Based on a language of simple formal operators, it can be used to specify and execute procedures achieving local differential privacy or anatomisation, for example.
SETUP, a software package for managing updates to RDF/S databases that preserves their integrity constraints and generates compensatory actions to ensure the application of atomic updates.

Factual information

The SENDUP project is a "young researcher" experimental development project coordinated by Cédric Eichler from LIFO. It involves the LIFO and LIG laboratories. The project started in November 2018 and lasted 5 years. The project received an ANR grant of €218,721 for an overall cost of around €590,000.