Seminars

The seminar of the Privacy Protection working group (groupe de travail Protection de la Vie Privée) of the GDR Sécurité is a recurring online event. It is aimed at members of the community at large. In particular, it aims to make up for the lack of seminars and conferences caused by the epidemic.

Call for contributions for upcoming editions of this seminar:
– long talks (30 minutes)
– short talks (5 minutes)

Proposals should be sent to estelle.cherrier@ensicaen.fr and mathieu.cunche@insa-lyon.fr.

The TRUMPET and FLUTE projects: Secure, privacy-preserving and scalable federated learning between hospitals with pilot studies in oncology – Jan Ramon (Inria Lille – MAGNET) — 09/11/23

Abstract: In this seminar, I’ll present our Horizon Europe projects TRUMPET and FLUTE. In medical domains, patient data is essential to learn statistical models. Often, patient data from multiple hospitals is needed to have sufficiently large training sets for accurate machine learning. Patient data is sensitive, and hence it is preferable to not let such data leave the secure premises of the hospital. In the TRUMPET and FLUTE projects, we research and develop a platform for secure, privacy-preserving federated machine learning between hospitals. TRUMPET aims at use cases in lung cancer and head and neck cancer, while FLUTE targets the specific use case of predicting the need for a biopsy from MRI prostate images of patients with suspected prostate cancer. In this presentation, I review the objectives of the projects, considerations from a GDPR point of view, the technical challenges we will tackle and a number of approaches we will consider.

Bio: Jan Ramon obtained a PhD in computer science from KU Leuven (Belgium) in 2002 on logic-based machine learning. From 2009 until 2015 he led an ERC Starting Grant project on data mining in graphs and networks. Since 2015 he has been a senior researcher in the MAGNET team at Inria Lille (France). His main research interests concern machine learning theory, privacy-preserving AI, and their applications, among others in the medical domain. He is a member of the editorial boards of the Machine Learning Journal, Data Mining and Knowledge Discovery, and Transactions on Machine Learning Research.

Fairness in machine learning from the perspective of sociology of statistics – Bilel Benbouzid (LISIS – Université Gustave Eiffel) — 12/10/23

Abstract: We argue in this article that the integration of fairness into machine learning, or FairML, is a valuable exemplar of the politics of statistics and their ongoing transformations. Classically, statisticians sought to eliminate any trace of politics from their measurement tools. But data scientists, who are developing predictive machines for social applications, are inevitably confronted with the problem of fairness. They thus face two difficult and often distinct types of demands: first, for reliable computational techniques, and second, for transparency, given the constructed, politically situated nature of quantification operations. We begin by socially localizing the formation of FairML as a field of research and describing the associated epistemological framework. We then examine how researchers simultaneously think through the mathematical and the social construction of machine learning approaches, following controversies around fairness metrics and their status. Thirdly and finally, we show that FairML approaches tend towards a specific form of objectivity, “trained judgement,” which is based on a reasonably partial justification from the designer of the machine – which itself comes to be politically situated as a result.

Bio: After training as a civil engineer (ingénieur des travaux publics de l’État, at ENTPE), Bilel Benbouzid defended a PhD thesis in sociology of science on crime-fighting technologies. An associate professor (maître de conférences) at Université Gustave Eiffel since 2013, he conducts research on predictive policing, in which he has focused on questions of discrimination and fairness in algorithmic decision-support systems. He is currently writing an HDR on the regulation of AI.

Secure aggregation based on cryptographic schemes for federated learning – Melek Önen (EURECOM) — 23/02/23

Abstract: Secure aggregation consists of computing the sum of data collected from multiple sources without disclosing the individual inputs, and has been found useful for various applications ranging from electronic voting to smart grid measurements and, more recently, federated learning. The latter emerged as a new collaborative machine learning approach whereby multiple parties holding private data contribute to the training of a global machine learning model. In this talk, we will study the suitability of secure aggregation based on cryptographic schemes for federated learning. We will present the specific challenges raised by federated learning and further overview some recent solutions targeting some of these challenges, namely scalability, security, and robustness.
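
The core primitive the abstract refers to is summing private inputs so that the aggregator only learns the total. A minimal sketch of the pairwise additive-masking idea underlying many secure aggregation protocols, in a toy setting (pairwise seeds assumed to be agreed out of band, no dropout handling, no key agreement; all names and parameters are illustrative):

```python
import itertools
import random

# Toy pairwise additive masking: masks between each pair of parties cancel in
# the sum, so the server only learns the aggregate. Arithmetic is modulo a
# large prime so that the masks wrap around cleanly.
PRIME = 2**61 - 1

def pairwise_seeds(party_ids, rng):
    """One shared seed per unordered pair of parties (assumed exchanged securely)."""
    return {frozenset(pair): rng.randrange(PRIME)
            for pair in itertools.combinations(party_ids, 2)}

def masked_input(pid, value, party_ids, seeds):
    """Add +mask for each peer with a larger id, -mask for each peer with a smaller id."""
    masked = value % PRIME
    for other in party_ids:
        if other == pid:
            continue
        mask = random.Random(seeds[frozenset((pid, other))]).randrange(PRIME)
        masked = (masked + mask) % PRIME if pid < other else (masked - mask) % PRIME
    return masked

party_ids = [1, 2, 3, 4]
values = {1: 10, 2: 20, 3: 5, 4: 7}
seeds = pairwise_seeds(party_ids, random.Random(0))
shares = [masked_input(p, values[p], party_ids, seeds) for p in party_ids]
assert sum(shares) % PRIME == sum(values.values()) % PRIME  # masks cancel in the sum
```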

Bio: Melek Önen is an associate professor in the Digital Security department at EURECOM (Sophia-Antipolis, France). Her research interests are applied cryptography, information security and privacy. She holds a PhD in Computer Science from Ecole Nationale Supérieure des Télécommunications de Paris (ENST, 2005) and obtained her « Habilitation à Diriger les Recherches » in 2017. She has been involved in many European and French national research projects.

Browser Fingerprinting for Web Authentication: Towards an Additional Lightweight Authentication Factor – Tristan Allard (Univ Rennes, CNRS, IRISA) — 19/01/23 14:00

Abstract: Browser fingerprinting consists of collecting information from web browsers in order to build a – possibly unique – fingerprint per browser. Browser fingerprints can be made of hundreds of attributes whose values depend on the web environment of users. Recent works aim to leverage browser fingerprinting as an additional lightweight authentication factor. In this talk, I will present a suite of recent works in this direction. First, we performed an in-depth empirical study of the space of browser fingerprints in order to assess their adequacy as an authentication factor. We identified and formalized the properties required for browser fingerprints to be usable and practical as an authentication factor (distinctiveness, stability, collection time, size), and assessed them on a large-scale dataset. Second, we proposed FPSelect, an attribute selection framework that allows tuning a browser fingerprinting probe for web authentication by reducing the collection costs while limiting the impact of dictionary attackers. We formalized the problem, showed that it is NP-hard, and proposed an efficient heuristic for solving it. We performed a thorough experimental evaluation based on our real-life dataset and observed that, on average, compared with the common baselines, FPSelect generated fingerprints that are both orders of magnitude more efficient to collect and much more stable. Third, we implemented a browser fingerprinting attribute selection tool called BrFAST. BrFAST embeds FPSelect together with the common baselines, and can easily be extended to integrate additional attribute selection methods. BrFAST is available online (https://github.com/tandriamil/BrFAST). This is joint work with Nampoina Andriamilanto, Gaëtan Le Guelvouit, and Alexandre Garel.
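
To give a feel for the attribute selection trade-off the abstract describes (distinctiveness gained versus collection cost), here is a toy greedy heuristic; it is not the FPSelect algorithm, and the attribute names and numbers are made up for illustration:

```python
# Toy greedy attribute selection: maximize distinctiveness under a collection-time
# budget. Illustrative only; FPSelect formalizes and solves a richer problem.
attributes = {              # name: (distinctiveness gain, collection time in ms)
    "canvas": (0.30, 120),
    "user_agent": (0.20, 1),
    "fonts": (0.25, 200),
    "timezone": (0.05, 1),
}

def greedy_select(attrs, time_budget_ms):
    chosen, elapsed = [], 0
    # Pick attributes by best gain-per-millisecond first, while the budget allows.
    for name, (gain, cost) in sorted(attrs.items(), key=lambda kv: -kv[1][0] / kv[1][1]):
        if elapsed + cost <= time_budget_ms:
            chosen.append(name)
            elapsed += cost
    return chosen

print(greedy_select(attributes, time_budget_ms=150))  # e.g. ['user_agent', 'timezone', 'canvas']
```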

Bio: Tristan has been an assistant professor (« maître de conférences ») at Univ Rennes, CNRS, Irisa since September 2014. Before that, he was a postdoctoral researcher in the Inria Zenith team in Montpellier. He conducted his Ph.D. thesis in Computer Science in the Inria SMIS team and received it from the University of Versailles in December 2011. The volume, variety, and velocity of digital personal data are increasing at a fast pace. Enabling both daily uses and large-scale analysis of personal data while preserving individuals’ privacy is a key challenge in building a knowledge society. Tristan’s research interests lie within this wide field. He is particularly interested in the combination of differential privacy with cryptography (privacy-preserving data querying, privacy-preserving crowdsourcing, privacy-preserving data mining). More recently, he has turned to the study of browser fingerprints for web authentication.

Speech anonymization, Emmanuel Vincent (MULTISPEECH, Inria Nancy – Grand Est) — 24/11/22 14:00

Large-scale collection, storage, and processing of speech data pose severe privacy threats. Indeed, speech encapsulates a wealth of personal data (e.g., age and gender, ethnic origin, personality traits, health and socio-economic status, etc.) which can be linked to the speaker’s identity via metadata or via automatic speaker recognition. Speech data may also be used for voice spoofing using voice cloning software. With firm backing by privacy legislation such as the European General Data Protection Regulation (GDPR), several initiatives are emerging to develop privacy preservation solutions for speech technology. This talk focuses on voice anonymization, that is, the task of concealing the speaker’s voice identity without degrading the utility of the data for downstream tasks. I will i) explain how to assess privacy and utility, ii) describe the two baselines of the VoicePrivacy 2020 and 2022 Challenges and complementary methods based on adversarial learning, differential privacy, or slicing, and iii) conclude by stating open questions for future research.

Bio: Emmanuel Vincent received the Ph.D. degree in music signal processing from Ircam in 2004 and joined Inria in 2006. He is currently a Senior Research Scientist and the Head of Science of Inria Nancy – Grand Est. His research covers several speech and audio processing tasks, with a focus on privacy preservation, learning from little or no labeled data, source separation and speech enhancement, and robust speech and speaker recognition. He is a founder of the MIREX, SiSEC, CHiME, and VoicePrivacy challenge series. He is a scientific advisor of the startup company Nijta, which provides speech anonymization solutions.

Locally differentially private protocols for frequency estimation over longitudinal data, Héber H. Arcolezi (Inria Comète, LIX, École Polytechnique) — 27/10/22 14:00

Collecting and analyzing evolving longitudinal data has become a common practice. One possible approach to protect user privacy in this context is to use local differential privacy (LDP) protocols, which guarantee the privacy of all users even in the event of a data leak or misuse. Existing LDP data collection protocols, such as Google's RAPPOR and Microsoft's dBitFlipPM, have a longitudinal privacy budget that grows linearly with the domain size k, which can be excessive for large domains such as Internet domains. To address this issue, this talk presents a new LDP data collection protocol for longitudinal frequency monitoring, called LOngitudinal LOcal HAshing (LOLOHA), with formal privacy guarantees. Moreover, the privacy-utility trade-off of this protocol is only linear with respect to a reduced domain size 2 ≤ g ≪ k. LOLOHA combines a domain-reduction approach based on local hashing with a double randomization to minimize the privacy leakage incurred by data updates. As demonstrated by our theoretical analysis and experimental evaluation, LOLOHA achieves utility competitive with current state-of-the-art protocols while substantially reducing the longitudinal privacy budget consumption, by up to k/g orders of magnitude.
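
The two building blocks the abstract mentions, domain reduction via local hashing followed by randomization of the hashed value, can be sketched as follows. This is an illustrative toy, not the actual LOLOHA implementation; the function and parameter names are chosen for the example:

```python
import math
import random

def lh_grr_report(value, g, epsilon, rng):
    """Hash the user's value into a reduced domain of size g, then perturb the
    hashed value with epsilon-LDP generalized randomized response (GRR).
    The original domain size k only matters for the later frequency estimation."""
    seed = rng.randrange(2**32)            # per-user hashing seed, sent along with the report
    hashed = hash((seed, value)) % g       # domain reduction: k -> g
    p_true = math.exp(epsilon) / (math.exp(epsilon) + g - 1)
    if rng.random() < p_true:
        report = hashed
    else:
        report = rng.choice([v for v in range(g) if v != hashed])
    return seed, report

rng = random.Random(42)
print(lh_grr_report(value=7, g=4, epsilon=1.0, rng=rng))
# An aggregator collects many (seed, report) pairs and debiases them to estimate frequencies.
```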

Bio: Héber H. Arcolezi is currently a postdoctoral researcher in the Comète team at Inria Saclay. He studies local differential privacy for multidimensional and longitudinal (i.e., evolving) data. He is also interested in fairness and privacy issues in machine learning.

Inference of sensitive information in machine learning and countermeasures, Antoine Boutet (INSA-Lyon/Inria) — 29/09/22 14:00

Machine learning (ML) has become a core technology for building models that perform complex tasks, and the number of applications relying on ML capabilities keeps growing. However, ML models are the source of various privacy violations through inference attacks. In this talk, I will present several studies we have recently conducted on the privacy of personal information in relation to ML, including a data sanitization method based on a GAN-type generative model, new federated learning schemes, and a new sensitive-attribute inference attack.

Bio: Antoine Boutet is an associate professor at INSA Lyon and a member of the Inria Privatics team. His work focuses on privacy protection mechanisms in various application domains (IoT, AI, health, etc.). He obtained his PhD at Inria Rennes in 2013, working on decentralized systems and recommender systems.

Behind the Anonymity in Distributed Ledgers, Nesrine Kaaniche (Telecom SudParis, SAMOVAR) — 28/04/22 14:00

This talk aims to stress the tension between anti-money laundering and data protection requirements applied to private ledgers. On the one hand, anti-money laundering regulation requires service providers to be able to identify their clients and track their transactions, while, on the other hand, the implementation of data protection requirements strongly encourages the use of anonymization techniques to prevent the permanent recording of personal data (public/private keys and transactional data) within the Distributed Ledger Technology (DLT). In this context, designing DLTs able to meet both requirements poses certain challenges. I will first introduce privacy-preserving technologies that have been proposed to enforce privacy in distributed ledgers. Then, I will focus on de-anonymization, introducing a novel label graph network approach to improve identification results. Finally, I will discuss auditing mechanisms to comply with laws and regulations.

Bio: Nesrine Kaaniche is an Associate Professor in Cybersecurity at Télécom SudParis, Polytechnic Institute of Paris and an associate active member of the interdisciplinary chair Values and Policies of Personal Information of Institute Mines Télécom, France. Previously, she was a lecturer in Cybersecurity at the Department of Computer Science, the University of Sheffield, UK, a Post-Doc researcher at Télécom SudParis, France and an International Fellow at SRI International, San Francisco, CA, USA. Her major research interests include privacy enhancing technologies, applied cryptography for distributed systems, and decentralized architectures, i.e., IoT, fog and clouds.

 
Latest Advances in Location Privacy Attacks and Protection Mechanisms, Sonia Ben Mokhtar (CNRS & LIRIS) — 03/03/22 14:00

The widespread adoption of continuously connected smartphones and tablets has driven the proliferation of mobile applications, many of which use location to provide a geolocated service. The usefulness of these services no longer needs to be demonstrated: getting directions to work in the morning, checking in at a restaurant at noon and checking the next day’s weather in the evening is possible from any mobile device embedding a GPS chip. In these applications, locations are sent to a server, often hosted on untrusted cloud platforms, which uses them to provide personalized answers. However, nothing prevents these platforms from gathering, analyzing and possibly sharing the collected information. This opens the door to many threats, as location information makes it possible to infer sensitive information about users, such as one’s home, workplace or even religious/political preferences. For this reason, many schemes have been proposed in recent years to enhance location privacy while still allowing people to enjoy geolocated services. During this presentation, I will present the latest advances in location privacy attacks and protection mechanisms and give some insights on open challenges and under-explored questions.

Bio: Sonia Ben Mokhtar is a CNRS research director at the LIRIS laboratory (UMR 5205) and the head of the distributed systems and information retrieval group (DRIM). She received her PhD in 2007 from Université Pierre et Marie Curie before spending two years at University College London (UK). Her research focuses on the design of resilient and privacy-preserving distributed systems. Sonia has co-authored 70+ papers in peer-reviewed conferences and journals, has served on the editorial board of IEEE Transactions on Dependable and Secure Computing, and has co-chaired major conferences in the field of distributed systems (e.g., ACM Middleware, IEEE DSN). Sonia has served as chair of ACM SIGOPS France and is currently the vice-chair of GDR RSD, a national academic network of researchers in distributed systems and networks.

Towards safe online political advertising — Oana Goga (LIG, CNRS) — 16/12/2021 14:00

Abstract: In this presentation I will talk about our paper "Facebook Ads Monitor: An Independent Auditing System for Political Ads on Facebook", published at The Web Conference 2020, and follow-up discussions with civil society organizations on how to regulate political advertising: https://epd.eu/wp-content/uploads/2020/09/joint-call-for-universal-ads-transparency.pdf.

The 2016 United States presidential election was marked by the abuse of targeted advertising on Facebook. Concerned with the risk of the same kind of abuse happening in the 2018 Brazilian elections, we designed and deployed an independent auditing system to monitor political ads on Facebook in Brazil. To do so, we first adapted a browser plugin to gather ads from the timelines of volunteers using Facebook. We managed to convince more than 2000 volunteers to help our project and install our tool. Then, we used a Convolutional Neural Network (CNN) to detect political Facebook ads using word embeddings. To evaluate our approach, we manually labeled a collection of 10k ads as political or non-political and provided an in-depth evaluation of the proposed approach for identifying political ads by comparing it with classic supervised machine learning methods. Finally, we deployed a real system that shows the ads identified as related to politics. We noticed that not all political ads we detected were present in the Facebook Ad Library for political ads. Our results emphasize the importance of enforcement mechanisms for declaring political ads and the need for independent auditing platforms.
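
For readers unfamiliar with the classification step, here is a minimal sketch of the kind of "classic supervised" text classifier the paper uses as a baseline (not the CNN-with-word-embeddings model itself); the example ads and labels are invented:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy political-ad classifier: TF-IDF features + logistic regression,
# trained on a couple of invented examples just to show the pipeline.
ads = ["vote for candidate X this Sunday", "50% off on running shoes today"]
labels = [1, 0]  # 1 = political, 0 = non-political

baseline = make_pipeline(TfidfVectorizer(), LogisticRegression())
baseline.fit(ads, labels)
print(baseline.predict(["vote on the new tax plan this Sunday"]))
```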

Bio: Oana Goga is a tenured research scientist at the French National Center for Scientific Research (CNRS) and the Laboratoire d’Informatique Grenoble (LIG). She investigates how social media systems and online advertising can be used to negatively impact humans and society. She is the recipient of a young researcher award from the French National Research Agency (ANR). Her recent research has received several awards, including the Honorable Mention Award at The Web Conference in 2020 and the CNIL-Inria Award for Privacy Protection 2020, and was runner-up for the 2019 Caspar Bowden PET Award for outstanding research in privacy enhancing technologies.

Growing synthetic data through differentially-private vine copulas — Sébastien Gambs (UQAM) — 14/10/21 14:00

Abstract: In this work, we propose a novel approach for the synthesis of data based on copulas, which are interpretable and robust models extensively used in the actuarial domain. More precisely, our method COPULA-SHIRLEY is based on the differentially-private training of vine copulas, a family of copulas that can model and generate data of arbitrary dimension. The framework of COPULA-SHIRLEY is simple yet flexible, as it can be applied to many types of data while preserving utility, as demonstrated by experiments conducted on real datasets. We also evaluate the protection level of our data synthesis method through a membership inference attack recently proposed in the literature. Joint work with Frédéric Ladouceur, Antoine Laurent and Alexandre Roy-Gaumond.
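
To illustrate the generate-data-from-a-copula idea behind the method (only the idea: COPULA-SHIRLEY trains vine copulas under differential privacy, whereas this toy uses a plain Gaussian copula with no privacy at all):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Stand-in "real" dataset: two correlated numeric columns.
real = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=1000)

# 1) transform each marginal to pseudo-uniform ranks, 2) fit the copula correlation
u = (stats.rankdata(real, axis=0) - 0.5) / len(real)
corr = np.corrcoef(stats.norm.ppf(u), rowvar=False)

# 3) sample from the fitted Gaussian copula and map back through the empirical marginals
z = rng.multivariate_normal(np.zeros(2), corr, size=1000)
u_syn = stats.norm.cdf(z)
synthetic = np.column_stack([np.quantile(real[:, j], u_syn[:, j]) for j in range(real.shape[1])])
print(synthetic[:3])
```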

Biography: Sébastien Gambs joined the Computer Science Department of the Université du Québec à Montréal (UQAM) in January 2016, after holding a joint research chair in Security of Information Systems between Université de Rennes 1 and Inria from September 2009 to December 2015. He has held the Canada Research Chair (Tier 2) in Privacy-preserving and Ethical Analysis of Big Data since December 2017. His main research area is the protection of privacy, with a particularly strong focus on location privacy. He is also interested in long-term scientific questions such as addressing the tension between privacy and the analysis of Big Data, as well as the fairness, accountability and transparency issues raised by personalized systems.

RETEX Data Anonymization and Reidentification Contest@APVP2021 — Margaux Tela (for the UQAM team), Nancy Awad (for the Femtorange team), Julien Bracon (for the INSA Lyon team) — 13/07/2021

Each of the three teams will present its anonymization and re-identification solution, as submitted to the DARC@APVP2021 competition.

Responsible data publishing during the COVID-19 crisis — Damien Desfontaines (Google) — 10/06/2021 14:00

Abstract: In this talk, I will present two projects that Google launched to help public health officials combat the spread of COVID-19: the COVID-19 Community Mobility Reports, and the COVID-19 Search Trends Symptoms dataset. In both projects, we aggregated and anonymized the data using differential privacy. Taking these launches as an example, I will outline some of the challenges that appear when rolling out differential privacy for practical use cases, and present possible approaches to tackling these challenges.
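
The basic idea of publishing differentially private aggregates instead of raw counts can be sketched as follows; the mechanism shown (Laplace noise on counts) is only illustrative, not the exact pipeline used for these datasets:

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_count(true_count, epsilon, sensitivity=1.0):
    """Laplace mechanism: release the count plus Laplace(sensitivity / epsilon) noise."""
    return true_count + rng.laplace(scale=sensitivity / epsilon)

# Toy per-place visit counts; only the noisy values would be published.
visits = {"park": 1200, "transit_station": 870}
print({place: round(dp_count(c, epsilon=0.5), 1) for place, c in visits.items()})
```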

Bio: Damien Desfontaines leads the anonymization consulting team at Google, where he spent the past few years rolling out differential privacy for a variety of use cases. He obtained his PhD, also on differential privacy, in 2020 at ETH Zürich.

 
Personal Database Management Systems (PDMS): towards a citizen Big Data platform? — Nicolas Anciaux (Inria Saclay Île-de-France – UVSQ – PETRUS) — 20/05/2021 13:00

Abstract: Smart disclosure initiatives in the United States and the GDPR in Europe are increasing interest in personal data management systems (known as PIMS or PDMS) provided to individuals so that they can manage their data under their own control. This puts the thorny question of personal data protection in the spotlight, in a context that differs markedly from the traditional case of corporate databases outsourced to the cloud. The properties to be guaranteed are specific and difficult to achieve, but the emergence of trusted execution environments (such as Intel SGX or ARM TrustZone), now present in most user devices, could change the situation.
The PDMS paradigm aims to reconcile personal data protection with advanced processing, building on these technologies. This presentation will be an opportunity (1) to review PDMS solutions, their functionalities and trust models, and the potential contribution of trusted execution environments, and (2) to discuss new solutions for the collective processing of personal data (citizen portability) that preserve both the agency of individuals and the societal interests tied to the sharing of personal data.

Bio: Nicolas Anciaux is a Research Director at Inria and head of the PETRUS team, joint with the Université de Versailles. His areas of expertise are the systems aspects of databases and data confidentiality. Within the PETRUS team, he applies this expertise to personal database management systems (PDMS). He is a co-author of PlugDB, a secure embedded PDMS used for home-care monitoring. Together with Celia Zolynski, professor at the Sorbonne Law School, he co-leads the GDP-ERE project, which aims to co-construct a technical and legal framework for the management of personal data by citizens. Nicolas is an associate editor of the VLDB Journal and a co-author of more than 50 conference and journal papers.

Privacy-Preserving Decentralized Machine Learning — Aurélien Bellet (Inria Lille Nord Europe  – Magnet) — 18/03/2021 14:00

Abstract: Decentralized machine learning (DML), also known as federated learning, is a setting where many parties (e.g., mobile devices or whole organizations) collaboratively train a machine learning model while keeping their data decentralized. In this talk, I will give a brief introduction to DML and emphasize that most algorithms rely on aggregating local model updates made by participants. I will then show how differential privacy can be integrated into these algorithms to ensure data confidentiality, and discuss how to obtain good trade-offs between privacy, utility and computational costs.
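
The aggregation-plus-noise pattern described in the abstract can be sketched in a few lines; clipping local updates and adding Gaussian noise to their sum is one standard way to make the released average differentially private, not necessarily the exact algorithm presented in the talk:

```python
import numpy as np

def private_average(updates, clip=1.0, noise_multiplier=1.0, rng=None):
    """Clip each local update to L2 norm <= clip, sum them, add Gaussian noise,
    and release the noisy average (a standard DP federated averaging sketch)."""
    rng = rng or np.random.default_rng(0)
    clipped = [u * min(1.0, clip / (np.linalg.norm(u) + 1e-12)) for u in updates]
    noise = rng.normal(0.0, noise_multiplier * clip, size=updates[0].shape)
    return (np.sum(clipped, axis=0) + noise) / len(updates)

# Toy local model updates from 10 participants, each a 5-dimensional vector.
updates = [np.random.default_rng(i).normal(size=5) for i in range(10)]
print(private_average(updates))
```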

Bio: Aurélien Bellet is a tenured researcher at Inria (France). He obtained his Ph.D. from the University of Saint-Etienne (France) in 2012 and was a postdoctoral researcher at the University of Southern California (USA) and at Télécom Paris (France). His current research focuses on the design of federated and decentralized machine learning algorithms under privacy constraints. Aurélien served as area chair for ICML 2019, ICML 2020 and NeurIPS 2020, and co-organized several international workshops on machine learning and privacy (at NIPS’16, NeurIPS’18 ’20 and as stand-alone events). He was also a co-organizer of the 10th edition of the French pluridisciplinary conference on privacy protection (APVP) in 2019.

[Short talk]  The Cluster Exposure Verification (CLÉA) Protocol — Vincent Roca (Inria Grenoble – Privatics) — 18/03/2021 14:00

Abstract: In this talk, I will give a brief introduction to the Cluster Exposure Verification (CLÉA) protocol, meant to warn the participants of a private event (e.g., a wedding or private party), or the persons present in a commercial or public location (e.g., a bar, restaurant, or train), that has become a cluster because people who were present at the same time were later tested positive for COVID-19. This protocol is the foundation of a dedicated TousAntiCovid module that will offer an additional and complementary service to the existing contact tracing module.

 

Bio: After a PhD from Grenoble INP in 1996, Vincent Roca joined the University Paris 6 as an associate professor in 1997, and Inria as a researcher in 2000. An active IETF (Internet Engineering Task Force) participant and a member of PRIVATICS since 2012, he now leads this Inria research team, which specialises in privacy and personal data protection. He focuses in particular on the privacy risks associated with the use of smartphones and Internet of Things devices. He is also a co-author, with PRIVATICS colleagues, of the ROBERT COVID exposure notification protocol that is the foundation of the French TousAntiCovid app.

Hybrid Differential Privacy — Catuscia Palamidessi (Inria Saclay – Comète) — 25/02/2021 14:00

Abstract: Differential Privacy (DP) is one of the most successful proposals to protect the privacy of sensitive data while preserving their utility. In this talk, we will briefly introduce the DP framework and its central and local models, which refer to the cases in which sanitization is done after the data have been collected, or at the level of the individual data, respectively.
We then present an intermediate scenario, which we call hybrid, representing the case in which the data set is distributed across different organizations that do not wish to disclose the original data but only a sanitized version of it, while still benefiting from the advantages of combining information coming from different sources. We propose a new mechanism for the hybrid case, which is compositional and particularly suitable for the application of a variant of the statistical Expectation-Maximization method, thanks to which the utility of the original data can be retrieved to an arbitrary degree of approximation, without affecting the privacy of the original data owners.
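
A toy illustration of the hybrid setting described above: each organization locally sanitizes its own records before releasing them, and the analyst later corrects the pooled statistics. The randomizer used here (binary randomized response) and the simple debiasing step stand in for the compositional mechanism and the Expectation-Maximization variant discussed in the talk:

```python
import random

def randomized_response(bit, p_keep, rng):
    """Report the true bit with probability p_keep, the flipped bit otherwise."""
    return bit if rng.random() < p_keep else 1 - bit

rng = random.Random(1)
p_keep = 0.75
# Three organizations, each holding 1000 binary records with true frequency ~0.3.
org_datasets = [[int(rng.random() < 0.3) for _ in range(1000)] for _ in range(3)]
released = [[randomized_response(b, p_keep, rng) for b in data] for data in org_datasets]

# The analyst pools the sanitized releases and debiases the observed frequency:
# observed = p_keep * true + (1 - p_keep) * (1 - true)
pooled = [b for data in released for b in data]
observed = sum(pooled) / len(pooled)
estimate = (observed - (1 - p_keep)) / (2 * p_keep - 1)
print(round(estimate, 3))  # roughly recovers the true frequency of 0.3
```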
 
Detecting online tracking and GDPR violations in Web applications — Nataliia Bielova (Inria Sophia Antipolis, Privatics) — 17/12/20 14:00

Abstract: As millions of users browse the Web on a daily basis, they become producers of data that are continuously collected by numerous companies and agencies. Website owners, however, need to become compliant with recent EU privacy regulations (such as GDPR and ePrivacy) and often rely on cookie banners to either inform users or collect their consent to tracking.

In this talk, I will present recent results on detecting Web trackers and analyzing the compliance of websites with the GDPR and the ePrivacy Directive. We first develop a tracking detection methodology based on invisible pixels. By analyzing the third-party resource loading on 80K webpages, we uncover hidden collaborations between third parties and find that 68% of websites synchronize harmless first-party cookies with privacy-invasive third-party cookies. We show that filter lists, used in the research community as a de facto approach to detect trackers, miss between 25% and 30% of the cookie-based tracking we detect. Finally, we demonstrate that privacy-protecting browser extensions, such as Ghostery, Disconnect or Privacy Badger, together miss 24% of the tracking requests we detect.

To measure legal compliance of websites, we analyse cookie banners that are implemented by Consent Management Providers (CMPs), who respect the IAB Europe’s Transparency and Consent Framework (TCF). Via cookie banners, CMPs collect and disseminate user consent to third parties. We systematically study IAB Europe’s TCF and analyze consent stored behind the user interface of TCF cookie banners. We analyze the GDPR and the ePrivacy Directive to identify legal violations in implementations of cookie banners based on the storage of consent and detect such violations by crawling 23K European websites, and further analyzing 560 websites that rely on TCF. As a result, we find violations in 54% of them: 175 (12.3%) websites register positive consent even if the user has not made their choice; 236 (46.5%) websites nudge the users towards accepting consent by pre-selecting options; and 39 (7.7%) websites store a positive consent even if the user has explicitly opted out. Finally, we provide a browser extension, Cookie glasses, to facilitate manual detection of violations for regular users and Data Protection Authorities.

Bio: Nataliia Bielova is a Research Scientist in the Privatics team at Inria Sophia Antipolis, where she conducts interdisciplinary research in Computer Science and EU Data Protection Law. Her main research interests are the measurement and detection of, and protection from, Web tracking. She also collaborates with Law researchers to understand how the GDPR and the ePrivacy Regulation can be enforced in Web applications.

 

Groupe de Travail Protection de la Vie Privée