2024

S-BDT: Distributed Differentially Private Boosted Decision Trees
ACM Conference on Computer and Communications Security (CCS 2024)
Thorsten Peinemann, Moritz Kirschte, Joshua Stock, Carlos Cotrini, Esfandiar Mohammadi
We introduce S-BDT: a novel (ε,δ)-differentially private distributed gradient boosted decision tree (GBDT) learner that improves the protection of single training data points (privacy) while achieving meaningful learning goals, such as accuracy or regression error (utility). S-BDT uses less noise by relying on non-spherical multivariate Gaussian noise, for which we show tight subsampling bounds for privacy amplification and incorporate that into a Rényi filter for individual privacy accounting. We experimentally reach the same utility while saving 50% in terms of epsilon for ε≤0.5 on the Abalone regression dataset (dataset size ≈4K), saving 30% in terms of epsilon for ε≤0.08 for the Adult classification dataset (dataset size ≈50K), and saving 30% in terms of epsilon for ε≤0.03 for the Spambase classification dataset (dataset size ≈5K). Moreover, we show that for situations where a GBDT is learning a stream of data that originates from different subpopulations (non-IID), S-BDT improves the saving of epsilon even further.
Automated Large-Scale Analysis of Cookie Notice Compliance
USENIX Security Symposium 2024
Ahmed Bouhoula, Karel Kubicek, Amit Zac, Carlos Cotrini, David A. Basin
Privacy regulations such as the General Data Protection Regulation (GDPR) require websites to inform EU-based users about non-essential data collection and to request their consent to this practice. Previous studies have documented widespread violations of these regulations. However, these studies provide a limited view of the general compliance picture: they are either restricted to a subset of notice types, detect only simple violations using prescribed patterns, or analyze notices manually. Thus, they are restricted both in their scope and in their ability to analyze violations at scale.

We present the first general, automated, large-scale analysis of cookie notice compliance. Our method interacts with cookie notices, e.g., by navigating through their settings. It observes declared processing purposes and available consent options using Natural Language Processing and compares them to the actual use of cookies. By virtue of the generality and scale of our analysis, we correct for the selection bias present in previous studies focusing on specific Consent Management Platforms (CMP). We also provide a more general view of the overall compliance picture using a set of 97k websites popular in the EU. We report, in particular, that 65.4% of websites offering a cookie rejection option likely collect user data despite explicit negative consent.

2023

Locality-Sensitive Hashing Does Not Guarantee Privacy! Attacks on Google's FLoC and the MinHash Hierarchy System
Proceedings on Privacy Enhancing Technologies (PoPETS 2023)
Florian Turati, Karel Kubicek, Carlos Cotrini, David A. Basin
Recently proposed systems aim at achieving privacy using locality-sensitive hashing. We show how these approaches fail by presenting attacks against two such systems: Google's FLoC proposal for privacy-preserving targeted advertising and the MinHash Hierarchy, a system for processing mobile users' traffic behavior in a privacy-preserving way. Our attacks refute the pre-image resistance, anonymity, and privacy guarantees claimed for these systems.

In the case of FLoC, we show how to deanonymize users using Sybil attacks and to reconstruct 10% or more of the browsing history for 30% of its users using Generative Adversarial Networks. We achieve this by analyzing only the hashes used by FLoC. For MinHash, we precisely identify the movement of a subset of individuals and, on average, we can limit users' movement to just 10% of the possible geographic area, again using just the hashes. In addition, we refute their differential privacy claims.
Domain Generalization for Diagnosis of Pulmonary Fibrosis Using Dose-Invariant Feature Selection
IEEE International Symposium on Biomedical Imaging (ISBI 2023)
João B. S. Carvalho, Carlos Cotrini, Fabian Laumer, André Euler, Katharina Martini, Thomas Frauenfelder, Joachim M. Buhmann
Automated methods for diagnosing pulmonary fibrosis based on deep learning have achieved promising results in recent times. However, their accuracy depends on the radiation dose used to generate the CT scans, and the performance of popular models decreases when evaluated on CT scans with doses different from those of the CT scans used for training. We propose a new method for ensuring that the representations computed by these networks are invariant to the dose, without retraining the entire network. Our method improves upon the F1 score of standard methods by 6% to 15% when evaluated on unseen samples recorded with a different radiation dose.
Invariant Anomaly Detection under Distribution Shifts: A Causal Perspective
Conference on Neural Information Processing Systems (NeurIPS 2023)
João B. S. Carvalho, Mengtao Zhang, Robin Geyer, Carlos Cotrini, Joachim M. Buhmann
Anomaly detection (AD) is the machine learning task of identifying highly discrepant abnormal samples by solely relying on the consistency of the normal training samples. Under the constraints of a distribution shift, the assumption that training samples and test samples are drawn from the same distribution breaks down. In this work, by leveraging tools from causal inference, we attempt to increase the resilience of anomaly detection models to different kinds of distribution shifts. We begin by elucidating a simple yet necessary statistical property that ensures invariant representations, which is critical for robust AD under both domain and covariate shifts. From this property, we derive a regularization term which, when minimized, leads to partial distribution invariance across environments. Through extensive experimental evaluation on both synthetic and real-world tasks, covering a range of six different AD methods, we demonstrate significant improvements in out-of-distribution performance. Under both covariate and domain shift, models regularized with our proposed term show markedly increased robustness.

2022

Holistic Modeling in Medical Image Segmentation Using Spatial Recurrence
Medical Imaging with Deep Learning (MIDL 2022)
João B. S. Carvalho, Joao Santinha, Djordje Miladinovic, Carlos Cotrini, Joachim M. Buhmann
In clinical practice, regions of interest in medical imaging (MI) often need to be identified through a process of precise image segmentation. For MI segmentation to generalize, we need two components: to identify local descriptions, but at the same time to develop a holistic representation of the image that captures long-range spatial dependencies. Unfortunately, we demonstrate that the state of the art does not achieve the latter. In particular, it does not yield a global, contextual model. To improve accuracy and enable holistic modeling, we introduce a novel deep neural network architecture endowed with spatial recurrence. The implementation relies on gated recurrent units that directionally traverse the feature map, greatly increasing each layer's receptive field and explicitly modeling non-adjacent relationships between pixels.
Checking Websites' GDPR Consent Compliance for Marketing Emails
Proceedings on Privacy Enhancing Technologies (PoPETS 2022)
Karel Kubicek, Jakob Merane, Carlos Cotrini, Alexander Stremitzer, Stefan Bechtold, David Basin
The sending of marketing emails is regulated to protect users from unsolicited emails. For instance, the European Union's ePrivacy Directive states that marketers must obtain users' prior consent, and the General Data Protection Regulation (GDPR) specifies further that such consent must be freely given, specific, informed, and unambiguous. Based on these requirements, we design a labeling of legal characteristics for websites and emails. This leads to a simple decision procedure that detects potential legal violations. Using our procedure, we evaluated 1000 websites and the 5000 emails resulting from registering with these websites. We find that 21.9% of the websites contain potential violations of privacy and unfair competition rules, either in the registration process (17.3%) or email communication (17.7%).
Automating Cookie Consent and GDPR Violation Detection
USENIX Security Symposium 2022 (Distinguished Artifact Award)
Dino Bollinger, Karel Kubicek, Carlos Cotrini, David Basin
The European Union's General Data Protection Regulation (GDPR) requires websites to inform users about personal data collection and request consent for cookies. Yet the majority of websites do not give users any choices, and others attempt to deceive them into accepting all cookies. We document the severity of this situation through an analysis of potential GDPR violations in cookie banners on almost 30k websites. We identify six novel violation types, such as incorrect category assignments and misleading expiration times, and we find at least one potential violation in a surprising 94.7% of the analyzed websites. We address this issue by giving users the power to protect their privacy through CookieBlock, a browser extension that uses machine learning to enforce GDPR cookie consent at the client.
Studierende auf den Einsatz von maschinellem Lernen vorbereiten (Preparing Students for the Use of Machine Learning)
Schweizerische Ärztezeitung 2022
Raphaël Bonvin, Joachim Buhmann, Carlos Cotrini Jimenez, Marcel Egger, Alexander Geissler, Michael Krauthammer, Christian Schirlo, Christiane Spiess, Johann Steurer, Kerstin Noëlle Vokinger, Julia Vogt
Digitalization has already changed medicine and will continue to strongly influence medical practice in the future. It is therefore important that prospective physicians engage with the methods and possible applications of machine learning during their studies. The working group "Digitalization of Medicine" has developed learning objectives for this purpose.

2019

The Next 700 Policy Miners: A Universal Method to Build Policy Miners
ACM Conference on Computer and Communications Security (CCS 2019)
Carlos Cotrini, Luca Corinzia, Thilo Weghorn, David Basin
A myriad of access control policy languages have been and continue to be proposed. The design of policy miners for each such language is a challenging task that has required specialized machine learning and combinatorial algorithms. We present an alternative method, universal access control policy mining (Unicorn). We show how this method streamlines the design of policy miners for a wide variety of policy languages including ABAC, RBAC, RBAC with user-attribute constraints, RBAC with spatio-temporal constraints, and an expressive fragment of XACML. For the latter two, there were no known policy miners until now.

2018

Mining ABAC Rules from Sparse Logs
IEEE European Symposium on Security and Privacy (EuroS&P 2018)
Carlos Cotrini, Thilo Weghorn, David Basin
Different methods have been proposed to mine attribute-based access control (ABAC) rules from logs. In practice, these logs are sparse in that they contain only a fraction of all possible requests. However, for sparse logs, existing methods mine and validate overly permissive rules, enabling privilege abuse. We define a novel measure, reliability, that quantifies how overly permissive a rule is, and we show why other standard measures like confidence and entropy fail to quantify overpermissiveness. We build upon state-of-the-art subgroup discovery algorithms and our new reliability measure to design Rhapsody, the first ABAC mining algorithm with correctness guarantees.

2015

Analyzing First-Order Role-Based Access Control
IEEE Computer Security Foundations Symposium (CSF 2015)
Carlos Cotrini, Thilo Weghorn, Manuel Clavel, David Basin
We propose FORBAC, an extension of Role-Based Access Control (RBAC) based on first-order logic. FORBAC is expressive enough to formalize a wide range of access control policies. However, it is simple enough so that relevant policy analysis queries can be analyzed in NP, which we argue is a natural complexity class for this problem. To analyze queries efficiently, we reduce them to the problem of satisfiability modulo appropriate theories, and use off-the-shelf SMT solvers.

2014

Deciding Safety and Liveness in TPTL
Information Processing Letters 2014
Carlos Cotrini, Felix Klaedtke, Eugen Zalinescu, David Basin
We show that deciding whether a TPTL formula describes a safety property is EXPSPACE-complete. Moreover, deciding whether a TPTL formula describes a liveness property is in 2-EXPSPACE. Our algorithms for deciding these problems extend those presented by Sistla to decide the corresponding problems for LTL.
Primal Infon Logic with Conjunctions as Sets
Theoretical Computer Science 2014
Carlos Cotrini, Yuri Gurevich, Ori Lahav, Artem Melentyev
Primal infon logic was proposed by Gurevich and Neeman as an efficient yet expressive logic for policy and trust management. It is a propositional multimodal subintuitionistic logic decidable in linear time. However, in that logic, the principle of the replacement of equivalents fails. We introduce a version of propositional primal logic that treats conjunctions as sets, and show that the derivation problem for this logic can be decided in linear expected time and quadratic worst-case time.

2013

Basic Primal Infon Logic
Journal of Logic and Computation 2013
Carlos Cotrini, Yuri Gurevich
Primal infon logic (PIL) was introduced in 2009 in the framework of policy and trust management. In the meantime, some generalizations appeared, and there have been some changes in the syntax of the basic PIL. This article is on the basic PIL, and one of our purposes is to 'institutionalize' the changes. We prove a small-model theorem for the propositional fragment of basic primal infon logic (PPIL), give a simple proof of the PPIL locality theorem and present a linear-time decision algorithm for PPIL in a form convenient for generalizations.
Transitive Primal Infon Logic
The Review of Symbolic Logic 2013
Carlos Cotrini, Yuri Gurevich
Primal infon logic was introduced in 2009 in connection with access control. In addition to traditional logic constructs, it contains unary connectives "p said", which are indispensable in the intended access control applications. Propositional primal infon logic is decidable in linear time, yet suffices for many common access control scenarios. The most obvious limitation on its expressivity is the failure of the transitivity law for implication. Here we introduce and investigate equiexpressive "transitive" extensions TPIL and TPIL* of propositional primal infon logic as well as their quote-free fragments TPIL0 and TPIL0* respectively.

Research Interests

Privacy-Preserving Machine Learning
Developing algorithms that enable machine learning while protecting individual privacy through differential privacy, federated learning, and secure multi-party computation.
Security and Privacy Analysis
Analyzing the security and privacy properties of systems, particularly web technologies and machine learning models, with a focus on GDPR compliance and privacy regulation enforcement.
Robust Machine Learning
Creating machine learning systems that maintain performance under distribution shifts, domain changes, and adversarial conditions through causal inference and invariant representations.