The nuts and bolts of Differential Privacy (Part 1/2)

Does the term Differential Privacy ring a bell? If it doesn’t, then you’re in for a treat! The first part of this article provides a quick introduction to the notion of Differential Privacy, a robust mathematical approach to formulating the privacy guarantees of data-related tasks. We will cover some of its use cases, untangle its mechanisms and key properties, and see how it works in practice.
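
As a small taste of the mechanisms the article untangles, here is a minimal sketch of the Laplace mechanism, the classic way to answer a numeric query with epsilon-differential privacy. The function and example below are our own illustration, not code from the article.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon):
    """Differentially private answer to a numeric query: add Laplace noise
    with scale sensitivity / epsilon, which satisfies epsilon-differential
    privacy for a query of that L1 sensitivity."""
    return true_value + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: privately release a count. Adding or removing one record
# changes a count by at most 1, so its sensitivity is 1.
ages = [34, 45, 23, 67, 41]
print("true count:", len(ages))
print("private count:", laplace_mechanism(len(ages), sensitivity=1.0, epsilon=0.5))
```

The smaller epsilon is, the noisier the answer and the stronger the privacy guarantee.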

Read More
Labelia Labs
Fairness in Machine Learning

This article attempts to clear up the vast and complex subject of fairness in machine learning. Without claiming to be exhaustive, it proposes a number of definitions and very useful tools that every data scientist should master in order to address this topic.

Read More
Labelia Labs
Why and how can the contributivity of a partner in a collaborative machine learning project be assessed?

To meet the privacy requirements that certain domains impose on their data, one solution is to move towards distributed, collaborative, multi-actor machine learning. This calls for a notion of contributivity that quantifies each partner's contribution to the final model. Defining such a notion is far from straightforward, so being able to easily implement, experiment with and compare different approaches is essential. This requires a simulation tool, which we have undertaken to create as an open source Python library. Its development is ongoing, within a workgroup bringing together several partners.
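
To make the notion concrete, here is a minimal sketch of one of the simplest contributivity measures, the leave-one-out approach: a partner's contributivity is the drop in performance when its dataset is removed from the collaborative training. This is our own illustration with made-up names and scores, not the API of the library discussed in the article.

```python
def leave_one_out_contributivity(partners, train_and_score):
    """Contributivity of each partner, measured as the performance drop
    when its dataset is left out of the collaborative training.

    `train_and_score` maps a set of partners to the score of a model
    trained on their pooled data."""
    full = frozenset(partners)
    full_score = train_and_score(full)
    return {p: full_score - train_and_score(full - {p}) for p in partners}

# Toy example: each partner's data adds a fixed amount of accuracy
# (purely illustrative; real scores would come from actual training runs).
gain = {"hospital_A": 0.10, "hospital_B": 0.05, "lab_C": 0.02}
score = lambda coalition: 0.60 + sum(gain[p] for p in coalition)

print(leave_one_out_contributivity(gain, score))
```

More refined approaches, such as Shapley values, average this kind of marginal contribution over every possible coalition of partners, at a much higher computational cost.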

Read More
Labelia Labs
What's up, doc?

The general documentation for the Substra framework has been available for a while now, but we never took the time to give it a proper look, so let's take a retrospective look at it together!

As you probably know, open source software isn't just about publicly accessible code. It is also, and above all, a project: a place where discussions, tests and development come together. It is therefore essential that everyone can settle in comfortably!

Read More
Growing applications of AI techniques vs. growing concerns: how to ease the tension?

The objective of this article is to present the participatory approach to "responsible and trustworthy data science" that we initiated in the summer of 2019 and have been leading ever since. I will follow the thread of the presentation I gave at the "Big data & ML" meetup on September 29, 2020. I hope this blog format will allow as many people as possible to discover this initiative, perhaps react to it, or even come and contribute to it. All feedback is welcome: it feeds the ongoing reflection and work, and we need it!

Read More
Eric Boniface
[Part 2/2] Using Distributed Learning for Deepfake Detection

In the first part of this article, we introduced a secure, traceable and distributed ML approach for a deepfake detection benchmark using the Substra framework. In this second part, we present the technical details of the Substra framework and walk through the whole process of implementing an example on Substra, so that you can submit your own algorithms or add your own dataset to the Substra network.

Read More
Nathanael Cretin
[Part 1/2] Using Distributed Learning for Deepfake Detection

In this article, we present several facial manipulation techniques known as “deepfakes” and show why it is important to advance research on deepfake detection. We review the state of the art in deepfake detection datasets and algorithms, and introduce a secure, traceable and distributed machine learning approach for a deepfake detection benchmark using the Substra framework.

Read More
Nathanael Cretin
[Guest post] Story of the 1st Federated Learning Model at Owkin

This guest post was written and originally published on the Owkin website. Owkin, a fast-growing health data AI startup, is a core partner of Substra Foundation and dedicates a full tech team to the development of the first version of the Substra framework. Several founding members of Substra Foundation work at Owkin, and Owkin and Substra Foundation are both members of the Healthchain consortium.

Read More
Mathieu Galtier
MELLODDY - A unique project, a unique organization.

The MELLODDY consortium brings together 17 partner organizations of different types, working towards a common goal across multiple countries and cultures. The project is fully remote, with the partners bringing in a variety of business and technical skills. How does one foster transversality in a project in this context? How can a common way of working be created in such a novel and innovative collaborative endeavor? That is what Substra Foundation is trying to contribute to...

Read More
Labelia Labs
How to enhance privacy in Data Science projects?

In this blog article we present the most important Privacy-Enhancing Techniques (PETs) currently being developed and used by various tech actors. We briefly explain their principles and discuss how they could complement the Substra framework. The aim of this article is to outline potential developments of the Substra framework and possible integrations with other technologies.

Read More
Romain BEY
Securing AI: should data be centralized or decentralized?

The massive collection of personal data represents a new risk to privacy, and citizen-consumers are asking their representatives and their companies for higher security standards. Whereas personal data has historically been protected through anonymization, this technique often proves ineffective when artificial intelligence models are trained on personal data. New security frameworks must be developed, relying either on centralizing data in a single vault or on decentralizing it across multiple data warehouses.

Read More
Romain BEY
How does blockchain enable healthy collaboration between hospitals and private organizations on health data?

Hospitals have a huge amount of data

Every year, millions of patients are treated in French hospitals. These patients' data are naturally stored in each hospital's Information System (IS) and constitute essential material not only for patient care but also for clinical research.

Hospitals are full of data (patient data and associated diagnoses) across many departments: mammograms and their diagnoses for breast cancer, genomic data and associated diseases, etc.

How can this data be made available for use by the wider world?

Read More
Romain Goussault
AI and Sensitive Data: a Trust Problem

Today, anywhere in the world, when a researcher or a data scientist wants to train a machine learning algorithm and create a prediction model, they usually must begin by assembling or gaining access to an existing dataset. They observe the data, consult some descriptive statistics, manipulate it, and so on. At this point, a problem of trust arises: from the moment one accesses the data, the only protections against illegitimate use are the ethical stance of the data scientist and/or the law, upheld through contracts or data usage agreements. Ethics and the law, in other words: trust, which is at the heart of collaborative work. But is trust always enough?

Read More