Synthetic data: a solution when developing AI applications?

This use case was initiated in collaboration with the Data Sharing working group of the Dutch AI Coalition (NL AIC), one of the founding partners of the CoE-DSC.

The availability of data and access to it are crucial to the development of AI applications. For many organisations (start-ups and scale-ups in particular), getting access to data quickly is a major stumbling block. Relevant data is hard to obtain, other parties are not always willing to make it available, and legislation and regulations (e.g. on privacy) keep getting stricter. Without data, there can be no data-driven innovation using artificial intelligence (AI), so solutions are badly needed.

One possible approach is to use synthetic data. The potential of this up-and-coming solution is underlined by, for example, Gartner*, which predicts that 60% of the data used for developing AI and analytics applications will be generated synthetically by 2024.

Use of artificial intelligence

What exactly is AI-generated synthetic data? Whereas original data is collected through interactions with individuals, synthetic data is created by a computer algorithm that generates completely new, artificial data points. What is new is the use of AI in the synthesis process: a model learns the characteristics, relationships and statistical patterns of the original dataset and reproduces them in the synthetic data it generates. AI-generated synthetic data can therefore provide large quantities of representative data simply and quickly. Syntho, an expert in AI-generated synthetic data, wants to use this approach to build the foundations for data-driven innovation (e.g. using AI), and recently won the Philips Innovation Award with its proposals.
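As a rough illustration of the idea described above (learn the statistical patterns of an original dataset, then generate entirely new data points), the Python sketch below fits a simple two-column Gaussian model and samples from it. This is a deliberately simplified statistical stand-in, not Syntho's method and not an AI model; the function names, the toy age/income columns and all numbers are assumptions made up for this example.

```python
import math
import random
import statistics


def make_toy_original(n, seed=0):
    """Hypothetical 'original' dataset: (age, income) rows with a linear relationship."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.uniform(20, 65)
        income = 1000 + 50 * age + rng.gauss(0, 500)
        rows.append((age, income))
    return rows


def fit_bivariate_gaussian(rows):
    """Estimate means, standard deviations and correlation of two numeric columns."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    return {"mx": mx, "my": my, "sx": sx, "sy": sy, "rho": cov / (sx * sy)}


def sample_synthetic(model, n, seed=0):
    """Draw n new rows from the fitted model; no original row is reused."""
    rng = random.Random(seed)
    rho = model["rho"]
    # Guard against tiny negative values caused by floating-point rounding.
    residual = math.sqrt(max(0.0, 1.0 - rho ** 2))
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = model["mx"] + model["sx"] * z1
        # Mixing z1 into y reproduces the estimated correlation between columns.
        y = model["my"] + model["sy"] * (rho * z1 + residual * z2)
        rows.append((x, y))
    return rows


if __name__ == "__main__":
    original = make_toy_original(2000, seed=1)
    model = fit_bivariate_gaussian(original)
    synthetic = sample_synthetic(model, 5000, seed=2)
    print("original: ", model)
    print("synthetic:", fit_bivariate_gaussian(synthetic))
```

The synthetic rows follow the same means, spreads and correlation as the toy original, but they only capture what the chosen model can express. A Gaussian model like this one would miss skewed distributions, categorical columns and non-linear relationships, which is where more expressive generative models come in.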

What challenge does it solve?

The outputs from this use case will answer some of the frequently asked questions about using synthetic data. What is the value of synthetic data? When is it a good solution and when is it less effective? What are its limitations? And what are the pros and cons of synthetic data compared to other privacy-enhancing technologies (PETs)?

Syntho and SAS will work together to compare AI-generated synthetic data against the original datasets and assess the synthetic data in terms of data quality, legal validity and usability. This will give a picture of the added value of synthetic data, show where synthetic data is less useful, and indicate what follow-up steps organisations and the NL AIC could and should take to encourage the development and application of AI. The use of synthetic data will also be placed in a broader perspective by comparing it against existing privacy-enhancing technologies (PETs).
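To give a feel for what comparing synthetic data against an original dataset can involve, the Python sketch below computes two simple indicators: the two-sample Kolmogorov-Smirnov statistic for one numeric column (how far apart the two distributions are) and the share of synthetic rows that are exact copies of original rows. These metrics and function names are assumptions chosen for illustration; the source does not say which measures Syntho and SAS use, and the absence of copied rows does not by itself establish legal validity.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs.

    0.0 means the two samples have identical empirical distributions,
    1.0 means they do not overlap at all.
    """
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The largest gap is always reached at one of the observed values.
    for v in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, v) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, v) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def share_of_copied_rows(original, synthetic):
    """Fraction of synthetic rows that are exact copies of an original row."""
    original_set = set(original)
    return sum(row in original_set for row in synthetic) / len(synthetic)


if __name__ == "__main__":
    # Made-up toy rows: (age, income).
    original = [(25, 2300), (40, 3100), (58, 3900), (33, 2700)]
    synthetic = [(27, 2400), (41, 3000), (55, 3800), (33, 2700)]

    ages_original = [row[0] for row in original]
    ages_synthetic = [row[0] for row in synthetic]
    print("KS statistic for age:", ks_statistic(ages_original, ages_synthetic))
    print("Share of copied rows:", share_of_copied_rows(original, synthetic))
```

A low KS statistic per column suggests the synthetic data is usable for analysis, while a non-zero share of copied rows is a warning sign that original records may have leaked into the synthetic set.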

Sharing knowledge with NL AIC affiliates is key

Generating the actual synthetic data makes it possible to compare it against the original data and then assess its data quality, legal validity and usability. The following outcomes will be shared and made available to the NL AIC’s participants, aiming to promote knowledge sharing and answer questions about synthetic data:

The quality report.
The final presentation.
A training session on privacy-enhancing technologies (PETs) that will also cover other PETs such as encryption, pseudonymisation and anonymisation.
A synthetic version of a publicly available dataset.

Parties involved

In this use case, Syntho, SAS and the NL AIC are working together to achieve the intended results. Syntho is an expert in AI-generated synthetic data and SAS is the market leader in analytics, providing software for exploring, analysing and visualising data.

More information

If you are interested, go to the Syntho website for more information about synthetic data.
Contact: Wim Kees Janssen, kees@syntho.ai
