Synthetic data: a solution when developing AI applications?

This use case was initiated in collaboration with the Data Sharing working group of the Dutch AI Coalition (NL AIC), one of the founding partners of the CoE-DSC.

The availability of data, and access to it, are crucial to the development of AI applications. For many organisations, start-ups and scale-ups in particular, making data available quickly is a massive stumbling block. The challenges include getting relevant data released, the willingness of others to share it, and legislation and regulations (e.g. on privacy) that keep getting stricter. Without data, there can be no data-driven innovation using artificial intelligence (AI), so solutions are badly needed.

One possible approach is to use synthetic data. Gartner*, for example, underlines the potential of this up-and-coming solution, predicting that by 2024, 60% of the data used for developing AI and analytics applications will be generated synthetically.

Use of artificial intelligence

What exactly is AI-generated synthetic data? Whereas original data is collected through interactions with individuals, synthetic data is created by a computer algorithm that generates completely new, artificial data points. What is new is the use of AI in the synthesis process: a model learns the characteristics, relationships and statistical patterns of the original dataset and then generates synthetic data that reproduces them. AI-generated synthetic data is a new solution that provides large quantities of representative data simply and quickly. Syntho, an expert in AI-generated synthetic data, wants to use this approach to build the foundations for data-driven innovation (e.g. using AI) and recently won the Philips Innovation Award with its proposals.
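To make "learning the statistical patterns and generating new data points" concrete, here is a minimal sketch in Python. It is not Syntho's method: instead of an AI model it swaps in a much simpler technique, fitting a multivariate normal distribution (column means plus covariance) to numeric data and sampling new rows from it. The function names and the example columns (age, income) are hypothetical.

```python
import random
from typing import List, Optional

Row = List[float]


def fit_mean(rows: List[Row]) -> Row:
    """Mean of each column of the original data."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fit_covariance(rows: List[Row], mean: Row) -> List[Row]:
    """Sample covariance matrix: captures how the columns vary together."""
    n, d = len(rows), len(mean)
    cov = [[0.0] * d for _ in range(d)]
    for row in rows:
        diff = [x - m for x, m in zip(row, mean)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / (n - 1)
    return cov


def cholesky(matrix: List[Row], jitter: float = 1e-9) -> List[Row]:
    """Lower-triangular L with L * L^T == matrix (small jitter on the diagonal)."""
    d = len(matrix)
    lower = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                value = matrix[i][i] + jitter - s
                if value <= 0:
                    raise ValueError("covariance matrix is not positive definite")
                lower[i][j] = value ** 0.5
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def generate_synthetic(rows: List[Row], n_samples: int, seed: Optional[int] = None) -> List[Row]:
    """Assumption: numeric columns only; a multivariate normal stands in for an AI model.

    Each synthetic row is mean + L * z with z ~ N(0, I), so it is a new point
    drawn from the fitted distribution, not a copy of an original row.
    """
    rng = random.Random(seed)
    mean = fit_mean(rows)
    lower = cholesky(fit_covariance(rows, mean))
    d = len(mean)
    synthetic = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        synthetic.append([mean[i] + sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(d)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical original records: [age, income]
    original = [[25, 30000], [32, 42000], [41, 52000], [50, 61000], [29, 35000], [38, 47000]]
    for row in generate_synthetic(original, 5, seed=1):
        print([round(x, 1) for x in row])
```

The synthetic rows keep the means and the strong age/income correlation of the original while containing new data points. A multivariate normal cannot capture skewed, categorical or non-linear patterns, which is one motivation for using AI-based generators instead.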

What challenge does it solve?

The outputs from this use case will answer some of the frequently asked questions about using synthetic data. What is the value of synthetic data? When is it a good solution and when is it less effective? What are its limitations? And what are the pros and cons of synthetic data compared to other privacy-enhancing technologies (PETs)?

Syntho and SAS are going to work together to compare AI-generated synthetic data with the original datasets and assess it in terms of data quality, legal validity and usability. This will give a picture of the added value of synthetic data, show where it is less useful, and identify the follow-up steps that organisations and the NL AIC could and should take to encourage the development and application of AI. Synthetic data will also be placed in a broader perspective by comparing it with existing PETs.
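A comparison of synthetic and original data in terms of data quality can be illustrated with a small sketch. This is not the Syntho/SAS quality report; it is a hypothetical Python example, assuming numeric columns, that compares per-column means and standard deviations and pairwise Pearson correlations, plus a crude privacy check that counts synthetic rows identical to an original row. The names quality_report, pearson and exact_copies are illustrative.

```python
import statistics
from typing import Dict, List, Sequence

Row = Sequence[float]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two columns."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def quality_report(original: List[Row], synthetic: List[Row], columns: List[str]) -> Dict[str, Dict[str, float]]:
    """Hypothetical report: marginal statistics per column and correlations per column pair."""
    orig_cols = list(zip(*original))
    syn_cols = list(zip(*synthetic))
    report: Dict[str, Dict[str, float]] = {}
    for idx, name in enumerate(columns):
        o, s = orig_cols[idx], syn_cols[idx]
        report[name] = {
            "mean_original": statistics.mean(o),
            "mean_synthetic": statistics.mean(s),
            "stdev_original": statistics.stdev(o),
            "stdev_synthetic": statistics.stdev(s),
        }
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            report[f"corr({columns[i]}, {columns[j]})"] = {
                "original": pearson(orig_cols[i], orig_cols[j]),
                "synthetic": pearson(syn_cols[i], syn_cols[j]),
            }
    return report


def exact_copies(original: List[Row], synthetic: List[Row]) -> int:
    """Crude privacy signal: synthetic rows that reproduce an original row exactly."""
    seen = {tuple(row) for row in original}
    return sum(1 for row in synthetic if tuple(row) in seen)


if __name__ == "__main__":
    # Hypothetical toy data with two numeric columns
    original = [[1, 2], [2, 4], [3, 6]]
    synthetic = [[1, 2.5], [2, 3.5], [3, 6.5]]
    for key, stats in quality_report(original, synthetic, ["a", "b"]).items():
        print(key, stats)
    print("exact copies:", exact_copies(original, synthetic))
```

Close statistics suggest the synthetic data is useful for analysis, while a non-zero count of exact copies would be a warning sign. A real assessment of legal validity covers far more than this single check.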

Sharing knowledge with NL AIC affiliates is key

Generating the actual synthetic data makes it possible to compare it against the original data and then assess its data quality, legal validity and usability. The following outcomes will be shared and made available to the NL AIC’s participants, aiming to promote knowledge sharing and answer questions about synthetic data:

  • The quality report.
  • The final presentation.
  • A training session on privacy-enhancing technologies (PETs) that will also discuss other PETs such as encryption, pseudonymisation and anonymisation.
  • A synthetic version of a publicly available dataset.

Parties involved

In this use case, Syntho, SAS and the NL AIC are working together to achieve the intended results. Syntho is an expert in AI-generated synthetic data and SAS is the market leader in analytics, providing software for exploring, analysing and visualising data.

More information

If you are interested, go to the Syntho website for more information about synthetic data.
Contact: Wim Kees Janssen,

