Data spaces and Privacy Enhancing Technologies (PETs) share a common goal: making insights from data accessible in a confidential manner. Despite this overlap, data spaces and PETs are being developed by two different communities. According to Freek Bomhof and Harrie Bastiaansen, both consultants at TNO and affiliated with the CoE-DSC, this must change. Freek and Harrie both worked on a joint whitepaper by the Big Data Value Association (BDVA) and the CoE-DSC, ‘Leveraging the benefits of combining data spaces and privacy enhancing technologies’. We spoke with them about why applying PETs within data spaces makes the confidential exchange of insights from (privacy-sensitive) data more scalable.
Europe is committed to Common European Data Spaces
Data spaces are decentralised infrastructures that allow organisations to make their data accessible to others under specific agreements, for example about the technical, legal and economic conditions under which the data become accessible. For data sharing, data spaces help to improve scalability, make data discoverable and establish trust between participants. “The development of data spaces is strongly encouraged by Europe,” says Harrie. “In 2020 the European Union released its data strategy, in which it announced its intention to initiate the development of so-called Common European Data Spaces. Europe has amended legislation and set up reference architectures that describe what Common European Data Spaces should look like, but more importantly, it is driving the introduction of data spaces by taking an active part in setting up those Common European Data Spaces. As a result, the development of data spaces has accelerated. We believe there are several important research areas, PETs among them, whose complementarity with the European data strategy and the development of data spaces can be exploited by bringing them together.”
From sharing data to making insights accessible
Both data spaces and PETs emerged in response to the need to share data between organisations in a trusted way, but their starting points are very different. “PETs are used to derive the outcome of a data analysis without having to view the data itself,” Freek explains. “TNO is currently involved in the HERACLES project, which looks at data related to lung cancer. One of the analyses we do with PETs is aimed at discovering the factors that determine whether or not people will get lung cancer. Consider someone’s income, occupation or place of residence. This is privacy-sensitive information that should not be shared with just anyone. With a PET, one can conduct a correlation analysis in which only the outcome of that analysis is shared, without it being traceable to an individual. This is very relevant but not scalable, since a consultant always has to set up a new, tailor-made combination of PETs for each new analysis. Another scalability challenge arises when you want to combine PETs from different vendors, which requires some form of standardisation. Applying PETs within data spaces can help, because the parties involved already have to adhere to the agreements and standards used in those data spaces.”
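To make the idea of “sharing only the outcome” a little more concrete, below is a minimal Python sketch of additive secret sharing, one common building block of multi-party computation. This is an illustration under stated assumptions, not the method used in HERACLES. The function names, the prime modulus and the example values are hypothetical. The sketch only computes a joint sum. A full correlation analysis would need more steps on top of this, such as secure multiplication.

```python
import secrets

# Hypothetical modulus for share arithmetic; any prime larger than the true total works.
PRIME = 2**61 - 1


def share(value: int, n_parties: int) -> list[int]:
    """Split a private value into n random shares that sum to the value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def secure_sum(private_values: list[int]) -> int:
    """Compute the total of all private values; individual values are never combined in the clear."""
    n = len(private_values)
    # Each (hypothetical) data holder splits its value and sends share j to party j.
    all_shares = [share(v, n) for v in private_values]
    # Each party adds up the shares it received; a single share looks like random noise.
    partial_sums = [sum(s[j] for s in all_shares) % PRIME for j in range(n)]
    # Only the partial sums are published, and together they reveal just the total.
    return sum(partial_sums) % PRIME


if __name__ == "__main__":
    # Three hypothetical organisations, each holding one private count.
    print(secure_sum([12, 30, 7]))
```

This runs in a single process, so it only demonstrates the arithmetic. In a real deployment each party would run on its own infrastructure and exchange shares over authenticated channels. That orchestration is exactly what the article argues data spaces could standardise.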
However, according to Freek, too little thought has so far been given to how PETs can be embedded in the European data strategy and in the development of Common European Data Spaces. “In a data space, careful consideration has been given to how you share data within a group of parties that have made agreements with each other. But the concept of going a step further and making only the insights from data accessible, without sharing the data itself, has not yet been sufficiently defined. Simply put, there are no ‘plugs and switches under the hood’ of data spaces to arrange this: a lot of extra encryption and communication is needed to set up the orchestration processes for a PET within a data space so that insights are shared instead of the data itself.”
“Many PETs that are already available are not yet suitable for use in the context of a data space,” confirms Harrie. “In a data space, almost everything needs to be ‘as-a-service’. In an ideal world, ‘PETs as a service’ would become possible, so that organisations affiliated with a data space would not have to develop a new PET configuration from scratch for every specific application or research question. To illustrate: ultimately, you want to roll out not ten but hundreds of interconnected data spaces, with potentially millions of connections and participants, each of them also ready to make data accessible to PETs. That, however, means that all PET functionality must be well connected to the orchestration processes of data spaces.” Freek believes the time has come for the PET community to join the data spaces community. “Because of the success of relevant but isolated applications, people are looking for mechanisms to make PETs more widely applicable. That way of thinking is starting to look quite similar to the thinking about data spaces.”
Privacy patterns will help to embed PETs and enable scalability
At the moment, PETs are still tailor-made solutions, and it remains to be investigated how they can be used in a scalable way within data spaces. Freek: “We think that the concept of privacy patterns can be a starting point for a methodical way of embedding PETs. Within a data space, you could use privacy patterns to analyse what a request for information looks like and which underlying data sources it needs. Based on these privacy patterns, PET suppliers connected to a data space can semi-automatically determine which functionalities are needed to gain insights from the data and, from that, which PET solution fits the request. Eclipse, an open-source software foundation, has set up a working group that focuses on how privacy patterns could be used in this context.”
With the whitepaper, Harrie, Freek and the other authors first and foremost want to make the data spaces and PETs communities aware that it is important to come together and investigate how PETs can be combined with data spaces. Harrie: “Data spaces and PETs are two communities that have a lot to gain by working together, but they must learn to speak each other’s language.” “It starts with the word ‘trust’,” Freek says with a laugh. “Within the data spaces community, trust is seen as something that should be maximised, while the PETs community sees trust as something you should not have to rely on. Fortunately, both communities agree on the end goal: sharing more insights from data in a confidential manner. That seems like a good basis for collaboration.”