This content was created by the Data Sharing Coalition, one of the founding partners of the CoE-DSC.
The Data Sharing Coalition supports organisations in realising use cases at scale to exploit the value potential of data sharing, and helps organisations create the trust mechanisms required to share data in a trusted and secure way. In our blog section ‘Q&A with’, you can learn more about our participants and their thoughts, vision and ideas about data sharing. Robin Schut, co-founder at BranchKey, shares his thoughts.
1. Could you introduce your organisation?
BranchKey is a young start-up and Platform-as-a-Service company focused on providing Federated Machine Learning (FedML) technology. FedML is a technology that connects a distributed network of machine learning models and finds the optimum model using information from each individual model. On our platform, organisations can collaborate with each other and jointly train Artificial Intelligence models in a secure environment. These organisations use federated learning to gain insights from external data sources and to share those insights without exposing the underlying data, especially sensitive data. Data that is used to train an AI model remains in its location, but the insights of those models are shared with the other organisations that the data owner chooses to collaborate with.
It is our mission to facilitate collaboration on AI development by allowing our users to share intelligence and knowledge through AI models on our platform. We do that by standardising model architectures and data requirements and by providing the infrastructure for information to be shared between parties. We hope that by training AI models collectively, many different viewpoints will be taken into consideration. We believe this is the responsible way to develop AI.
It is our vision that everyone in the world is responsible for the development of AI. Artificial Intelligence is a powerful and promising technology that can have an impact in ways that were previously hard to imagine, as we saw in 2016 when Lee Sedol was beaten by Google DeepMind’s AlphaGo. Therefore, it’s in society’s best interest to democratise the development of AI applications to make sure the development happens in a controlled and responsible way.
2. To what extent is your organisation involved in data sharing (within and across sectors)?
BranchKey is an infrastructure provider of FedML technology. Organisations can use our technology to learn from data sources within and across sectors by collaborating with other organisations.
Companies are actively looking for external data sources to train their AI models, but they are often held back by privacy regulations such as the GDPR and by governance requirements, both internal and external. Federated machine learning can solve these issues. Think of an AI model as a network of nodes and connections; the strengths of those connections are often referred to as model parameters. To train an AI model, you need to show the model data and give it a set of instructions for learning from that data. During the training phase, these connections strengthen or weaken depending on the data. It’s exactly these connections that you’re interested in, not the data itself. Our platform does not share copies of data; it shares model parameters. If the goal of your data sharing initiative is to jointly train an AI model, then Federated Machine Learning is a great fit.
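To make the idea of sharing parameters rather than data more concrete, here is a minimal sketch of one common aggregation step, federated averaging. It is an illustration only, not BranchKey's actual implementation: the function name `federated_average`, the flat list representation of parameters and the sample counts are all assumptions made for this example.

```python
from typing import List

# Assumption for illustration: a model's parameters are a flat list of floats.
Params = List[float]


def federated_average(updates: List[Params], num_samples: List[int]) -> Params:
    """Combine locally trained parameters into one shared set of parameters.

    Each participant's parameters are weighted by the number of local
    samples it trained on. Only parameters are passed in; the raw
    training data never appears in this function.
    """
    if not updates or len(updates) != len(num_samples):
        raise ValueError("need exactly one sample count per parameter update")
    total = sum(num_samples)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    size = len(updates[0])
    if any(len(u) != size for u in updates):
        raise ValueError("all updates must have the same number of parameters")
    # Weighted mean of each parameter position across all participants.
    return [
        sum(u[i] * n for u, n in zip(updates, num_samples)) / total
        for i in range(size)
    ]


# Two hypothetical organisations send parameters, not data.
org_a = [0.2, 1.0]  # trained on 100 local samples
org_b = [0.6, 3.0]  # trained on 300 local samples
shared = federated_average([org_a, org_b], [100, 300])
```

In this example the shared parameters land closer to those of the organisation that trained on more samples (roughly `[0.5, 2.5]`). Weighting by sample count is one common design choice; other weighting schemes are possible.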
Outside of the Data Sharing Coalition, we are currently involved in a use case in the energy sector in which an AI model is used to predict the future energy demand of several buildings. By acting on these predictions (for example, by sending instructions to the central heating system to turn on or off at advantageous hours), a lot of energy can be saved. All the buildings have full data autonomy, which means the data can’t leave the building. This way, the building owner stays in control of their own data and prevents another company from accessing it without permission. The added value of federated learning is that the AI models can learn from patterns that emerged at other buildings. Hence, the predictive model can generate better predictions, which in turn leads to more energy savings.
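As a rough sketch of how such a setup could work, the example below trains a one-parameter demand model across two buildings. Everything in it is hypothetical and chosen for illustration: the toy model `demand = w * x`, the sample values, the learning rate and the function names. It is not the model used in the actual use case. The point is that each building trains on its own data and only the parameter `w` travels.

```python
from typing import List, Tuple

# Assumption for illustration: one sample is (input feature, energy demand).
Sample = Tuple[float, float]


def local_train(w: float, data: List[Sample], lr: float = 0.01, steps: int = 20) -> float:
    """Runs inside a building: gradient descent on the mean squared error
    of the toy model demand = w * x. The data never leaves this function."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(w: float, buildings: List[List[Sample]]) -> float:
    """One round: every building refines the shared parameter locally,
    then the results are averaged, weighted by local sample count."""
    local_ws = [local_train(w, data) for data in buildings]
    total = sum(len(data) for data in buildings)
    return sum(lw * len(data) for lw, data in zip(local_ws, buildings)) / total


# Synthetic data following demand = 2 * x; each list stays in its building.
building_a = [(1.0, 2.0), (2.0, 4.0)]
building_b = [(3.0, 6.0), (4.0, 8.0), (5.0, 10.0)]

w = 0.0
for _ in range(10):
    w = federated_round(w, [building_a, building_b])
```

After a handful of rounds, `w` approaches the underlying value of 2 even though neither building has seen the other's measurements.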
3. Why is or should sharing data be important for your industry or domain?
There is a famous saying in the field of Artificial Intelligence which states that a model is only as good as the data it sees (“Garbage in, garbage out”). Training a model on just one data set can lead to a wide range of problems. One problem is that biases can occur in models because they are trained on a data set that contains a predominant feature. An example of the negative effects of this bias is categorising an individual based on ethnicity or gender. To overcome this problem, AI models need to be enriched with many different data sources. The more data from different sources an AI model sees, the less bias the model tends to develop, which in turn reduces the chance of negative effects.
4. What are the most promising data sharing developments and trends you see in your sector?
Many initiatives are taking place to govern how AI models are trained (for example to minimise or eliminate negative effects). These initiatives can range from standardising data formats to open-sourcing data. For example, the scale-up Huggingface.co publishes a lot of open-source data and benchmarks AI models. Another promising development is the counterpart of the traditional training method, which brings the data to the algorithm. With the traditional method, you physically need an export of a data source on the server where the AI model resides. However, new developments are emerging where algorithms are brought to the data source and complex computation tasks are pushed out to the edge, rather than everything being centralised in the cloud. Our technology can be categorised under the latter.
5. How do you see the future of data sharing, and what steps are you currently taking in that direction?
In our field, we hope to see federated learning and other privacy-preserving technologies widely adopted across many sectors. We see a future in which data is made available to train AI models in a safe and responsible way. BranchKey is working hard to provide scalable infrastructure for companies so they can work together on AI models. However, we are only one piece of a very large puzzle. We would like to see many other companies engage in responsible data sharing and hope that other start-ups will also come up with innovative technologies in this domain.
6. Why are you participating in the Data Sharing Coalition?
We are participating in the Data Sharing Coalition to learn from many different sectors about their reasons for sharing data. Furthermore, we are participating to learn from use cases and about the frameworks that are needed to engage in a data sharing endeavour. By listening to participants at Community Meetings and discussing the experiences of other participants, we can improve our own services and platform.