This content was created by the Data Sharing Coalition, one of the founding partners of the CoE-DSC.
The Data Sharing Coalition supports organisations in realising use cases at scale to unlock the value potential of data sharing, and helps organisations create the trust mechanisms required to share data in a trusted and secure way. In our blog section ‘Q&A with’, you can learn more about our participants and their thoughts, visions and ideas about data sharing. In this Q&A, Carlos Utrilla Guerrero, Data Scientist at Maastricht University Institute of Data Science, shares their thoughts.
1. Could you introduce your organisation?
The Maastricht University Institute of Data Science (IDS) is a research institute that focuses on building and implementing standards for responsible data science. We bring together social, legal and ethical aspects of data science and artificial intelligence across disciplines, institutions and sectors. IDS specialises in FAIR (Findable, Accessible, Interoperable, and Reusable) data, formalised knowledge representation, social network analysis, privacy-preserving machine learning and decentralised data-driven applications. We aim to provide insight into how responsible exchange of corporate data can improve people’s lives and business outcomes. And we believe FAIR can generate a new form of collaboration beyond the public-private partnership model, in which participants from different sectors – in particular companies – exchange their data to create value, for example public value or business outcomes.
2. To what extent is your organisation involved in data sharing (within and across sectors)?
Data sharing is a necessity to accelerate digital business, boost business outcomes and improve public services. Yet, there are not many examples of data sharing that adequately locate trusted data sources and evaluate data sharing strategies. This reflects the need for a responsible infrastructure and, most importantly, a cultural shift. IDS believes that FAIR principles, combined with responsible data science, could change the culture of data management and data sharing and thereby maximise the value of data. It is our mission to build innovative resources and create international standards collaboratively with our community to share data and other digital objects (e.g. code and software) responsibly. For instance, at Maastricht University we have been participating in Community Data Driven Insights (CDDI). This inter-departmental initiative aims to turn all digital objects within Maastricht University into FAIR digital objects. We support researchers in embedding the FAIR Data Principles into their daily activities and provide technical, legal and ethical support.
IDS is working on a few ongoing showcases of data sharing and FAIR data management: a FAIR data exchange of historical games between digital museums, a Knowledge Graph and technical infrastructure for EU corporate mobility data, an open data knowledge reservoir for agricultural practices, and a biodiversity Knowledge Graph. These projects aim to show the benefits of implementing semantic technology such as Knowledge Graphs, with a focus on data sharing. We do this by simplifying the process of data capture, using semantic models like the SIO ontology to establish interoperability across data sets from Open Government Data and other publicly available datasets.
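To make the idea of interoperability through a shared semantic model more concrete, here is a minimal sketch in plain Python. It is not the IDS implementation: the dataset names, column names, values and the `https://example.org/` URIs are all hypothetical placeholders, and a real project would use terms from an ontology such as SIO together with a proper RDF library. The sketch only shows the principle that two datasets with different local column names can be queried together once both are mapped to the same vocabulary terms.

```python
# Hypothetical shared vocabulary. A real project would use terms from an
# ontology such as SIO instead of these example.org placeholders.
VOCAB = "https://example.org/vocab/"

# Per-dataset mapping from local column names to shared vocabulary terms
# (dataset and column names are made up for illustration).
MAPPINGS = {
    "city_open_data": {
        "gemeente": VOCAB + "municipality",
        "inwoners": VOCAB + "population",
    },
    "national_stats": {
        "municipality_name": VOCAB + "municipality",
        "pop_total": VOCAB + "population",
    },
}


def to_triples(dataset, record_id, record):
    """Turn one record into (subject, predicate, object) triples.

    Columns without a mapping are skipped, so only values described by
    the shared vocabulary end up in the graph.
    """
    subject = f"https://example.org/{dataset}/{record_id}"
    mapping = MAPPINGS[dataset]
    return [
        (subject, mapping[column], value)
        for column, value in record.items()
        if column in mapping
    ]


def values_of(graph, predicate):
    """Return every object stored under the given predicate."""
    return [obj for _, pred, obj in graph if pred == predicate]


# Two records from different sources with different local schemas
# (dummy values, not real statistics).
graph = []
graph += to_triples("city_open_data", "1", {"gemeente": "Town A", "inwoners": 1000})
graph += to_triples(
    "national_stats",
    "a",
    {"municipality_name": "Town B", "pop_total": 2000, "internal_code": "x9"},
)

# One query now spans both sources because they share predicates.
print(values_of(graph, VOCAB + "municipality"))
```

The design choice worth noting is that the mapping lives outside the data: each source keeps its own schema, and only the mapping table has to be agreed on.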
IDS also works with a wide variety of collaborators and projects at national (ODISSEI, CLARIAH, ICAI), European (EOSC Life, COST Action), and international level (Biomedical Data Translator). Our main focus here is the use of data science techniques such as semantic web, privacy-preserving machine learning, and implementation of the FAIR principles to address different data sharing challenges within the community. One of the challenges we faced when sharing and reusing data was legal issues concerning sensitive data, for instance in the health and education domains. We couldn’t exchange and merge students’ data about socio-emotional variables. This led to the development of privacy-preserving techniques, which have become a prominent, general approach to solving the legal issues of data sharing. IDS is investigating this legal aspect by applying privacy-preserving techniques combined with synthetic data generation.
3. Why is or should sharing data be important for your industry or domain?
Science is opening up more, not only in terms of data but also in terms of digital objects such as computer code, software and workflows. Sharing all kinds of digital objects is important because it yields new collaborations, increases confidence in findings and generates goodwill among communities. But most importantly: reusing these digital objects in the scientific community fosters better science. In the context of FAIR, implementing these principles helps researchers not only to guarantee the discoverability and visibility of their outputs but also, more fundamentally, to improve the credibility and veracity of their knowledge claims.
4. What are the most promising data sharing developments and trends you see in your sector?
We believe that, to solve today’s challenges, we need both new solutions for data sharing and new ways to put them into practice. We have seen endless normative discussions on how data should be shared, but little analysis exists of the actual practice of data sharing. Initiatives that foster the responsible use of data will be key to that end. And we are glad to see ongoing initiatives of research communities to change the culture of data management and data sharing.
The purpose of the above-mentioned initiatives is to focus on wider societal questions of data governance and to assist researchers who aim to make their data FAIR. These communities explore the challenges and opportunities for researchers, institutions and funders. They examine strategies, resources and promising practices to develop and implement data management and data sharing plans, and provide insight into current data sharing practices over the course of the data lifecycle. This will help to establish effective data management and data sharing practices. Furthermore, data that is in line with FAIR principles and follows community standards fosters a transition from human-readable data to machine-readable data, which in turn will require considerable reflection and adaptation from each particular research community, taking into consideration diverse needs (e.g. human versus machine specifications).
5. How do you see the future of data sharing, and what steps are you currently taking in that direction?
We envision a short-term need for FAIR beyond academia. The FAIR principles must be considered by data science teams in an organisation, especially as private and public bodies begin to invest heavily in extracting insights from data for decision-making. Furthermore, in the private sector, companies currently see great potential in sharing and analysing data to better understand individuals’ needs. When sharing data, businesses face organisational and legal challenges, for example the need to protect the online privacy of individuals and to obtain their consent. Most individuals understand that sharing data with companies will give them more personalised experiences, but they are the ones who decide what to share in exchange for value.
Different actors in research have dedicated their efforts to investigating applications for a more responsible data sharing ecosystem model. A sustainable data sharing solution must be legally compliant, technically feasible, socially responsible and financially and commercially viable. We believe that decentralised data-driven applications like Solid will enable individuals to control who gets access to data and also turn off that access with a switch. We need rules and technology to solve this problem. IDS aims to help realise a unique ecosystem in which organisations can share data responsibly and safely and with confidence, while retaining control over data sharing.
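The idea of individuals controlling who gets access, and switching that access off again, can be illustrated with a small sketch. This is not the Solid API or protocol: the `PersonalDataStore` class and its method names are hypothetical, invented here only to show the principle that the data owner keeps the list of grants and every read is checked against it.

```python
class PersonalDataStore:
    """Toy model of an owner-controlled data store (hypothetical, not Solid)."""

    def __init__(self, owner, data):
        self.owner = owner
        self._data = dict(data)
        self._grants = set()  # parties currently allowed to read

    def grant(self, requester):
        """Switch access on for one requester."""
        self._grants.add(requester)

    def revoke(self, requester):
        """Switch access off again; a no-op if it was never granted."""
        self._grants.discard(requester)

    def read(self, requester, key):
        """Return a value only if the requester is the owner or holds a grant."""
        if requester != self.owner and requester not in self._grants:
            raise PermissionError(f"{requester} has no access to {self.owner}'s data")
        return self._data[key]


# Dummy owner, requester and data for illustration.
store = PersonalDataStore("alice", {"postcode": "0000 AA"})

store.grant("energy-company")
print(store.read("energy-company", "postcode"))  # access is on

store.revoke("energy-company")
try:
    store.read("energy-company", "postcode")  # access is off
except PermissionError as error:
    print("denied:", error)
```

In this sketch the data never leaves the store without a check, which is what lets the owner withdraw access later; a real decentralised system would additionally need authentication, auditing and legal agreements.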
6. Why are you participating in the Data Sharing Coalition?
Sharing data can save lives, costs, and time. IDS continuously explores opportunities within the Data Sharing Coalition to upscale FAIR data sharing: how can we combine and analyse data from cities, other governments or businesses and across sectors in a FAIR fashion? Participants of the Data Sharing Coalition realise use cases to solve societally relevant problems such as supporting humanitarian services, contributing to more sustainable energy consumption or ensuring biodiversity. One of the use cases that drew most of our attention was the data sharing use case to combat human trafficking. We had fruitful conversations about possible collaboration, applying FAIR and semantic technologies, and that is why we joined the Data Sharing Coalition. IDS is interdisciplinary and instead of working within disciplinary boundaries, we ask: “What is the problem we are trying to solve and how can we best solve that problem?” FAIR principles, semantic technologies, privacy-preserving technology and decentralised data-driven applications are our expertise, and we are actively participating in the Data Sharing Coalition to understand the needs and interests of participants and to stimulate new use cases that foster responsible data sharing across domains. Our goal is to apply data science techniques to enable more systematic, sustainable, and responsible data sharing.
We also see it as our mission to train participants of the Data Sharing Coalition on FAIR and responsible data science. The FAIR principles are anticipated to generate a €4 billion industry, for which there are few qualified employees today. IDS is one of the few institutes in Europe capable of training personnel in this booming profession. We aim to organise training for data experts in small and medium-sized enterprises (SMEs) to promote a wider adoption of best practices in responsible data science. The Data Sharing Coalition offers an excellent platform to help industry professionals who are dealing with data sharing challenges.