FAIR data and advanced digitalisation: a potential to strengthen open science principles and push the development of the European Open Science Cloud.
Open Science (OS) remains a priority of the European Commission. Advanced digitalisation of research was mainly driven by Covid-19: the extended lockdown prompted the development of remote access protocols to Research Infrastructures (RIs) and other laboratories in order to reduce the impact of researchers’ limited mobility. At the same time, it is also linked to the nature of research itself. Remote access and collaboration are valuable in normal research operations too, as an alternative to users physically travelling to RIs, laboratories and observatories, and this can further speed up these developments.
Artificial Intelligence (AI) and Machine Learning (ML) are key components of advanced digitalisation in research and have the potential to benefit the whole research community. However, AI needs Findable, Accessible, Interoperable, and Reusable (FAIR) data, and AI algorithms need to be developed by the research community, as only close collaboration with scientific expertise can drive the development of transparent, understandable, and robust AI tools.
In this context, on 17 June 2024, two independent European Open Science Cloud (EOSC) Steering Board expert group reports were published: one on advanced digitalisation of research and one on FAIR data and productivity. It is important to clarify that FAIR datasets can be used directly by researchers themselves and indirectly via machine operability within the EOSC services.
The opinion paper on advanced digitalisation of research highlights that AI as a service for the development and operation of EOSC will be important for assisting the FAIRification of data and the technical quality assessment of FAIR research objects (data, software, workflows). The experts also assessed the current state of advanced research digitalisation and identified gaps that need to be addressed to obtain a fully operable EOSC. According to the opinion paper, research reproducibility and data usability will be enhanced if advanced digitalisation becomes common practice in the research community. This would strengthen open science principles and policies and create a critical mass of quality-assessed FAIR data and research objects, enabling reliable and secure AI, ML, and Virtual Research Environments.
In the second opinion paper on FAIR Data and productivity, the EOSC Steering Board provides principal recommendations based on consultation with the research community. It emphasises that internationally operated Research Infrastructures (RIs) have already provided significant results in data production, curation and sharing of rules. These can inspire the full scale of research and innovation activities and, if successfully applied, could represent a competitive advantage for Europe. Nonetheless, maintaining a coherent approach when combining information and methods from different sources and areas is challenging. To ensure the productivity of FAIR digital research objects, including data, software, technical solutions, workflows, and algorithms, transparent common standards adapted to different domains need to be implemented. Another important feature is data quality: quality assessment is a key element of FAIR data, and it is a process that can also be reinforced by advanced digital tools such as AI and ML. Finally, training researchers and data professionals in advanced data management, curation, and good practices for AI and FAIR data is key.