FAIR DATA (DAY) – IT TAKES A VILLAGE!

December 19, 2022

Authors: Marta Teperek, Kimberley Zwiers, Iulia Popescu, Samantha Willemsen, Marjan Grootveld, Maria Cruz, Ruben Kok

On the 29th of November 2022, Research Data Netherlands (RDNL) and the National Programme Open Science (NPOS) jointly organised the ‘FAIR Data Day’ to award the biennial Dutch Data Prizes and to celebrate the advancements of FAIR Data in the Netherlands.

It was a fantastic event that was bustling with people and filled with inspiring keynote speakers, community-led workshops, thought-provoking discussions, overjoyed prize winners and an overall vibrant atmosphere.

DUTCH DATA PRIZES

This year was the 7th anniversary of the biennial Dutch Data Prize competition. Peter Doorn, the former director of DANS and one of the initiators of the competition, gave us an inspiring overview of the importance of the competition in rewarding and celebrating FAIR datasets.

The 2022 edition of the Dutch Data prize competition received a record number of 51 nominations that were evenly spread between the three research domains: Social Sciences and Humanities, Life Sciences and Health, and Natural and Engineering Sciences. The jury, chaired by Caroline Visser, vice chair of the Board of the Dutch Research Council (NWO), had the difficult task of identifying one clear winner in each category. Caroline reflected that the jury was impressed by the overall quality of the submissions, the impact made by the datasets, and how Findable, Accessible, Interoperable and Re-usable (FAIR) they were. She called FAIR a shared responsibility of the entire science field, pointing to one of the key national ambitions in Open Science for the next decade.

The winner in each category received a large round of applause, a 3,500 EUR cheque and a trophy for their FAIR dataset.

In Life Sciences and Health, the prize went to Duong Vu for the ‘DNA barcodes for fungal identification’ dataset; together with her team, she developed a database of over 24k fungal species.
In Social Sciences and Humanities, Coosje Veldkamp and the team behind the ‘YOUth cohort’ dataset were the winners with a study that followed the neurocognitive development of nearly 4,000 Dutch children from pregnancy until early adulthood.
In Natural and Engineering Sciences, Mitchell Van Zuijlen and the team behind the “Materials In Paintings (MIP)” dataset won with their annotated dataset that uses machine learning algorithms to classify fragmented pieces of 19,000 paintings.

Jan van der Heul, data curator at 4TU.ResearchData, collecting the prize on behalf of the team behind the “Materials in Paintings (MIP)” dataset. Credit: Yan Wang.

Unfortunately, the winners in the Natural and Engineering Sciences domain could not be present to physically receive their prize in the afternoon. However, 4TU.ResearchData data curator Jan van der Heul was present to collect the award on their behalf. Interestingly, Jan van der Heul has not only curated their dataset, but also played a key role in the FAIRness of the winning dataset within the NES domain in 2020.

FORGET ABOUT EGOS: BEHIND EVERY ACHIEVEMENT, THERE IS A TEAM

Shalini Kurapati during her keynote session.

What stood out as the common thread during the day and across all the winners is that behind every impressive dataset, every outstanding achievement, every new tool or introduced policy, there was an entire team of dedicated people that had worked hard to achieve it. Shalini Kurapati, the CEO of Clearbox-AI who opened the day with a captivating keynote speech about synthetic data, started by saying: “it is not about me; it is about us”. We are a team! This thread continued throughout the day and was beautifully emphasised by Caroline Visser during the Dutch Data Prize award ceremony. It is not about the egos and glory. It is about collaboration, about knowledge creation and the impact on society. And to be the most impactful and to effectively tackle challenges, we need to work together.

The workshop led by Prof. Serkan Girgin and the team introduced the JupyterFAIR tool, which was developed thanks to the NWO Open Science Fund. JupyterFAIR streamlines the process of uploading and downloading datasets to/from repositories. What was remarkable is that at the start of the presentation Serkan mentioned and acknowledged all the colleagues who were involved in the project. He concluded with an inspiring call to action: he invited all the workshop attendees, their communities and beyond to become members of the project and co-design, or simply connect with others.

So we clearly need each other’s skills and expertise. We need researchers, the content experts, we need data stewards, research software engineers, community managers, project leaders, ethical experts: together we can do better and achieve more. The most beautiful illustration of the power of collaboration was when the winner of the Data Prize for the Social Sciences and Humanities domain (the YOUth cohort study) was announced: instead of just one winner, the entire team rushed to the stage!

SUPPORT STAFF ALSO NEED TO BE RECOGNISED AND REWARDED

However, to make teams work and fully benefit from the diverse backgrounds or skills that different contributors have to offer, everyone needs to feel valued and appreciated. And there is clearly work to be done in this area. Barry Fitzgerald, the chairperson of the day, asked a few questions to the audience via Mentimeter. One of the questions was whether data stewards felt appreciated at their organisations. The results clearly indicated that more can be done for data stewards to feel appreciated.

Mentimeter survey results

Sadly, what is true for data stewards also holds true for other professional support staff. At many research organisations, employees are split into two categories: ‘academic staff’ and ‘support staff’. However, as has been argued elsewhere, “well-functioning teams rely on the sharing of responsibilities and credit. For research to advance and progress, diverse personnel must be able to contribute their talent and skills without being too restricted by conventional hierarchies (…) Research institutions need to foster collaborative environments that empower problem solving and build mutual trust and respect for skills and expertise, regardless of job titles and perceived ranking and status.”

As beautifully stated by Gemma Derrick and Simon Hettrick: “Nobel prize winners didn’t get there on their own”. Therefore, we need to stop thinking in silos and start discussing how to recognise and reward professional support staff for their contributions to the research process. To begin with, professional support staff should be part of the discussions on changing recognition and rewards systems in academia.

WE DON’T WANT AN H-INDEX FOR DATA. CONTEXT MATTERS.

The majority of our participants seemed to agree that it is important to give credit where credit is due. However, how should we reward data re-use? Can we objectively measure data re-use? Or should data re-use affect researchers’ H-index/measurement of impact?

The reflection shared by Maria Cruz from NWO seemed to resonate well with the audience: “We definitely don’t want to have an H-index for data. We don’t want to end up with the same problems as those created by the journal publishing system.” Maria also warned that downloads, views and citations can be misleading. To determine the real impact of research data, qualitative measures are essential. We need to understand the context in which datasets are generated and shared in order to assess their value or how they contribute to specific (research) communities, overall knowledge creation or to society at large.

Our second keynote speaker, Nadia Bloemendaal, one of the Dutch Data Prize winners of 2020, gave compelling examples of how sharing of data on tropical cyclone risk made an impact on people’s livelihoods, risk prevention and improving the effectiveness of humanitarian response in affected areas. Such stories cannot be simply expressed as the number of downloads or views.

The audience was asked to reflect on questions such as: “What does the reuse of data mean to you?”

FAIR DATA DAY – IT TAKES A VILLAGE

Ultimately, we can conclude that making data FAIR is a collaborative team effort and (also hijacking the title of a webinar organised by NWO in November 2021) that it takes a village to organise the FAIR Data Day!

With an international audience of over 250 registered participants, more than 20 submitted workshop proposals and 50 nominations for the Dutch Data Prize, it was a truly community-led event. We are extremely grateful to all the attendees, keynote speakers, workshop providers, Dutch Data Prize jury members and nominees for a truly splendid event. This day would not have been possible without all these contributions. We also want to particularly thank the programme committee, organising committee and our colleagues (Jan van der Heul, Maribel Barrera and Maaike Smit), who all helped and worked very hard to turn this day into a success.

It really takes a village to make research data FAIR!

A fraction of the people involved in the organisation of the FAIR Data Day.