Data more important than method – 10 years of RDNL and Dutch Data Prize
“If you want to discover the right things and give the right answers, the data has to be good”
To celebrate 10 years of RDNL, we interviewed three winners of the Dutch Data Prize, which has been part of RDNL since 2013. What did the prize mean for them, why do they find FAIR data important, and why should researchers nominate themselves for this prize? In the second part of this three-part series, we speak with Joaquin Vanschoren. He is convinced that research data is becoming more important than the method.
After his PhD, Joaquin Vanschoren picked up where he left off: making machine learning data available, so that people could learn from it and evaluate it. In 2016 he was awarded the ‘Nederlandse Dataprijs’ (now the Dutch Data Prize) in the category Natural and Engineering Sciences (NES). “The Dutch Data Prize has placed a spotlight on our research at a crucial moment,” says Vanschoren, professor of Machine Learning at Eindhoven University of Technology and founder of OpenML. He started OpenML out of frustration at being unable to reproduce machine learning results or get a complete overview of them. The goal was to simplify machine learning and to understand which rules one could apply to optimally design and reuse machine learning models.
Open, interoperable and reproducible
In the beginning it was just a website with a database, built on the technology available at the time. Now, more than 10 years later, OpenML contains millions of model evaluations and has a quarter of a million users. The platform is completely open source and is being further developed by a group of volunteers, usually consisting of twenty core developers. It is being extended to other programming languages, such as Python, and will be updated with the latest useful information. “We are always looking at the latest developments and how we can integrate them. All while maintaining our core values. Everything must be open, interoperable and reproducible so that the community can use it.”
According to Vanschoren, the data itself is becoming more important than the way it is modeled. “If you want to discover the right things and give the right answers, the data has to be good.” That’s why OpenML requires data to meet certain conditions before it can be deposited. “Because we know that not everyone is aware of how to make data FAIR, we have developed a tool that makes it easy to extract all the metadata needed to improve FAIRness.” Moreover, thanks to an automatic analysis, ‘problems’ in the dataset quickly emerge. “It is more work, but it keeps the quality of the data high and thus also the possibility of reusing models.”
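To give a rough idea of what such an automatic analysis might look for, here is a minimal Python sketch. It is not OpenML's actual tool: the function name `analyze_dataset`, the checks it runs and the example rows are all assumptions made for illustration. It flags three simple 'problems': missing values, columns that hold only one value, and duplicated rows.

```python
from collections import Counter

# Values treated as "missing" in this sketch (an assumption, not an OpenML rule).
MISSING = (None, "")


def analyze_dataset(rows):
    """Report simple quality problems in a dataset given as a list of dicts."""
    columns = sorted({key for row in rows for key in row})
    report = {"missing": {}, "constant": [], "duplicate_rows": 0}

    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v not in MISSING]

        # Count cells without a usable value.
        n_missing = len(values) - len(present)
        if n_missing:
            report["missing"][col] = n_missing

        # A column with a single distinct value carries no information.
        if len(rows) > 1 and present and len(set(present)) == 1:
            report["constant"].append(col)

    # Every repeat of an identical row counts as one duplicate.
    counts = Counter(tuple(row.get(col) for col in columns) for row in rows)
    report["duplicate_rows"] = sum(n - 1 for n in counts.values())
    return report


# Hypothetical example data.
rows = [
    {"age": 34, "country": "NL", "label": "yes"},
    {"age": None, "country": "NL", "label": "no"},
    {"age": 51, "country": "NL", "label": "no"},
    {"age": 34, "country": "NL", "label": "yes"},
]
print(analyze_dataset(rows))
```

In this made-up example, the report shows one missing value in `age`, flags `country` as constant, and counts one duplicated row. A depositor could fix each of these before sharing the data.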
Vanschoren agrees: “Open source can be demanding work.” Winning the Dutch Data Prize was a boost for him and his team. “It shows appreciation for all the work that has been put into it.” According to him, it also helps when writing up new projects and attracting funding. He therefore encourages everyone to nominate themselves for the Dutch Data Prize.
Read part one of the three-part series here: An interview with Maarten Marx, one of the first winners of the Dutch Data Prize.
The Dutch Data Prize is part of Research Data Netherlands (RDNL), a consortium of 4TU.ResearchData, DANS, Health-RI and SURF. The next round of the Dutch Data Prize will be held in 2024.