From an Idea to Publishing – a Journey of a Research Code

Guillaume Broggi

I can come back to what I did and reuse it for a similar application. It is something like my personal knowledge base.

Guillaume Broggi, Doctoral Assistant

Let’s start with the context. Can you introduce yourself and your work?

I am a PhD student at LPAC under the supervision of Prof. Véronique Michaud and Prof. Joël Cugnoni. We are a materials science laboratory. My doctorate was set in the context of the EU project HyFiSyn. One of the requirements of EU projects is to make all research outcomes available and as open as possible. The aim is for others to be able to access them, get familiar with our work, and reuse it. With open science in mind, we aimed to publish not only the paper but also the supporting information, especially the code that was used to generate and analyze the data presented in the paper.

Is code publishing a common practice in your field?

Publishing code is not very common in materials science. This is something that I taught myself during my Master's studies: to systematically set up a Git repository, to populate it with code, and to document it, with the aim of making it useful rather than perfect. I can come back to what I did and reuse it for a similar application. It is something like my personal knowledge base. It is also useful to share with others. People can use the work, replicate the paper, and disseminate new knowledge further. Papers often claim results based on code that was implemented specifically for them. However, code is frequently seen as supplementary information and not described in detail. But there are many ways to implement a method and many different pitfalls one can encounter during implementation. As a result, as with any scientific experiment, it is impossible to reproduce the research without the methodology, and you have to start from the beginning again when you want to do something similar. I think it is valuable from a scientific point of view to make the data and code available. I would like to have the same access to the code of others.

Can you briefly describe the process?

We proposed a parametric study of a J-integral data-reduction method used to characterize material toughness. It is a well-known method that has been available for more than 50 years. The math is quite simple, but there are many different ways of implementing it because it relies on derivatives of the displacement field. As part of the paper, we made the code available on GitHub and associated it with a record on Zenodo. The goal was to be able to reference the code in the paper to increase visibility, but also to ensure long-term preservation of the code. It is very good for reproducibility because when we reference the code in the paper using the DOI from Zenodo, we reference not the general GitHub repository but the specific release of the code used in the paper. At the same time, the Zenodo record also points to newer versions of the code, which helps make the community aware of updates.
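To illustrate the point about displacement field derivatives, here is a minimal sketch showing that two common finite-difference schemes give different gradient estimates from the same data. It is not the code published with the paper: the displacement profile is synthetic, the noise level is an arbitrary assumption, and the function name `displacement_gradients` is hypothetical.

```python
import numpy as np


def displacement_gradients(u, spacing):
    """Estimate du/dx on a uniform grid with two different schemes.

    Returns (forward, central):
      forward[i] approximates the derivative at node i     (nodes 0 .. n-2)
      central[i] approximates the derivative at node i + 1 (nodes 1 .. n-2)
    """
    forward = (u[1:] - u[:-1]) / spacing
    central = (u[2:] - u[:-2]) / (2.0 * spacing)
    return forward, central


# Synthetic 1D displacement profile (an assumption, not measured data).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 101)
spacing = x[1] - x[0]
u_exact = np.sin(2.0 * np.pi * x)
u_noisy = u_exact + rng.normal(scale=1e-3, size=x.size)  # assumed noise level

forward, central = displacement_gradients(u_noisy, spacing)

# Analytical derivative of the synthetic profile, used as the reference.
dudx_exact = 2.0 * np.pi * np.cos(2.0 * np.pi * x)

# Root-mean-square error of each scheme at the nodes it is defined on.
err_forward = np.sqrt(np.mean((forward - dudx_exact[:-1]) ** 2))
err_central = np.sqrt(np.mean((central - dudx_exact[1:-1]) ** 2))

print(f"forward-difference RMS error: {err_forward:.4f}")
print(f"central-difference RMS error: {err_central:.4f}")
```

On this synthetic profile the two schemes disagree noticeably, which is why a paper that only says "the derivative of the displacement field" leaves a reader guessing about what was actually computed.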

Were there any challenges?

If you are used to Git, the process is quite straightforward. GitHub and Zenodo are integrated, so you can easily push the correct version of the code. However, guidelines are lacking. I was not sure how to proceed exactly. Reserve the DOI on Zenodo first, or publish the paper first? What is the correct sequence of steps that one should follow? What license should I use? In the end, I created an empty repository on GitHub and linked it to Zenodo. This enabled me to reserve a DOI that I used in the paper to reference the code. I populated the GitHub repository and pushed a new release to Zenodo later.

For people who are not computer-science savvy, there is a lack of resources. I used guidelines from Berkeley, which were very helpful for me. You can find individual elements online, but the guide gave me the overview I needed. In my experience, thinking about code or how to publish it is not natural when working in fields such as materials science. It is not difficult, but it does not come naturally. There is also no clear understanding of the advantages of doing it this way.

Why was it important for you not only to publish the paper but also to make the code available? 

There are a lot of papers out there that do not contain any supporting material. Basically, each one is a piece of text. There is no way to reproduce it or reuse it. For the specific implementation that we did, a few papers had already been published on the subject. We were never able to access the code. Hence, it was not useful to us. There is no way to replicate or reuse the work without guessing what the authors used in the first place. Of course, it is possible to ask the authors directly, but this takes time and does not promote innovation and fast development.

For me, it was important to make the code available. Even if my code is not perfect, it is accessible to everyone, including myself. It can be used and be useful. Thanks to the Zenodo DOI that redirects to the up-to-date version, I can still come back to it later to refactor the code properly or implement a new feature.

Secondly, the project was funded by public resources. Therefore, the results should be made available to the public. Why keep them secret? One little piece of code may seem irrelevant, but it may be very useful to someone.

Thirdly, a rather hidden motivation is visibility. Publishing code gives you visibility. Two weeks after making the GitHub repository public, I was contacted by the DLR (German Aerospace Center) because they work on something similar and showed interest in my work. I think this is a very concrete and positive outcome of making the code available.


Authorship Notes:
This interview with Guillaume Broggi was conducted in December 2022. The article was prepared and published by Miriam Braskova. Guillaume Broggi made only minor changes during the editing process.