Open research data and open code support

Main contact

Entry point and coordination among the other contacts: [email protected]

Some definitions

Research data
Research data (RD), including code, are defined as the evidence that underpins the answer to the research question, and can be used to validate findings regardless of their form (e.g. print, digital, or physical).
(Concordat on Open Research Data)
Open research data
Open research data are the part of RD that can be freely accessed, used, modified, and shared, provided that there is appropriate acknowledgement if required.
(Concordat on Open Research Data)

Not all research data can be open, however: it is commonly recognized that access may need to be managed in order to maintain confidentiality, guard against unreasonable costs, protect individuals’ privacy, respect consent terms, address intellectual property issues, and manage security or other risks.

Data Platforms

Below are definitions and examples for the main types of platforms used to disseminate, analyze, or preserve research data and code. The definitions and categories can overlap, or be perceived differently depending on the research area or personal experience.

Code repositories

  • In short: Commercial or institutional online platforms that allow users to store, version, publish, retrieve and collaborate on active code

  • Extended: Online platform that simplifies the version control of code, allowing developers to submit patches of code in an organized, collaborative manner. It can also be used for backup of the code. Sometimes it acts as a web hosting service where developers store their source code for web pages. Source code repository software services and websites often support bug tracking, mailing lists, version control, documentation, and release management. Developers generally retain their copyright when software is posted to a code hosting facility and can make it publicly accessible. 

  • Ex.: GitHub, c4science, GitLab EPFL, etc.

  • Not for: Code repositories are not usually designed for data/code preservation nor for data analysis purposes.

Data analysis platforms

  • In short: Institutional or commercial online platforms that allow users to aggregate, combine, analyze and visualize active datasets (data and/or code)

  • Extended: Online platform used to perform analysis and to retrieve, combine, interact with, explore, and visualize data. The underlying data can originate from various sources (e.g. data banks, user input, APIs, etc.), online or offline. The main scope of a data analysis platform is to turn data into actionable insights, beyond overcoming the possible inadequacies of a relational database or table. As these platforms embed the tools for accomplishing meaningful data transformations, they are usually focused on specific research domains and use the corresponding formats, algorithms, and processes. Others, especially those used for code scripts, can be more generic.

  • Ex.: Renku, ELN EPFL, Open Data Cube, etc.

  • Not for: Data analysis platforms are not usually designed for data preservation or for data publication purposes.

Data archives

  • In short: Mostly institutional online platforms that allow users to preserve and retrieve cold data

  • Extended: Storage infrastructure enabling the long-term retention and reusability of data. It provides secure, redundant locations to store data for future (re)use. Once in the archival data management system, the data stay accessible and the system protects their integrity. A data archive is a place to store data that are deemed important, usually curated data, but that do not need to be accessed or modified frequently (if at all). Most institutions use data archives for legacy data, for data of particular value, or to meet regulatory requirements. Data are not usually findable by external users.

  • Ex.: ACOUA, ETH Data Archive, OLOS, etc.

  • Not for: Data archives are not usually designed for data publication nor for data analysis purposes.

Data banks

  • In short: Institutional or commercial online platforms that allow users to store, aggregate, discover and retrieve cold data

  • Extended: Collection of datasets for secondary use in research that allows many continual queries to be processed over a long period of time. Not to be confused with the organizations concerned with constructing and maintaining such databases. Services and tools can be built on top of such a database for the development, compilation, testing, verification, dissemination, analysis and visualization of data. A data bank is normally part of a larger institution (academic, corporate, scientific, medical, governmental, etc.) and is established to serve data to the users of that organization. A data bank may also maintain subscriptions to licensed data resources for its users to access.

  • Ex.: DBnomics, Channelpedia, DataLakes, etc.

  • Not for: Data banks are not usually designed for data preservation or for data analysis purposes.

Data repositories

  • In short: Institutional or commercial online platforms that allow users to preserve, publish, discover and retrieve cold datasets (data and/or code)

  • Extended: Online infrastructure where researchers can submit their data. It allows datasets to be managed, shared, and accessed for the long term. It can be generic (discipline-agnostic) or specialized (discipline-specific): specialized data repositories often integrate tools and services useful for a discipline. The main purpose is to preserve data so as to make them reusable for future research. Data repositories may have specific requirements concerning: research domain; data reuse and access; file format; data structure; types of metadata. They can also have restrictions on who can deposit data, based on: funding; academic qualification; quality of data. Institutional data repositories may collect the data of researchers from a university or a consortium of universities.

  • Ex.: Zenodo, Materials Cloud, Dryad, etc.

  • Not for: Data repositories are not usually designed for data analysis nor for code versioning purposes.

EPFL and open research data

At EPFL, open research data (ORD) and its appropriate management are considered good practices in the research process (Compliance Guide).

“As open as possible, as closed as necessary” is the main guiding principle at EPFL (and beyond, e.g. Horizon 2020, OpenAIRE, Horizon Europe, etc.) when it comes to open data.

At EPFL, there is a Privacy Policy that provides general information applicable to personal data in most situations and may be supplemented with more specific notices or regulations where applicable. There is currently no formal ORD policy at EPFL.

Before the project

Before starting a research project, apply an open-data-by-design methodology: consider from the outset how much of your data will be made openly accessible.

By integrating this aspect into your data management plan(ning) at the start of your research project, you will be able to define tools and methodologies to ensure that the data and code you want to share openly will be understandable, findable, accessible and reusable.

The Library Research Data Management team is at your disposal to guide you through this:

If you manage the following types of data, consider also contacting the following EPFL services:

In general, contact the Research Office for help with project management.

During a project

When you start your research project, you will either need to produce data or rely on already existing data.

If you need storage capacity, the various Faculty-ITs can support you:

To find existing data and code, search for a relevant repository on www.re3data.org or contact [email protected].

If you already have, or want to acquire, existing data or code but have doubts about the legal or technical possibility of reusing them, make sure to respect the licenses they come with: check the Fast Guide on Data & Code Licensing or contact [email protected].

If you need help with contracts for 3rd party data, please contact [email protected].

Data and code workflow

The data and code workflows of your research are as important as its results: to make them reproducible and FAIR-compliant, check the Fast Guide on FAIR data principles.

EPFL Library supports you in documenting your research data workflow or in implementing a metadata standard: contact [email protected].
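As a small illustration of machine-readable documentation, the sketch below writes a minimal metadata file to ship alongside a dataset. The fields are loosely modeled on DataCite-style metadata and all values are placeholders; the actual standard and required fields depend on your discipline and chosen repository.

```python
import json

# Hypothetical minimal dataset description; field names follow common
# DataCite-style conventions but are illustrative, not a complete standard.
record = {
    "title": "Example dataset",
    "creators": [{"name": "Doe, Jane", "affiliation": "EPFL"}],
    "publicationYear": 2024,
    "resourceType": "Dataset",
    "description": "What the data contain, how they were produced, units, instruments, etc.",
    "license": "CC-BY-4.0",
}

# Store the metadata next to the data so both travel together.
with open("metadata.json", "w", encoding="utf-8") as f:
    json.dump(record, f, indent=2)
```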

Below is a list of tools and services to improve your workflow, by storing and sharing your data and code and implementing the right documentation all along.

For storage questions, please refer to your Faculty-IT [see contacts above, section “Data Acquisition”].

For general storage questions, use File Storage, the central storage and backup service by EPFL VPO. It also offers “object storage” hosted on-site and based on the open-standard S3 protocol: use the XaaS portal to request buckets.
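As a minimal sketch of what working with such a bucket can look like, assuming the boto3 library is installed and that the credentials, endpoint URL, and bucket name you obtain through the XaaS portal replace the placeholders below:

```python
import boto3

# Placeholder endpoint and credentials: substitute the values provided
# with your bucket request (this is not the actual EPFL endpoint).
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.example.epfl.ch",
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

# Upload a local file into the bucket, then list the bucket's contents.
s3.upload_file("results.csv", "my-project-bucket", "2024/results.csv")
for obj in s3.list_objects_v2(Bucket="my-project-bucket").get("Contents", []):
    print(obj["Key"], obj["Size"])
```

Because S3 is an open standard, the same code works with any S3-compatible storage by changing only the endpoint and credentials.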

RENKU: A software platform by SDSC that enables reproducible and collaborative data science, with reproducible analyses and automatic generation of Knowledge Graphs.

AiiDA: An open-source Python infrastructure that helps automate, manage, capture, share and reproduce even complex data workflows in computational science.
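For a flavor of how AiiDA captures provenance, here is a minimal sketch based on its basic calcfunction usage; it assumes AiiDA is installed and a profile has already been configured (e.g. with `verdi quicksetup`).

```python
from aiida import load_profile
from aiida.engine import calcfunction
from aiida.orm import Int

load_profile()  # requires a configured AiiDA profile

@calcfunction
def add(x, y):
    # Inputs and output are stored as nodes in AiiDA's provenance graph,
    # so the result remains linked to the data it was computed from.
    return Int(x.value + y.value)

result = add(Int(4), Int(5))
print(result.value)  # 9, with full provenance recorded
```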

protocols.io: A platform for collaboratively developing, storing, organizing and searching reproducible methods, procedures, manuals, protocols, etc. and also publishing them with a DOI.

GitLab EPFL: An open alternative to GitHub, EPFL’s GitLab Community Edition instance provides GitLab’s functionalities (version control, etc.) within the comfort of EPFL access.

c4science: A platform for scientific code development. Integrated with GitHub, it provides version control and easy collaboration, and is accessible to external collaborators while using EPFL storage.

ELN EPFL: This chemistry-oriented Electronic Lab Notebook (ELN) offers tools for visualization and analysis, converting files to standardized, open formats. It is also a repository for spectroscopic data.

SLIMS: A life-science-oriented Laboratory Information Management System (LIMS), installed on EPFL servers, that offers ELN functionalities plus sample management.

Druva inSync: A centralized backup and synchronization system, maintained by EPFL VPO, that automatically backs up user data on their PCs.

rsync: Open-source utility for synchronizing and transferring files across computer systems with minimal network usage. Also used by SCITAS. Check other sync tools in this comparison table.
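rsync is a command-line tool, but it can also be driven from a script; the sketch below calls it from Python, with a hypothetical host and paths as placeholders.

```python
import subprocess

# Mirror a local results directory to a remote host (placeholders).
# -a preserves attributes, -v is verbose, -z compresses data in transit;
# the trailing slash on the source copies its contents rather than the folder.
subprocess.run(
    ["rsync", "-avz", "simulation_output/",
     "username@cluster.example.ch:/work/project/simulation_output/"],
    check=True,
)
```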

SWITCHdrive: Cloud storage by www.switch.ch, the Swiss academic platform. It also integrates OnlyOffice, a free and open-source office suite for real-time collaboration on shared documents.

Data publication

To make your data or code openly available, deposit them in appropriate repositories, which preserve and provide access to this type of research output: see re3data.org or contact [email protected].

Data repositories can be institutional, disciplinary or multi-disciplinary (some of them are commercial, owned by big publishers).

In order to choose the one that best fits your needs, we suggest considering the following elements:

  • Check whether you have to comply with any institutional or funder requirements, and follow the recommendations provided (for example, the SNSF and the European Commission give instructions and advice on the choice of repository);
  • Be aware of the data sharing practices of your discipline and consider choosing a repository accordingly, to better target your peers;
  • Choose a repository that provides DOIs for your datasets;
  • Check whether the repository you choose is compliant with any data protection law you have to follow;
  • Choose the correct license to specify the possibilities and limits for others to reuse your work: check the EPFL Data publication decision tree or the Fast Guide on Data & Code Licensing, or contact [email protected].

At EPFL, there is currently no institutional generic data repository, but there is the option to publish your research data in the EPFL Zenodo Community. For more field-specific research data and code, you might want to check out other solutions, such as the Materials Cloud data repository for computational materials.
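As an illustration, a Zenodo deposit can also be scripted through its public REST API. The sketch below creates a draft deposition with placeholder metadata; it assumes the requests library and a personal access token created in your Zenodo account settings, and it stops before publishing.

```python
import requests

ZENODO_API = "https://zenodo.org/api/deposit/depositions"
TOKEN = "YOUR_PERSONAL_ACCESS_TOKEN"  # needs the deposit scope

# Create an empty deposition (a draft record on Zenodo).
r = requests.post(ZENODO_API, params={"access_token": TOKEN}, json={})
r.raise_for_status()
dep_id = r.json()["id"]

# Attach minimal metadata; all values below are placeholders,
# and the license should be chosen per the licensing guide above.
metadata = {
    "metadata": {
        "title": "Example dataset",
        "upload_type": "dataset",
        "description": "Illustrative deposit only.",
        "creators": [{"name": "Doe, Jane", "affiliation": "EPFL"}],
        "license": "cc-by-4.0",
    }
}
r = requests.put(f"{ZENODO_API}/{dep_id}",
                 params={"access_token": TOKEN}, json=metadata)
r.raise_for_status()
print("Draft deposition created:", dep_id)
```

The draft then appears in your Zenodo account, where files can be added and the record reviewed and published.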

If you want to highlight the importance of your data and be citable for this publication, you might also consider writing a data paper, i.e. a peer-reviewed paper describing a dataset as research output.

Data archiving

At EPFL, there is the option to archive your research data in ACOUA, the EPFL Academic Output Archive, and to publish it automatically to Zenodo. Please contact [email protected].

Contact

[email protected]


+41 21 693 21 56


Access map