Data policies ‒ Library ‐ EPFL


It is easy to get lost in the jungle of research data policies and guidelines. On this page, we explain these topics and propose tools to help make data management compliant with the various regulations.

At the end of a project, or at the beginning of a new one, data are made available for discovery and reuse.

The SNSF requires researchers to share at least the data underlying their publications as soon as possible, and at the latest together with the relevant scientific publication. Researchers are expected to share their data according to the FAIR Data Principles on publicly accessible, digital, non-commercial repositories (the principles were set out in a 2016 publication in the Nature journal Scientific Data). This requirement applies only insofar as no ethical restrictions prevent sharing.

FAIR

F – Findable: Data and metadata are easy to find by both humans and computers. Machine-readable metadata enables the automatic discovery of relevant datasets and services, which makes it an essential part of the FAIRification process.

A – Accessible: Limitations on the use of data, and protocols for querying or copying data are made explicit for both humans and machines.

I – Interoperable: The computer can interpret the data so that they can be automatically combined with other data. There is a historical trend in computer science toward increased interoperation (for instance, between different hardware designs, operating systems, programming languages, and communication protocols).

R – Reusable: Data and metadata are described well enough, for both humans and computers, that they can be replicated or combined in future research.
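As an illustration of what "machine-readable metadata" can mean in practice, the following minimal Python sketch builds a metadata record, checks that a few descriptive fields are present, and serializes it to JSON. The field names, the list of required fields, and the identifier are assumptions made for this example; they are not taken from any particular repository or from an SNSF or EPFL requirement.

```python
import json

# Hypothetical list of fields a dataset record should carry.
# Real repositories define their own schemas; this list is illustrative only.
REQUIRED_FIELDS = ("identifier", "title", "creators", "license", "description")


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


record = {
    "identifier": "doi:10.1234/example",  # placeholder, not a real DOI
    "title": "Example dataset",
    "creators": ["Jane Doe"],
    "license": "CC-BY-4.0",  # a license identifier that software can match on
    "description": "Measurements underlying a hypothetical publication.",
}

# JSON is one common machine-readable serialization of such a record.
metadata_json = json.dumps(record, indent=2, sort_keys=True)
```

A record that passes such a check is not automatically FAIR, but a persistent identifier, an explicit license, and a description are the kind of information that lets both humans and software find and reuse a dataset.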

Open research data at EPFL

Open research data (ORD) is the part of research data (including code) that can be freely accessed, used, modified, and shared, provided that there is appropriate acknowledgment if required (source: Concordat on Open Research Data, UK).

In 2021, the Swiss National Strategy Open Research Data was introduced, leading Swiss scientific institutions to adopt the ORD Strategy Action Plan in January 2022. Within the ETH Domain, a Program with five Measures for its researchers and institutions was established, with the broader aim of advancing Open Science.

Not all research data can be open. It is commonly recognized that access may need to be restricted in order to maintain confidentiality, guard against unreasonable costs, protect individuals’ privacy, respect consent terms, address possible intellectual property issues, or manage security and other risks.

There is currently no formal ORD policy at EPFL. However, there is an EPFL Privacy Policy that provides general information applicable to personal data in most situations and may be supplemented with more specific notices or regulations whenever applicable. Moreover, most research projects at EPFL are already regulated by their funders: among them, the SNSF and the European Research Council (ERC) expect researchers to disseminate the data and code underlying their published results.

Data and code licenses

Data licenses: using a recognized data license clearly defines what users may and may not do with a dataset. Creative Commons licenses in particular (CC-BY, CC0, …) make it possible to grant or retain various rights on datasets. They are relatively easy to understand and, at the same time, legally well-defined and machine-readable.

Code licenses: for computer code and software, the following licenses should be considered: Apache, Berkeley Software Distribution (2-clause and 3-clause BSD licenses), GNU General Public Licenses (GPL, LGPL, AGPL), and Public Domain.

Personal data protection and data masking

Personal data is any information relating to an identified or identifiable person. Handling such data requires special precautions to comply with the law. For more comprehensive information on personal data protection, see the EPFL Personal Data Protection page.

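Data masking, or data obfuscation, is the process of hiding original data with modified content. Data can be anonymized (identifying information is irreversibly removed) or pseudonymized (identifiers are replaced by a code that can be linked back to the person only with additional information that is kept separately).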
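A minimal Python sketch of one possible pseudonymization approach is given below: direct identifiers are dropped and replaced by a keyed hash (HMAC-SHA-256 from the Python standard library). The column names, the `participant_id` field, and the choice of a keyed hash are assumptions made for this example, not an EPFL-prescribed method. Whether the result is adequately protected depends on the dataset, since remaining attributes can still allow re-identification.

```python
import hashlib
import hmac

# Hypothetical names of the columns that directly identify a person.
DIRECT_IDENTIFIERS = ("name", "email")


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace a value with a keyed hash.

    The same value and key always yield the same pseudonym, so records
    belonging to one person stay linkable without exposing the identifier.
    """
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_record(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers and add a pseudonym derived from the email."""
    masked = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    masked["participant_id"] = pseudonymize(record["email"], secret_key)
    return masked
```

For example, `mask_record({"name": "A", "email": "a@x.org", "age": 30}, key)` keeps `age`, removes `name` and `email`, and adds a 64-character hexadecimal `participant_id`. In this sketch the key must be stored separately from the data and kept secret; anyone holding it can recompute the pseudonym for a known email address, which is why such data are pseudonymized rather than anonymized.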

Contact

[email protected]

+41 21 693 21 56