At EPFL, the Library’s research data management team provides expert advice, support, and solutions. FAIR principles, open research data, licensing, data anonymization: it is easy to get lost in this jungle of policies and guidelines. On this page, we explain these topics and suggest tools to help make data management compliant with the relevant regulations.
At the end of a project, or at the beginning of a new one, data are made available for discovery and reuse.
The SNSF requires researchers to share at least the data underlying their publications as soon as possible, and at the latest together with the corresponding scientific publication. Researchers are expected to share their data according to the FAIR Data Principles, on publicly accessible, digital, non-commercial repositories (see the 2016 article in the Nature journal Scientific Data that introduced these principles). This requirement applies as long as no ethical restrictions prevent data sharing.
1 | FAIR: what does it stand for?
- F – Findable: Data and metadata are easy to find for both humans and computers. Machine-readable metadata is essential for the automatic discovery of relevant datasets and services, and is therefore a key part of the FAIRification process.
- A – Accessible: Limitations on the use of data, and protocols for querying or copying data are made explicit for both humans and machines.
- I – Interoperable: Computers can interpret the data, so that they can be automatically combined with other data. This follows a long-standing trend in computer science toward increased interoperation (for instance, between different hardware designs, operating systems, programming languages, and communication protocols).
- R – Reusable: Data and metadata are sufficiently well described for both humans and computers in order for them to be replicated or combined in future research.
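To make "machine-readable metadata" concrete, here is a minimal sketch of a dataset description expressed as schema.org JSON-LD, one common format that search engines and many repositories can read. The chosen fields, the `dataset_jsonld` helper, the placeholder DOI `10.1234/example`, and the sample values are all hypothetical illustrations, not a metadata profile required by the SNSF or by any particular repository.

```python
import json


def dataset_jsonld(name, description, doi, license_url, creators, keywords):
    """Build a minimal schema.org 'Dataset' record as JSON-LD.

    Hypothetical helper: the field selection is illustrative only and is
    not a complete or repository-mandated metadata profile.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,                              # human-readable title
        "description": description,
        "identifier": f"https://doi.org/{doi}",    # persistent identifier (Findable)
        "license": license_url,                    # explicit usage terms (Reusable)
        "creator": [{"@type": "Person", "name": c} for c in creators],
        "keywords": keywords,                      # supports search indexing (Findable)
    }


record = dataset_jsonld(
    name="Example sensor measurements",
    description="Hypothetical dataset used only to illustrate metadata.",
    doi="10.1234/example",  # placeholder DOI, not a real record
    license_url="https://creativecommons.org/licenses/by/4.0/",
    creators=["Jane Doe"],
    keywords=["FAIR", "example"],
)
print(json.dumps(record, indent=2))
```

In practice, the repository where the data is deposited usually generates and publishes such metadata from the fields filled in at upload time.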
2 | Open research data at the EPFL
Open research data (ORD) is the part of research data (including code) that can be freely accessed, used, modified, and shared, provided that there is appropriate acknowledgment if required (source: Concordat on Open Research Data).
Not all research data can be open, however. It is commonly recognized that access may need to be managed in order to maintain confidentiality, guard against unreasonable costs, protect individuals’ privacy, respect consent terms, address possible intellectual property issues, and manage security or other risks.
3 | Data and code licenses
Data licenses: using recognized data licenses clearly defines what users may or may not do with a dataset. In particular, Creative Commons licenses (CC-BY, CC0, …) allow rights holders to grant or retain various rights on datasets. They are relatively easy to understand while being legally well defined and machine-readable.
Code licenses: for computer code and software, consider the following licenses: Apache, Berkeley Software Distribution (2-clause and 3-clause BSD licenses), GNU licenses (GPL, LGPL, AGPL), and Public Domain.
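One widespread way to make a license choice machine-readable is to use SPDX license identifiers, for example in an `SPDX-License-Identifier:` comment at the top of each source file. The sketch below maps a few of the licenses named above to their SPDX identifiers; the `SPDX_IDS` table and `spdx_header` helper are hypothetical, and only one version of each license is listed, so check which exact version applies to your work.

```python
# Hypothetical lookup from license names to SPDX identifiers
# (see the SPDX License List). Only one common version per license is shown.
SPDX_IDS = {
    "Creative Commons Attribution 4.0": "CC-BY-4.0",
    "Creative Commons Zero 1.0": "CC0-1.0",
    "Apache 2.0": "Apache-2.0",
    "BSD 2-Clause": "BSD-2-Clause",
    "BSD 3-Clause": "BSD-3-Clause",
    "GNU GPL v3 only": "GPL-3.0-only",
    "GNU LGPL v3 only": "LGPL-3.0-only",
    "GNU AGPL v3 only": "AGPL-3.0-only",
}


def spdx_header(license_name: str, comment: str = "#") -> str:
    """Return an SPDX license header line for a source file.

    Raises KeyError if the license name is not in the lookup table.
    """
    spdx_id = SPDX_IDS[license_name]
    return f"{comment} SPDX-License-Identifier: {spdx_id}"


print(spdx_header("BSD 3-Clause"))      # prints "# SPDX-License-Identifier: BSD-3-Clause"
print(spdx_header("Apache 2.0", "//"))  # prints "// SPDX-License-Identifier: Apache-2.0"
```

The header line complements, but does not replace, a full LICENSE file in the repository.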
4 | Personal data protection and data masking
Personal data is any information relating to an identified or identifiable person. Handling such data requires special precautions to comply with the law. For more comprehensive information on personal data protection, see the EPFL Personal Data Protection page.
Data masking, or data obfuscation, is the process of hiding original data by replacing it with modified content. Data can be anonymized or pseudonymized.
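The difference between the two approaches can be illustrated with a small sketch, assuming a simple record with a name and an age. Pseudonymization here replaces the name with a keyed hash (HMAC-SHA256), so anyone holding the key can recompute the pseudonym for a known name and re-link the records; pseudonymized data is therefore generally still treated as personal data. Generalization (coarsening an exact age into a range) is one technique often used toward anonymization, but on its own it does not guarantee that individuals cannot be re-identified. The key, the record, and both helper functions are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key for illustration. In practice, generate it randomly
# (e.g. secrets.token_bytes(32)) and store it separately from the dataset.
SECRET_KEY = b"replace-with-a-random-secret-key"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256, hex)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def generalize_age(age: int, width: int = 10) -> str:
    """Coarsen an exact age into a range, e.g. 37 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


record = {"name": "Jane Doe", "age": 37, "diagnosis": "asthma"}
masked = {
    "pid": pseudonymize(record["name"]),        # same name -> same pseudonym
    "age_range": generalize_age(record["age"]),  # exact age is no longer stored
    "diagnosis": record["diagnosis"],
}
print(masked)
```

A keyed hash is used rather than a plain hash because unkeyed hashes of names or ID numbers can often be reversed by simply hashing a list of candidate values.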
A virtual assistant and decision tree that helps researchers with the most relevant legal issues in research data management. Developed jointly by USI, eLab, CCdigitallaw and UNINE, it is open source. You can navigate copyright or data protection issues and obtain information, guidance, or useful links.
You can search, browse, or even submit software licenses, with everything explained in plain English. The tool provides information on licenses for datasets and especially for code, helping you decide whether previous work can be reused or reshared, or choose a license for publishing code and data.
A free, open-source, cross-platform desktop client to anonymize your datasets. It is easy to use: you can anonymize datasets via configurable, pre-made algorithms, or create and add your own custom scripts. Your datasets are stored and processed locally.
A free, production-ready, easy-to-install open-source solution for collecting and managing sensitive personal and health data.
Unsure about how to license your data for publishing or sharing? Check out the simple data publication decision tree.
A simple chooser to help determine which Creative Commons license is right for your data.
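The questions behind a Creative Commons license choice can be sketched as a small decision tree. This is a simplified, hypothetical illustration of the logic, not the implementation of the chooser or decision tree mentioned above: waiving all rights leads to CC0; otherwise attribution (BY) is always required, and the answers on commercial use, adaptations, and share-alike add the NC, ND, or SA elements. The function returns SPDX-style identifiers for version 4.0 of the licenses.

```python
def choose_cc_license(waive_all_rights: bool,
                      allow_commercial: bool,
                      allow_adaptations: bool,
                      require_share_alike: bool = False) -> str:
    """Hypothetical, simplified CC license decision tree (version 4.0)."""
    if waive_all_rights:
        return "CC0-1.0"          # public domain dedication, no conditions
    parts = ["BY"]                # attribution is part of every CC license
    if not allow_commercial:
        parts.append("NC")        # NonCommercial
    if not allow_adaptations:
        parts.append("ND")        # NoDerivatives
    elif require_share_alike:
        parts.append("SA")        # ShareAlike only makes sense if adaptations are allowed
    return "CC-" + "-".join(parts) + "-4.0"


print(choose_cc_license(False, True, True))         # prints CC-BY-4.0
print(choose_cc_license(False, False, True, True))  # prints CC-BY-NC-SA-4.0
```

ShareAlike is only checked when adaptations are allowed, because it sets the terms under which adapted versions must be shared.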