Entry point and coordination among the other contacts: [email protected]
- Research data: "Research data (RD), including code, is defined as evidence that underpins the answer to the research question, and can be used to validate findings regardless of its form (e.g. print, digital, or physical)" (Concordat on Open Research Data)
- Open research data: "Open research data are one part of RD that can be freely accessed, used, modified, and shared, provided that there is appropriate acknowledgement if required" (Concordat on Open Research Data)
In fact, not all research data can be open: it is commonly recognized that access may need to be managed in order to maintain confidentiality, guard against unreasonable costs, protect individuals’ privacy, respect consent terms, address possible intellectual property issues, and manage security or other risks.
Below you will find definitions and examples for the main types of platforms used to disseminate, analyze, or preserve research data or code. Definitions and categories can overlap, or be perceived differently depending on the research area or personal experience.
Code repository
- In short: Commercial or institutional online platforms that let you store, version, publish, retrieve and collaborate on active code
- Extended: Online platform that simplifies the version control of code, allowing developers to submit patches of code in an organized, collaborative manner. It can also be used for backup of the code. Sometimes it acts as a web hosting service where developers store their source code for web pages. Source code repository software services and websites often support bug tracking, mailing lists, version control, documentation, and release management. Developers generally retain their copyright when software is posted to a code hosting facility and can make it publicly accessible.
- Ex.: GitHub, c4science, GitLab EPFL, etc.
- Not for: Code repositories are not usually designed for data/code preservation nor for data analysis purposes.
Data analysis platform
- In short: Institutional or commercial online platforms that let you aggregate, combine, analyze and visualize active datasets (data and/or code)
- Extended: Online platform used to perform analysis, retrieve, combine, interact with, explore, and visualize data. The underlying data can originate from various sources (e.g. data banks, user-provided data, APIs, etc.), online or offline. The main purpose of a data analysis platform is to turn data into actionable insights, overcoming the possible inadequacies of a relational database or table. As these platforms embed the tools for accomplishing meaningful data transformations, they are usually focused on specific research domains, and use corresponding formats, algorithms, and processes. Others, especially those used for code scripts, can be more generic.
- Ex.: Renku, ELN EPFL, Open Data Cube, etc.
- Not for: Data analysis platforms are not usually designed for data preservation or for data publication purposes.
Data archive
- In short: Mostly institutional online platforms that let you preserve and retrieve cold data
- Extended: Storage infrastructure enabling the long-term retention and reusability of data. It provides secure, redundant locations to store data for future (re)use. Once in the archive, data stay accessible and the system protects their integrity. A data archive is a place to store data that is deemed important, usually curated data, but that doesn’t need to be accessed or modified frequently (if at all). Most institutions use data archives for legacy data, for data of particular value, or to meet regulatory requirements. Data are not usually findable by external users.
- Ex.: ACOUA, ETH Data Archive, OLOS, etc.
- Not for: Data archives are not usually designed for data publication nor for data analysis purposes.
Data bank
- In short: Institutional or commercial online platforms that let you store, aggregate, discover and retrieve cold data
- Extended: Collection of datasets for secondary use in research, allowing many continual queries to be processed over a long period of time. Not to be confused with the organizations concerned with constructing and maintaining such databases. Services and tools can be built on top of such a database for the development, compilation, testing, verification, dissemination, analysis and visualization of data. A data bank is normally part of a larger institution (academic, corporate, scientific, medical, governmental, etc.) and is established to serve the data to the users of that organization. It may also maintain subscriptions to licensed data resources for its users to access.
- Ex.: DBnomics, Channelpedia, DataLakes, etc.
- Not for: Data banks are not usually designed for data preservation or for data analysis purposes.
Data repository
- In short: Institutional or commercial online platforms that let you preserve, publish, discover and retrieve cold datasets (data and/or code)
- Extended: Online infrastructure where researchers can submit their data. It allows them to manage, share, and access datasets for the long term. It can be generic (discipline-agnostic) or specialized (discipline-specific): specialized data repositories often integrate tools and services useful for a discipline. The main purpose is to preserve data and make them reusable for future research. Data repositories may have specific requirements concerning the research domain, data reuse and access, file formats, data structure, and types of metadata. They can also restrict who can deposit data, based on funding, academic qualification, or the quality of the data. Institutional data repositories may collect the data of researchers from a university or a consortium of universities.
- Ex.: Zenodo, Materials Cloud, Dryad, etc.
- Not for: Data repositories are not usually designed for data analysis nor for code versioning purposes.
EPFL and open research data
At EPFL, open research data (ORD) and its appropriate management are considered good practices in the research process (Compliance Guide).
Before the project
Before starting a research project, apply an open-data-by-design methodology: consider from the outset how much of your data will be made openly accessible.
By integrating this aspect in your data management plan(ning) as a starting point of your research project, you will be able to define tools and methodologies to ensure that the data and code you want to openly share will be understandable, findable, accessible and reusable.
The Library Research Data Management team is at your disposal to guide you through this:
- Contact us: [email protected]
- Check out our dedicated pages and our Fast Guide on Data Management Plans (DMP). To estimate how much your data management will cost, use our cost calculator (and feel free to contribute to it)
If you manage the following types of data, consider also contacting the relevant EPFL services:
- Personal or otherwise sensitive data: [email protected] (Data Protection Officer) and [email protected] (Research ethics)
- Data linked to industry: [email protected] (Technology Transfer Office)
In general, contact the Research Office for help with project management.
During a project
Data acquisition
When you start your research project, you will either need to produce new data or rely on already existing data.
If you need storage capacity, the various Faculty-ITs can support you:
- ENAC-IT: epfl.ch/schools/enac/fr/a-propos/enac-it/
- SV-IT: epfl.ch/schools/sv/it (in the case of sensitive data, check secure data acquisition at https://redcap.epfl.ch)
- STI-IT: epfl.ch/schools/sti/it/
- SB-IT: https://sb-it.epfl.ch/ (currently not available)
- IC-IT: https://www.epfl.ch/schools/ic/it/en/it-service-ic-it/
- CDM-IT: https://www.epfl.ch/schools/cdm/college-of-management-of-technology/about/internal-services/it-services/it-administration
If you already have, or want to acquire, existing data or code, but have doubts about the legal or technical possibility of reusing them, make sure to respect the licenses they come with: check the Fast Guide on Data & Code Licensing or contact [email protected].
If you need help with contracts for 3rd party data, please contact [email protected].
Data and code workflow
EPFL Library supports you in documenting your research data workflow or in implementing a metadata standard: contact [email protected].
Below is a list of tools and services to improve your workflow, helping you store, share and maintain the right documentation all along.
For storage questions, please refer to your Faculty-IT [see contacts above, section “Data Acquisition”].
For general storage questions, use File Storage, the central storage and backup service by EPFL VPO. It also offers “object storage” hosted on-site and based on the open-standard S3 protocol: use the XaaS portal to request buckets.
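As an illustration, uploads to an S3-compatible bucket can be scripted. This is a minimal sketch, assuming the third-party boto3 library; the endpoint URL, bucket name and object key shown are placeholders, to be replaced with the values you obtain through the XaaS portal.

```python
def object_url(endpoint, bucket, key):
    """Build the URL of an object in an S3-compatible bucket."""
    return f"{endpoint.rstrip('/')}/{bucket}/{key}"

def upload_to_bucket(path, bucket, key, endpoint):
    """Upload a local file to an S3-compatible bucket.

    Requires the third-party boto3 library (pip install boto3);
    credentials are read from the environment or ~/.aws/credentials.
    """
    import boto3  # imported here so the rest of the module works without it
    s3 = boto3.client("s3", endpoint_url=endpoint)
    s3.upload_file(path, bucket, key)  # boto3 argument order: Filename, Bucket, Key
    return object_url(endpoint, bucket, key)

# Example call (placeholder endpoint and bucket names):
# upload_to_bucket("results.csv", "my-project-bucket", "2024/results.csv",
#                  "https://s3.example.epfl.ch")
```

The `endpoint_url` parameter is what points boto3 at an on-site S3-compatible service instead of AWS itself.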
– protocols.io: A platform for collaboratively developing, storing, organizing and searching reproducible methods, procedures, manuals, protocols, etc. and also publishing them with a DOI.
– c4science: A platform for scientific code development. Based on Git, it allows version control and easy collaboration, with accessibility to external collaborators while using EPFL storage.
– ELN EPFL: This chemistry-oriented Electronic Lab Notebook (ELN) offers tools for visualization and analysis, converting files to standardized, open formats. It is also a repository for spectroscopic data.
– SLIMS: Life-science-oriented Laboratory Information Management System (LIMS), installed on EPFL servers, that offers ELN functionalities plus the management of samples.
– Druva inSync: Centralized backup and synchronization system maintained by EPFL VPO, allowing the automatic backup of users’ data on their PCs.
Data repositories can be institutional, disciplinary or multi-disciplinary (some of them are commercial, owned by big publishers).
In order to choose the one that best fits your needs, we suggest considering the following elements:
- Check if you have to be compliant with any institutional or funder’s requirement, and follow the provided recommendations (for example, the SNSF and the European Commission give instructions and advice on the repository choice);
- Be aware of the data sharing practices of your discipline and consider choosing a repository accordingly, to better target your peers;
- Choose a repository that provides DOIs for your datasets;
- Check whether the repository you choose is compliant with any data protection law you have to follow;
- Choose the correct license to specify the possibilities and limits for others to reuse your own work: check the EPFL Data publication decision tree or the Fast Guide on Data & Code Licensing, or contact [email protected].
At EPFL, there is currently no institutional generic data repository, but there is the option to publish your research data in the EPFL Zenodo Community. For more field-specific research data and code, you might want to check out other solutions, such as the Materials Cloud data repository for computational materials.
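Deposits to Zenodo can also be scripted through its REST API. The sketch below only assembles the minimal metadata the deposit API expects (title, upload type, description, creators); the community identifier and access token are placeholders, and the actual HTTP call is left commented out because it requires a personal token.

```python
import json

def build_deposit_metadata(title, description, creators, community=None):
    """Assemble minimal metadata for a Zenodo deposit.

    `creators` is a list of {"name": "Family, Given", "affiliation": ...} dicts.
    `community` is an optional Zenodo community identifier (placeholder here).
    """
    metadata = {
        "title": title,
        "upload_type": "dataset",
        "description": description,
        "creators": creators,
    }
    if community:
        metadata["communities"] = [{"identifier": community}]
    return {"metadata": metadata}

payload = build_deposit_metadata(
    "Example dataset",
    "Measurements supporting the example study.",
    [{"name": "Doe, Jane", "affiliation": "EPFL"}],
    community="epfl",  # assumed community identifier; check the actual one on Zenodo
)

# To actually create the deposit (needs a personal access token):
# import urllib.request
# req = urllib.request.Request(
#     "https://zenodo.org/api/deposit/depositions?access_token=YOUR_TOKEN",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
#     method="POST",
# )
# urllib.request.urlopen(req)
```

Files are then attached to the created deposit in a second API call, and the DOI is minted when the deposit is published.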
If you want to highlight the importance of your data and make this publication citable, you might also consider writing a data paper, i.e. a peer-reviewed paper describing a dataset as a research output.
Phone: +41 21 693 21 56