EPFL Data Champions

The EPFL cross-disciplinary community around research data.

Learn more about the EPFL Data Champions community! Join it or contact [email protected] for further information.

Choose and contact a Data Champion from the list. Not sure who to ask? Then, simply contact [email protected].

Discover useful material, contacts, a FAQ, links and documents. Want to share your own work with the EPFL Data Champions? Just tell us at [email protected].

 

EPFL Data Champions

Managing data is often a fragmented, frustrating experience on top of research activities. Yet data generation, analysis, visualization, sharing, etc. greatly affect research projects. Ask an EPFL Data Champion.


Giovanna Ambrosini
• Bioinformatics
• Genomics
• Web development


Maroun Bou Sleiman
• Bioinformatics
• Genetics
• Omics


Philipp Bucher
• Genomics
• Bioinformatics
• Epigenetics

Christine Choirat
• Health data science
• Reproducible research


Florent-Valéry Coen
• Microfabrication
• Neuroengineering
• Stretchable electronics


Valentin Conrad
• Data protection
• Contract law
• Intellectual property


Valeria Di Cola
• R
• Data management
• Data visualization


Tom de Geus
• Mechanics
• Comp. Physics
• Stat. Physics


Simon Dürr
• Molecular Dynamics Simulations
• Quantum Chemistry
• Genetic Algorithms


Brad Fetter
• Bibliometrics
• Data visualisation
• Research indicators


Shirah Foy
• Interviewing
• Process journal
• Qualitative analysis


Fotis Georgatos
• System engineering
• Data science
• High energy physics


Kevin Jablonka
• Simulations
• Thermodynamics
• Communication

Emma Jablonski
• Data science engineering
• Reproducibility


Graham Knott
• Electron microscopy
• Neuroscience
• 3D modelling


Shin Koseki
• Urban data science
• Computational social science
• Human geography


Robert Lieck
• Data science
• Machine learning
• Digital humanities


Maud Ehrmann
• Natural Language Processing
• Large-scale textual data
• Digital Humanities

Charlotte Mazel-Cabasse
• Digital Humanities
• Project management


Fabian Moss
• Digital musicology
• Data visualization
• Data science


Luc Patiny
• Information management
• Chemistry
• ELN


Nicolas Argento
• ELN/LIMS services manager
• IT/Life sciences communication
• IT services management


Evarist Planet
• Data science
• Statistics
• Bioinformatics


Francisco Ramirez
• Numerical simulations
• Material physics

Alexis Rapin
• Bioinformatics
• Metagenomics


Martin Telefont
• Dataviz
• Big data
• Data curation


Ivica Zivkovic
• Physics
• Quantum magnetism
• FAIR principles


Birgit Schaffhauser
• Molecular pathology
• Project management
• Gene expression


Jenny Thomas
• Polar data management
• Databases
• Metadata



Ayah Khubieh
• Neuroscience
• Computer simulations


Amir Rezaie
• Structural Engineering
• Python
• Machine Learning


Lili Gu
• Mechanical Design
• Experiment modeling
• Discretization methods


Jessica Pidoux
• Personal data
• Online dating
• Sociology

Former EPFL Data Champions

As careers in science or elsewhere continue, the Data Champions community recognizes the people who made it thrive by helping others manage their data and by sharing their expertise and passion within EPFL and beyond.



Jonathan Cottet
• Microsystems
• Electric field
• Database


Orion Penner
• Innovation
• IP Policy
• Scientific Publishers


Sonia Ben Hamida
• Space
• Aerospace
• Design


Dasaraden Mauree
• Data analysis
• Data management
• Data visualization
 

Community of practice and interest

■ Do your colleagues come to you for tips and tricks on how to manage their data? Would you like support and recognition for your help?

■ Are you enthusiastic about data sharing, visualization, anonymization, or publishing? Do they appeal to you beyond your research?

■ Do you wish for a bottom-up cultural shift regarding data management? Are you intrigued by meeting others that feel the same?

If you answered ‘yes’ to any of the above, you should join the EPFL Data Champions community!

Interested? Fill in the form or contact [email protected].

Whether you are an EPFL researcher (PhD student, postdoc, professor, etc.) or staff member (admin, technician, etc.), if you have a keen interest in research data and are willing to share your expertise, the EPFL Data Champion community is for you!

Of course, previous data management experience, along with some programming or communication skills, is a great advantage, but we believe in diversity!
Whatever your level of expertise in data science, data management, data visualization, etc. we would love to have you on board.

This is a voluntary engagement and we will encourage Data Champions to invest only the time they think they can dedicate to helping others, no more, no less.

EPFL Data Champions will get a chance to play the following roles, according to their personal availability and field experience:

  • Advise researchers on data handling or redirect them to expert support on Campus
  • Act as spokespersons in their faculty
  • Endorse the FAIR (Findable, Accessible, Interoperable, and Re-usable) principles
  • Promote (or participate in) training or events
  • Develop or promote RDM tools useful for research
  • Participate in possible publications on RDM

Don’t miss the opportunity to meet other EPFL Data Champions and interesting guest speakers at the Data Champions Meetings, organized three times per year.

Data Champions invest some of their time to help others and this effort must be rewarded. We believe members of the EPFL DC community will enjoy the following personal benefits:

Increase your impact

  • Help people!
  • Be the spokesperson and local data expert, reporting to EPFL management about the needs of your faculty or research community
  • Gain visibility with your personal profile on the Data Champions webpage

Develop your professional network

  • Get in touch with others interested in RDM at the EPFL and outside
  • Network with researchers in workshops, conferences and events
  • Receive news and get to know EPFL services such as ReO, TTO, Library, etc.

Learn new skills

  • Attend workshops or conferences on data science and data management
  • Communicate effectively, enhance presentation skills, lead workshops
  • Learn by doing, collaborate on projects for possible development or acquisition of data tools

Boost your CV

  • Distinguish yourself with qualifying activities and transferable skills
  • Add “EPFL Data Champion” to your personal profile page
  • Receive news on career opportunities related to RDM

The Research Data Library Team supports the DCs and

  • Creates and maintains this webpage
  • Invites the DCs to RDM training opportunities, both introductory and specialized
  • Provides basic material and IT resources for the DCs activities
  • Supports the DCs in responding to the requests they might receive
  • Organizes the DCs Meetings for the community 3 times per year
  • Generates and distributes reports on DCs’ activities
  • Sends the DCs a monthly newsletter to report on the DCs’ activities (via a short survey), inform them about new tools and career opportunities, and invite them to talks, events or publications about RDM

An EPFL Data Champion is not expected to replace other professional roles (e.g. Ethics Committee, IT staff, etc.), nor held responsible for the consequences of advice given to researchers.

The Research Data Library Team supports the functioning of the EPFL Data Champions community. The EPFL Library provides the financial support. Other institutional support also comes from the EPFL Open Science Strategic Committee.

The EPFL is not the first and it’s not alone in this initiative, although it takes some pioneering spirit. We acknowledge the work and stimulating discussions with colleagues and fellow Data Champions at Cambridge University and TU Delft.

The EPFL Library is also a member of SPARC Europe, whose similar initiative, the Open Access Champions, focuses more on Open Access (OA) and connects people from different countries.

These are just a few examples. Ultimately, the EPFL Data Champions are part of a bigger community, and we look forward to creating synergies and sharing ideas and practices around data.

 

Resources

Contacts

Q&A

Not to be confused with a data management platform, a Data Management Plan (DMP) is a written document describing how the data of a research project are managed during its life-cycle. The DMP covers the processes related to the collection, analysis, transformation, publication and preservation of the data and code of a single project. For multiple projects, you might want to devise an RDM strategy instead; in that case, please contact [email protected].

A DMP is now required by all major research funders, such as the SNSF or the EC (H2020, MSCA, etc.). It can also be requested by the EPFL ReO in order to assess the correct handling of sensitive data, or by the EPFL TTO to document the data management leading to a patent or industrial collaboration.

In general, this strategic document and its implementation aim at a substantial increase in the reproducibility of science. Its main specific purposes are:

  • to be more transparent; 
  • to comply with research funders; 
  • to forecast resources and needs; 
  • to help avoid data loss;
  • to improve accountability for data workflow;
  • to use data in a future-proof way.

Depending on the complexity, purpose and field of a project, it might take anywhere from 10 minutes to a few workdays to implement a DMP. This also depends on prior familiarity with Research Data Management (RDM) topics, as well as on personal or team organization.

For a review of your DMP or for help in implementing it, do not hesitate to contact [email protected].

Also, check out the FastGuides on DMP – Data Management Plan and on Storage, Publication and Preservation.

With its various sites, many schools and affiliated institutes, the EPFL offers many possible storage options.

The principal one is offered by the VPSI, with file storage services and backup for individual workstations. The VPSI also offers an on-premise (hosted on site) “object storage” service that uses the Amazon S3 protocol and runs on Scality software and Cisco hardware; use the XaaS portal to request buckets.

The IT services within the different faculties might also offer more customized storage options, such as NAS, for your faculty (for example, in STI).

You can have a larger view and find general information on the EPFL offerings at https://support.epfl.ch/epfl.

The national platform for academic use, www.switch.ch, offers cloud storage services such as SWITCHdrive and many others. 

Although it is discouraged for sensitive, commercial or otherwise valuable data, the EPFL also has a Google Drive cloud solution (the servers are not at EPFL).

For coders, SCITAS offers work storage as well as c4science, a code repository similar to GitHub (and using the GitHub API for integration with this platform). RENKU is a collaborative data science platform that can be used for managing data, code and code environments. For more educational purposes, you might also want to try NOTO, a JupyterLab platform designed for testing and training with code in Python, Bash, Octave, C and R, plus other useful features, all in the cloud.

Yes. In particular, you can check the upcoming training opportunities at go.epfl.ch/rdm-training

The RDM training catalog ranges from a crash course to more specialized courses, with different durations to fit everyone’s schedule.

You can also Book a data librarian or simply contact [email protected] and ask for the next scheduled training opportunities.

The EPFL and SWITCH provide a certain choice of communication tools. For research groups, and depending on the particular needs and sensitivity of discussion and information transmitted, these are the main communication tools for videoconferencing: 

Of course, many other communication tools and platforms exist beyond the ones supported by the EPFL. In particular, if one does not need the videoconferencing capability, many groups would benefit from free messaging platforms as 

At EPFL there is no central department focusing on data analysis or processing, although the Swiss Data Science Center (SDSC) might be a close match. The right contact also depends fundamentally on your specific needs and field. Don’t hesitate to contact the EPFL Research Data Library Team or the EPFL Data Champions community to explore possible solutions.

Moreover, we suggest getting in touch with the C4Science team, the various ICT centers, the CECAM, etc. for possible solutions, depending on the needs.

At the EPFL there are two main services that can advise you on licensing: which license to adopt, and how to reuse data that come with (or without) a certain license. The Technology Transfer Office (TTO) can help especially if the licensing is linked to contracts or intellectual property issues (such as patents); for all other inquiries on data or code licensing, you can contact the EPFL Research Data Library Team. Don’t hesitate to ask a Data Champion who shares your interests about possible first-hand experience with licensing issues.

You might also want to take a look at the FastGuide Data & Code Licensing for a short overview and first (fast) guide.

While some platforms for code preservation, such as Software Heritage, offer clear guidelines on the licensing of code, others, such as GitHub or Zenodo, allow you to choose a license from a vast selection. To know more, check out the FastGuide Data & Code Licensing. In general, while all digital work created at EPFL is owned by the EPFL, the authors may decide on its possible exploitation and use, and on the license to attach to it. Of course, this freedom of choice is limited by possible funding agencies (e.g. the SNSF and the EC ask you to justify a decision not to make your work openly available, and check for consistent licensing). Another limit can come from using code derived from a third party, as well as from an existing contract, for instance between two collaborating research groups or institutes.

Not yet. The EPFL Library and the CPSI are working to make one available to the EPFL community as soon as possible!

In the meantime, you might want to deposit datasets in Infoscience, even though it is not optimized for data archiving, or on Zenodo using the EPFL Zenodo Community tag, as Zenodo is better equipped for long-term preservation purposes.

You might want to take a look at the Open Research Data: SNSF monitoring report 2017-2018 (Page 9) to discover other data repositories currently used by the Swiss researchers.

The choice of a data or code repository depends on different factors, such as collaboration or backup needs during the research, or publication or archiving needs after the research.

Moreover, the decision should account for the specific research field, both for targeting the right public and for possible specific features that the repository might offer.

Generally, one might want to search re3data.org, moving from subfield-specific, to field-specific, to more generic repositories.

Be aware that some research funders, such as the SNSF, do not reimburse the deposit of data or code on repositories, such as FigShare or GitHub, that are for-profit or that do not assign a PID (e.g. a DOI).

Examples of data repositories: 

Examples of code repositories: 

You might want to take a look at the Open Research Data: SNSF monitoring report 2017-2018 (Page 9) to discover other data repositories currently used by the Swiss researchers.

A first practical reason is to verify the uniqueness of a dataset, since a Digital Object Identifier (DOI) is a particular kind of Persistent IDentifier (PID).

Another reason is to reduce the probability of link rot: some data or code repositories (for instance GitHub) do not provide a PID but a simple URL, which might change depending on many factors. The persistence of a DOI, by contrast, is guaranteed.

A third reason to use a DOI for every published dataset is to enable citation metrics, as many citation tools use the DOI to track articles as well as datasets. Last but not least, it is the very first condition for findability in the F.A.I.R. dataset principles: “F1. (Meta)data are assigned a globally unique and persistent identifier”.
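To make the difference between a DOI and a plain URL concrete, here is a minimal Python sketch that checks whether a string looks like a DOI and turns it into a persistent link through the doi.org resolver. The function names are hypothetical, and the pattern is only a rough approximation of the DOI syntax, not an official validator.

```python
import re
from urllib.parse import quote

# Rough heuristic for the shape of a DOI: "10.", a registrant code, "/", a suffix.
# Assumption: this approximates the DOI syntax and is not a full validator.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def looks_like_doi(identifier: str) -> bool:
    """Return True if the string has the general shape of a DOI."""
    return bool(DOI_PATTERN.match(identifier.strip()))


def doi_to_url(doi: str) -> str:
    """Build the link that resolves a DOI through the doi.org resolver."""
    doi = doi.strip()
    if not looks_like_doi(doi):
        raise ValueError(f"not a DOI-like identifier: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return "https://doi.org/" + quote(doi, safe="/")


if __name__ == "__main__":
    # DOI of one of the references listed under "Interesting reads".
    print(doi_to_url("10.5281/zenodo.3383814"))
```

The resulting link stays the same even if the repository later moves the landing page, which is what makes it safer to cite than a repository URL.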

“One would think that the desire for high quality would motivate any researcher to implement good data management practices. That is not necessarily the case. Data Management also requires time and effort, which may compete with other research activities such as publishing. So at times there may be a trade-off between data management activities and other research activities. A researcher also needs the skills and tools to implement good data practices.” Research Data Management – A European Perspective (by Filip Kruse, Jesper Boserup Thestrup)

Here is a list of some of the efforts and benefits to be weighed, both for the community and for individual researchers:

Efforts:

  • Time taken away from research activities
  • Maintenance of data management tools
  • Compliance with common rules along the research life-cycle
  • Conversion of tools or formats to open alternatives
  • Data curation with quality control
  • Learn new management and technical skills
  • Adapt to new software and tools

Benefits

  • Simplified collaborations
  • Enhanced reuse of data for 
    • the researchers at a later time
    • new researchers entering a project
    • other researchers
  • Reduced time for data curation before publication
  • Avoid legal and financial disputes, especially in collaborations
  • Reduce wasted storage and its impact on finances and the environment
  • Learn new management and technical skills
  • Avoid being at the mercy of private companies for your own workflow
  • Datasets and procedures already compliant with FAIR principles and funders’ requests

Gdrive storage

go.epfl.ch/DCdrive (EPFL DCs access only)

Interesting reads

  • Creating a Community of Data Champions
    R. Higman, M. Teperek & D. Kingsley, International Journal of Digital Curation, Vol 12 No 2 (2017), DOI: 10.2218/ijdc.v12i2.562
  • Establishing, Developing, and Sustaining a Community of Data Champions
    J. L. Savage & L. Cadwallader, Data Science Journal, 18: 23, pp. 1–8. (2019), DOI: 10.5334/dsj-2019-023
  • How to build a community of Data Champions: Six Steps to Success
    C. Clare, Zenodo (2019), DOI: 10.5281/zenodo.3383814