Artificial intelligence and scientific information

This quick guide aims to help you navigate the ever-evolving landscape of responsible AI use during your studies at EPFL. We focus on LLMs: everyday ones such as ChatGPT and Google’s Gemini, as well as specialized AI research assistants designed for research tasks. We present key principles to keep in mind when using LLMs, along with their main drawbacks.
As AI advances at a rapid pace, this page will be updated regularly. Watch this space!
Image generated with ChatGPT 4.0.

Over the past few years, AI tools have changed the way we search for information, learn and create. Since it burst onto the scene in 2022, ChatGPT has amassed over 5 million users. Together with other similarly popular Large Language Models (LLMs), it shows that AI is here to stay.

Large Language Models (LLMs) are increasingly being integrated into library services and researchers’ practices because they can summarize masses of information quickly, help brainstorm, and improve writing, all while sounding like a real human being. However, if you have ever tried using an LLM yourself, you have probably noticed that they can be quite error-prone, generating information that sounds plausible but is not factually accurate. That comes down to their design: LLMs are essentially very sophisticated text prediction tools trained on unprecedented amounts of human-generated information. They were trained to predict the most likely continuation of a prompt, based on patterns in how we humans communicate. They were not designed to think like we do, to systematically correct us if we are wrong, or to “not know” the response to a prompt. They are closer to autocorrect on steroids than to an omnipotent intelligence. That is why LLMs should be used responsibly, as tools, with our critical thinking skills fully engaged.

Basic recommendations

Searching for Information with LLMs

We said that Large Language Models (LLMs) are text predictors at their core. That means that, when we ask an LLM a question (our prompt), it chops the prompt up into words or pieces of words (tokens) and calculates the statistical probability of the token that should come next. Since LLMs were trained to perform these calculations on vast amounts of text from the internet (news articles, but also comments on social media), the resulting answer to our question (the output) sounds like a real person who knows what they are saying. In reality, LLMs do not understand the content of our prompts, nor of their own outputs, the way we humans do. Because of this statistical text prediction, the phrasing and sentence order we use in our prompt can drastically change the quality of the output we get. And, because something can always come after a chunk of text, LLMs must provide an output to any prompt.
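To make the idea of statistical text prediction concrete, here is a minimal sketch of a toy next-word predictor. It is not how any real LLM is implemented: it assumes whole words as tokens, a made-up training text of two sentences, and simple follower counts instead of a neural network. The function names (`train_bigram_model`, `next_word_probabilities`, `predict_next`) are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for every word, which words follow it in the training text."""
    tokens = text.lower().split()  # naive tokenization: one token per word
    followers = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        followers[current][following] += 1
    return followers


def next_word_probabilities(model, word):
    """Convert the follower counts of `word` into probabilities."""
    counts = model.get(word.lower())
    if not counts:
        return {}  # the toy model has never seen this word
    total = sum(counts.values())
    return {candidate: count / total for candidate, count in counts.items()}


def predict_next(model, word):
    """Return the most probable next word, or None for an unseen word."""
    probabilities = next_word_probabilities(model, word)
    if not probabilities:
        return None
    return max(probabilities, key=probabilities.get)


# Made-up training text, for illustration only.
corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram_model(corpus)

print(next_word_probabilities(model, "the"))  # {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
print(predict_next(model, "the"))             # cat
```

The toy model picks "cat" after "the" only because that pair was the most frequent in its training text, not because it knows anything about cats. Unlike this sketch, which returns None for a word it has never seen, a real LLM always assigns probabilities to some next token, which is why it always produces an output.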

Knowing this, here are some key guidelines to remember when using LLMs:

Main principles

  • LLMs are tools, not reliable sources of information: they use statistical patterns, and do not have a real understanding of the subject.
  • LLMs can “hallucinate”, that is, generate plausible-sounding nonsense, just to be able to reply to our prompts.
  • The quality of the prompt we give it can make the difference between useful and useless responses.

General use recommendations

Recommended:

  • Creating pictures – remember to always inform the reader about the tool you used

Use with caution:

  • Getting a basic idea about a new subject – always double-check the information it gives you, because LLMs tend to hallucinate (this also applies to bibliographic references)

Not recommended:

  • Writing a paper from scratch – using AI to write on your behalf is ethically wrong: you are responsible for the work you submit
  • Using it as the only source of information
  • Citing its output in a paper – it is unreliable by design. Instead, track down trustworthy academic sources supporting the concept you want to reference
  • Using the references it provides without double-checking

Searching for Information with Research Assistants

AI research assistants are also LLMs, but focused on scientific research. That means that the body of information they were trained on (their training corpus) is restricted to scientific databases and paper repositories, with the aim of helping researchers with specific tasks. These tasks include discovering literature in a given research domain, supporting literature reviews, and refining research questions: tasks that may come up in your studies as well, whether you are a researcher or not.

As with any LLM, the effectiveness of AI research assistants depends heavily on the quality of our prompts and on our critical thinking about their outputs. However, in addition to the general guidelines in the section above, some extra points should be taken into account when using AI research assistants:

Main principles

  • Whenever possible, check the corpus used to train the tool. The larger the training corpus, the larger the “knowledge” base (Semantic Scholar: around 200 million records; Web of Science: around 83 million records; Scopus: around 84 million records). However, size sometimes comes at the expense of quality control.
  • Each AI-powered Research Assistant works in a different way: try to assess what the tool can and cannot do for you (more in The Tools section below).

AI in Academic Writing

If you are using LLMs, general-purpose ones or AI research assistants, to help you write a scholarly manuscript intended for publication, there are two important rules to follow on top of those discussed above:

  • LLMs do not have the ability to be accountable for the content they generate. Thus, they cannot be listed as authors/co-authors on a manuscript, according to Swiss law (Swiss Copyright Act, Chapter 2 Art. 6 – https://www.fedlex.admin.ch/eli/cc/1993/1798_1798_1798/en)
  • The LLM you used, what you used it for, and how, must be disclosed in the manuscript. Make sure you adhere to the policies on the use of AI-generated content of your publisher of choice (e.g. Nature Editorial Policies – AI).

Privacy protection and sensitive data

  • Personally identifiable information (e.g., names, addresses) or sensitive information (e.g., religious or political orientations) about people or institutions should never be disclosed in prompts.
  • If you cannot avoid a prompt containing personally identifiable and/or sensitive information, anonymize those parts before submitting it.
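As an illustration of anonymizing a prompt before submitting it, the sketch below replaces e-mail addresses, phone numbers, and a user-supplied list of names with placeholders. It is a rough aid under stated assumptions, not a guarantee of anonymity: the regular expressions are deliberately simple, the function `anonymize_prompt` and the example prompt are hypothetical, and a manual check of the result is still needed.

```python
import re

# Deliberately simple patterns (assumptions): they will miss unusual formats.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_PATTERN = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def anonymize_prompt(prompt, names=()):
    """Replace e-mails, phone numbers and the given names with placeholders."""
    redacted = EMAIL_PATTERN.sub("[EMAIL]", prompt)
    redacted = PHONE_PATTERN.sub("[PHONE]", redacted)
    # Names cannot be detected reliably by a pattern, so the user lists them.
    for index, name in enumerate(names, start=1):
        redacted = re.sub(
            re.escape(name), f"[PERSON_{index}]", redacted, flags=re.IGNORECASE
        )
    return redacted


# Fictional example data.
prompt = (
    "Summarize the complaint sent by Jane Doe "
    "(jane.doe@example.org, +41 21 555 01 23)."
)
print(anonymize_prompt(prompt, names=["Jane Doe"]))
# Summarize the complaint sent by [PERSON_1] ([EMAIL], [PHONE]).
```

Numbered placeholders such as `[PERSON_1]` keep the prompt readable for the LLM while letting you map the answer back to the real person afterwards.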

Copyright infringement and plagiarism

  • Take the time to read the terms and conditions of the tool you’re using to understand who owns the copyright on the AI-generated outcome.
  • Pay extra attention to copyright infringement: e.g. asking an LLM to generate an image based on a screenshot from a movie, but in Studio Ghibli style, could lead to legal issues on both sides (the movie copyright owner’s side and Studio Ghibli’s side).
  • Plagiarism is an offense whether it is committed intentionally or not. Avoid using AI-generated text word for word; instead, find credible sources for the claim in the output and cite those properly.

AI clauses in contracts with academic publishers

In the current landscape of scholarly publishing, agreements with scientific publishers increasingly regulate the use of artificial intelligence in connection with licensed content. As AI tools become more integrated in research and publishing workflows, understanding the permitted and prohibited uses of the content covered by the agreements is essential. Properly navigating these terms ensures compliance and prevents potential legal issues.

This section will be updated each time a new agreement is signed.

“Open access publications with a CC-BY license can be used with any kind of available AI tool, may be used for the development and training of any AI tool and the results including the content can be freely shared.

Licensed content (closed publications) or Open Access publications with a restrictive CC-BY-NC-ND license, where Elsevier owns all rights or some key rights such as the right to create derivatives, may be used

  • with closed versions of AI tools that do not train the algorithm, do not learn from the input or incorporate the input in the AI tool (e.g. Open AI “non-learning” subscription version ChatGPT Team). The use of the closed version may be subject to a fee and / or restrictions.
  • with open versions of learning AI tools or to develop your own AI tool or platform, provided that it is used in a secure, user-controlled environment (i.e. self-hosted in an on-premises environment or in an environment hosted externally solely for use by Participating Institutions or Authorised Users).

Results generated or platforms developed with the help of AI may be published and made available for research and teaching, provided that they do not contain or reproduce content from closed or CC-BY-NC-ND publications. Links to closed publications are permitted. The commercial use of AI-generated content or platforms is prohibited.”

Source

The tools

Generative AI

General-purpose AI LLMs that can generate text, images, videos, code, assist with various tasks, and engage in conversations. These foundation models serve as the core technology behind many specialized AI applications.

https://chatgpt.com/

A versatile conversational AI assistant based on GPT models that can generate text, assist with writing, answer questions, and engage in natural conversations across a wide range of topics.

Business model: freemium (free version with GPT-3.5, ChatGPT Plus subscription for GPT-4 and additional features)

Recommended uses:

  • Content creation and brainstorming
  • Answering general knowledge questions
  • Assistance with coding and problem-solving
  • Educational explanations and tutoring

Strengths:

  • Versatile across many domains and types of queries
  • Large user base with extensive ecosystem of plugins and extensions
  • Regular model updates and improvements
  • Strong integration with other OpenAI products and services

Weaknesses:

  • May occasionally generate incorrect information
  • Limited knowledge cutoff date in free version
  • Performance varies based on prompt quality
  • Potential for overreliance on its outputs without verification

https://claude.ai

A conversational AI assistant capable of complex reasoning and nuanced responses.

Business model: freemium (free version with limitations, Claude Pro for advanced features)

Recommended uses:

  • Document analysis and summarization
  • Thoughtful conversations on complex topics
  • Writing assistance with nuanced content 

Strengths:

  • Strong reasoning capabilities
  • Excellent at understanding context and nuance

Weaknesses:

  • May be too cautious in certain content areas
  • Some advanced features only available in paid version

https://chat.deepseek.com/

An AI system focused on deep learning and delivering code understanding and generation capabilities, developed with a focus on programming tasks.

Business model: freemium (basic features free, premium features for subscribers)

Recommended uses:

  • Code generation and completion
  • Technical documentation assistance
  • Learning programming concepts

Strengths:

  • Specialized in code-related tasks
  • Strong performance in programming languages
  • Open-source model versions available

Weaknesses:

  • Less versatile for non-coding tasks
  • Newer to market with less established ecosystem
  • Censorship/privacy concerns

https://gemini.google.com/

Google’s multimodal AI model that can understand, operate across, and combine different types of information including text, code, audio, image, and video.

Business model: freemium (free access to Gemini with a Google account, paid Gemini Advanced)

Recommended uses:

  • Multimodal tasks involving different types of media
  • Creative content generation
  • Information synthesis
  • Programming assistance

Strengths:

  • Strong multimodal capabilities
  • Integration with Google ecosystem
  • Advanced reasoning abilities
  • Up-to-date information when using Google search

Weaknesses:

  • Varying performance across different tasks
  • Some advanced features restricted to paid tier
  • Privacy concerns with data handling (Google)
  • May prioritize Google services in recommendations

https://github.com/features/copilot

An AI programming assistant that suggests code and entire functions in real-time, directly in your editor, powered by OpenAI’s Codex model.

Business model: subscription-based (individual and business plans)

Recommended uses:

  • Accelerating coding tasks and reducing boilerplate
  • Learning new programming languages or frameworks
  • Debugging and code improvement suggestions

Strengths:

  • Understands context from surrounding code
  • Supports numerous programming languages
  • Real-time suggestions as you type

Weaknesses:

  • May suggest incorrect or inefficient code
  • Subscription cost for individual developers
  • Potential legal questions about training data
  • Relies on quality of existing code for context

https://www.llama.com/

An open-source large language model designed to be accessible for research and commercial applications with various parameter sizes.

Business model: open-source (free to use, adapt, and deploy)

Recommended uses:

  • Self-hosted AI solutions
  • Research and fine-tuning for specific domains
  • Applications requiring local deployment

Strengths:

  • Open-source flexibility
  • Various model sizes for different computing resources
  • No usage fees or API costs
  • Can be fine-tuned for specific use cases

Weaknesses:

  • Requires technical expertise to deploy effectively
  • Computing resources needed for larger models

https://lumo.proton.me

A conversational AI assistant powered by Proton’s suite of models that can generate text, answer questions, and engage in natural conversations.

Business model: freemium (Lumo Free at $0, Lumo Plus subscription for unlimited usage, web‑search, extended context, and premium features; Lumo Access included with Proton Visionary/Lifetime plans)

Recommended uses:

  • Content creation and brainstorming
  • Answering general knowledge questions
  • Assistance with coding and problem-solving
  • Educational explanations and tutoring

Strengths:

  • Greater privacy due to zero-access encryption and servers based in Switzerland
  • Prompts are not included in training data
  • Transparent security audits that let users verify privacy claims
  • Strong integration with other Proton services

Weaknesses:

  • May occasionally generate incorrect information
  • Limited plugin options
  • New capabilities (e.g., voice input, image generation) arrive more slowly because they must pass Proton’s rigorous privacy review

https://copilot.microsoft.com/

AI assistant integrated across Microsoft 365 apps and Windows, helping users create content, summarize information, and automate tasks within the Microsoft ecosystem.

Business model: mixed (some features included with Microsoft 365 subscriptions, premium features with additional Copilot Pro subscription)

Recommended uses:

  • Content creation and editing in Office apps
  • Email drafting and summarization in Outlook
  • Meeting summaries and action items in Teams
  • Data analysis assistance in Excel
  • Presentation creation in PowerPoint

Strengths:

  • Integration with Microsoft’s ecosystem
  • Context-aware assistance across multiple applications
  • Enterprise-grade security and compliance
  • Reduces time spent on routine tasks

Weaknesses:

  • Requires Microsoft 365 subscription
  • Additional cost for Copilot Pro features
  • Performance varies across different applications
  • May have limited utility outside the Microsoft ecosystem
  • Learning curve to use effectively across all applications

https://mistral.ai/

An open-source focused AI company developing powerful language models with efficient architectures that deliver strong performance even in smaller parameter sizes.

Business model: mixed (open-source versions and paid API access)

Recommended uses:

  • Enterprise AI integration
  • Self-hosted applications
  • Research and development
  • Text generation and understanding

Strengths:

  • Strong performance-to-size ratio
  • Open approach with accessible models
  • Efficient architecture requiring less computing power
  • European focus with emphasis on data sovereignty

Weaknesses:

  • Newer company with evolving product lineup
  • Less established ecosystem of tools
  • May not match larger models in certain specialized tasks

https://www.perplexity.ai/

An AI-powered answer engine that combines search capabilities with language models to provide referenced, up-to-date answers to questions.

Business model: freemium (basic features free, Pro subscription for advanced features)

Recommended uses:

  • Real-time information gathering
  • Research on current topics
  • Quick fact-checking
  • Learning about complex subjects

Strengths:

  • Provides sources for information (from the web only)
  • Combines search and AI capabilities
  • More up-to-date than standard LLMs
  • Conversational follow-up capability

Weaknesses:

  • May still include incorrect information
  • Limited depth compared to specialized research tools
  • Citations sometimes don’t fully support claims
  • Best for factual queries rather than creative tasks

https://publicai.co/

A Switzerland-based multilingual, general-purpose open-source AI conversational platform which uses the Apertus model, developed by EPFL, ETH Zurich, and the Swiss National Supercomputing Centre (CSCS).

Business model: free and open-source (free to use, adapt, and deploy)

Recommended uses:

  • Natural-language conversations
  • Answering general knowledge questions

Strengths:

  • Superior multilingual ability, especially in the Swiss context (includes Romansh and Swiss German in addition to the standard language options)
  • Data stored on server based in Switzerland
  • Public, free and open source

Weaknesses:

  • Very new – knowledge cutoff at March 2024
  • Incorrect responses are relatively common for now
  • Limited memory

Research Assistants

Specialized AI tools designed to assist with academic and scientific research, helping to find, analyze, and synthesize information from scholarly sources. These tools streamline literature reviews and accelerate the research process.

https://app.answerthis.io/

An AI-powered research assistant that supports many parts of the literature research and writing workflow.

Business model: freemium (free plan with limited credits, premium plan for full access).

Training corpus: different sources (PubMed, Semantic Scholar, various preprint servers, open access journals, patents, etc.)

Recommended uses:

  • Help with literature reviews
  • Identifying less obvious literature gaps
  • Drafting an introduction section for a paper

Strengths:

  • Large corpus (over 200 million papers)
  • Toggles for searching through papers and/or the internet
  • Intuitive prompt helper that covers several aspects of literature review and writing
  • Shows exact sources it used in a generated draft, and logic behind generated draft

Weaknesses:

  • Must have an account even to try it out
  • Selection process of sources mentioned in answer unclear
  • Many options restricted to paid tier

https://asta.allen.ai/

An AI research assistant that helps find research papers and summarize literature (data analysis is reportedly planned as a future feature).

Business model: free

Training corpus: Semantic Scholar

Recommended uses:

  • Getting a basic understanding of a research question/field
  • Drafting an introduction section for a paper
  • Identifying key literature in a research area

Strengths:

  • Transparent breakdown of steps it took to arrive at its output, including how it interpreted the user’s prompt
  • Downloadable literature review report with references in the Summarize literature option
  • In the Find papers option, each reference is labelled with its degree of relevance to the topic or question in the prompt

Weaknesses:

  • Unclear selection of sources shortlisted for answer in the Summarize literature option
  • “LLM memory” sometimes used instead of actual references in the Summarize literature option
  • May miss very new literature that is not indexed, and underperform in niche fields

https://consensus.app/

An AI-powered search engine specifically designed for scientific research that finds and summarizes insights from academic papers.

Business model: freemium (Basic search free, premium features subscription-based)

Corpus: Semantic Scholar

Recommended uses:

  • Literature reviews
  • Staying updated on research developments
  • Finding scientific consensus on specific questions

Strengths:

  • Focused on peer-reviewed research
  • Provides concise summaries of findings
  • Citations for all information
  • Reduces information overload

Weaknesses:

  • Limited to academic and scientific content
  • May miss very recent publications
  • Advanced features require subscription
  • Limited coverage in some niche fields

https://elicit.com/

A research assistant that uses AI to help researchers find relevant papers, understand research, and summarize findings.

Business model: freemium (basic features free, team and advanced features paid)

Corpus: Semantic Scholar

Recommended uses:

  • Literature reviews and summaries
  • Finding relevant studies on specific topics
  • Extracting key information from papers

Strengths:

  • Focuses on extracting relevant information
  • Helps formulate research questions
  • Provides literature maps and connections

Weaknesses:

  • May struggle with very technical or niche topics
  • Best for specific research questions rather than broad exploration
  • Some advanced features restricted to paid plans

 

https://www.rayyan.ai/

A machine learning and AI-powered tool that helps conduct systematic reviews by speeding up the selection of literature from a user-provided initial longlist of references and by streamlining the subsequent review process.

Business model: freemium (free tier for basic features; paid Professional and Student plans offer additional functions and higher review limits).

Training corpus: the set of references that you give it

Recommended uses:

  • Systematic reviews and meta-analyses

Strengths:

  • Efficient article classification and relevance ranking
  • Integration with major reference managers (Mendeley, EndNote, BibTeX)
  • Advanced filtering, tagging, and annotation tools (PICO, PRISMA flow, custom keywords)

Weaknesses:

  • Use case limited to reviews, meta-analyses, and evidence synthesis
  • Free version limits advanced features and number of active concurrent reviews
  • Steep learning curve for new users
  • The free version only allows one research project

https://researchrabbitapp.com/home

A literature mapping tool (initially traditional machine learning-based, now also with AI), designed to help researchers discover and stay up to date on scientific literature.

Business model: freemium (free plan with basic functionality, advanced options only in premium plan).

Training corpus: PubMed, Semantic Scholar

Recommended uses:

  • Discovering a new research field in a visual way
  • Identifying the key papers and authors in a research area
  • Staying up to date on research in your field

Strengths:

  • Intuitive visual representation of research landscape
  • Optional email updates for staying informed on emerging literature
  • Zotero integration

Weaknesses:

  • Not as many features as other AI tools
  • Use effectively restricted to literature discovery and review
  • The free version only allows one research project

https://scite.ai/assistant

An AI tool that provides citation context and analysis, showing how scientific papers have been cited by other researchers.

Business model: freemium (basic features free, premium features subscription-based)

Training corpus: different sources (publishers, Unpaywall, PubMed, fatcat, various preprint servers, university repositories, open access journals, and more)

Recommended uses:

  • Evaluating the impact and reception of research papers
  • Understanding citation context (supporting or contrasting)
  • Identifying key papers in a research area

Strengths:

  • Shows citation context, not just count
  • Classifies citations as supporting, contrasting, or mentioning
  • Chrome extension for seamless integration
  • Helps evaluate research validity and impact

Weaknesses:

  • Coverage varies by field
  • Learning curve to understand citation classifications
  • Some advanced analytics restricted to paid tier

Retrieval-augmented Generation (RAG)

AI systems that combine the generative capabilities of LLMs with the ability to retrieve and reference specific information from external sources or databases. This approach enhances accuracy by grounding AI responses in verified information.
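The retrieve-then-generate idea can be sketched in a few lines. The example below is a simplified illustration, not the implementation used by any of the tools listed here: it assumes retrieval by plain word overlap (real systems typically use more sophisticated search), uses made-up passages, and stops at building the prompt that would be sent to an LLM. The function names are hypothetical.

```python
import re


def tokenize(text):
    """Lower-case the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=2):
    """Return up to top_k documents sharing the most words with the question."""
    question_words = tokenize(question)
    scored = [(len(question_words & tokenize(doc)), doc) for doc in documents]
    scored = [(score, doc) for score, doc in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def build_grounded_prompt(question, documents, top_k=2):
    """Build a prompt that asks the LLM to answer from retrieved passages only."""
    passages = retrieve(question, documents, top_k)
    context = "\n".join(
        f"[{number}] {passage}" for number, passage in enumerate(passages, start=1)
    )
    return (
        "Answer using only the numbered passages below and cite them.\n"
        f"{context}\n"
        f"Question: {question}"
    )


# Made-up passages standing in for an external knowledge source.
documents = [
    "The library opens at 8 am on weekdays.",
    "Interlibrary loan requests take about five days.",
    "The cafeteria serves lunch from noon.",
]
question = "When does the library open?"
print(build_grounded_prompt(question, documents, top_k=1))
```

Because the prompt contains the retrieved passage and asks for citations, the answer can be checked against a known source. The model can still misread or ignore that source, so verification remains necessary.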

https://chatgpt.com/

Custom GPTs: customizable versions of ChatGPT that can be tailored with specific instructions, knowledge, and capabilities to serve particular use cases.

Business model: requires ChatGPT Plus subscription to create

Recommended uses:

  • Domain-specific assistance
  • Company knowledge base interactions
  • Specialized workflows
  • Task-specific automation

Strengths:

  • Can be customized without coding
  • Ability to upload reference documents
  • Can be tailored for specific use cases
  • Web browsing and tool use capabilities

Weaknesses:

  • Creating custom GPTs requires paid subscription
  • Limited memory between sessions
  • Privacy concerns with uploaded data
  • May still hallucinate despite custom knowledge

https://notebooklm.google.com/

A tool that combines the interactive computing environment of notebooks with large language models, allowing for context-aware AI assistance in data analysis and research workflows.

Business model: freemium (free access with a Google account, paid Plus version available)

Recommended uses:

  • Data science projects
  • Interactive data analysis
  • Educational content creation
  • Research documentation

Strengths:

  • Contextual awareness of your data and code
  • Combines computational and language capabilities
  • Can work with existing data analysis workflows

Weaknesses:

  • Can be resource-intensive
  • May have limited domain-specific knowledge
  • Privacy concerns with data handling (Google)
  • May prioritize Google services in recommendations

Take part in our AI training course

Reshaping Information Research with AI

In the last few years, many AI-powered tools have promised to fundamentally change the way we look for information. During this workshop, we will explore some of them, focusing on so-called research assistants in order to understand how they work and how we can use them to make our research better.

Use of AI tools in the creation of this page

We used ChatGPT 4.0 to generate all the images and to fine-tune the text.

Contact

[email protected]


+41 21 693 21 56

