Similarity check


EPFL relies on iThenticate for similarity detection in documents. This online software is provided by the American company Turnitin.

According to its website, iThenticate checks uploaded documents against a database containing “over 60 billion web pages and 155 million content items, including 49 million works from 800 scholarly publisher participants” (consulted on November 11, 2016).

Complete information is available on the iThenticate website, where you can search for a particular journal at the bottom of the page to find out whether it is included in iThenticate’s database.

Although it is often called a “plagiarism detection tool”, iThenticate cannot determine whether plagiarism occurred. It only provides a similarity score and a report. A human then needs to read through the report to make the decision.

If you include (part of) your own articles in your thesis, please make sure that you have followed the applicable rules and, where appropriate, ask the publisher of your articles for permission.


To test a document, you need an iThenticate account. EPFL PhD students and thesis directors will receive an email before the end of 2016 with an account created automatically.

When logged in, you can upload a document.

You can choose to generate a report and/or to upload the document to the repository. You also need to give some metadata (title, author).

Be aware that the file you upload has to comply with the following requirements (consulted on November 14, 2016):

* file size: maximum 40MB (and maximum 2MB of raw text)

* zip file: maximum 200MB or 1,000 files

* document length: minimum 20 words, maximum 400 pages

* supported document types: Word, Text, PostScript, PDF, HTML, WordPerfect WPD, OpenOffice ODT, RTF, Hangul HWP
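The requirements above can be checked locally before uploading. The following Python sketch is only an illustration and is not part of iThenticate: the function name `check_upload`, the mapping from document types to file extensions, the choice of 1 MB = 1024 × 1024 bytes, and the reading of the zip rule as "at most 200 MB and at most 1,000 files" are all assumptions. It does not check the 400-page limit or the 2 MB raw-text limit, and it only counts words for plain text files.

```python
import zipfile
from pathlib import Path

MB = 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MAX_FILE_BYTES = 40 * MB
MAX_ZIP_BYTES = 200 * MB
MAX_ZIP_ENTRIES = 1000
MIN_WORDS = 20

# Assumed mapping from the listed document types to file extensions.
SUPPORTED_EXTENSIONS = {
    ".doc", ".docx", ".txt", ".ps", ".pdf", ".html", ".htm",
    ".wpd", ".odt", ".rtf", ".hwp",
}


def check_upload(path):
    """Return a list of problems found; an empty list means none was detected."""
    path = Path(path)
    problems = []
    size = path.stat().st_size
    suffix = path.suffix.lower()

    if suffix == ".zip":
        if size > MAX_ZIP_BYTES:
            problems.append("zip file larger than 200 MB")
        with zipfile.ZipFile(path) as archive:
            # Count files only, not directory entries.
            entries = [i for i in archive.infolist() if not i.is_dir()]
        if len(entries) > MAX_ZIP_ENTRIES:
            problems.append("zip file contains more than 1,000 files")
        return problems

    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {suffix or 'no extension'}")
    if size > MAX_FILE_BYTES:
        problems.append("file larger than 40 MB")
    if suffix == ".txt":
        # Word count is only checked for plain text in this sketch.
        words = len(path.read_text(encoding="utf-8", errors="ignore").split())
        if words < MIN_WORDS:
            problems.append("fewer than 20 words")
    return problems
```

For example, `check_upload("thesis.pdf")` (a hypothetical file name) returns an empty list when none of the checked limits is exceeded.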

The content is then extracted from the document and processed. When processing is finished, a score is displayed. This score is simply the calculated percentage of the document that has similarities with content indexed in iThenticate’s database. The score itself doesn’t mean anything. It has to be analyzed by a human in order to be meaningful.
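To make the idea of a "percentage of the document" concrete, here is a toy Python illustration. It is not iThenticate's actual algorithm, which is not described here; the function name `similarity_score` and the word-based counting are assumptions made for the example.

```python
def similarity_score(matched_words, total_words):
    """Toy score: share of the document's words that lie in matched passages."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return round(100 * matched_words / total_words)


# Hypothetical document: 3,000 words, 450 of which lie in matched passages.
print(similarity_score(450, 3000))
```

The same number can come from one long, properly quoted passage or from many short, common phrases, which is why the report has to be read by a person.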

At the end, you make the decision: is there a problem with the document or not?

iThenticate only provides additional hints and points to parts of the document that may be worth a closer look. Let’s see how to read iThenticate’s report.

The report is divided into two parts: the document with its matches on the left and an index on the right.


The index indicates all the sources where iThenticate found matches. Every index entry is a link to the related part of the document and every colored part of the document links back to the index.

Not all matches are meaningful. The software may flag sentences that were properly cited in the document, or very common sentences.

To reduce such irrelevant matches, you can customize the setup of iThenticate (My Documents > Settings tab). You can exclude sections from the test.


iThenticate guide

Useful information on how to use iThenticate.

Turnitin similarity guide

Useful information on how to use Turnitin similarity.
