Examples and real cases of data processing

Disclaimer

The following examples were developed by the Università della Svizzera italiana in collaboration with the University of Neuchâtel. All the text here, except third-party content (e.g. quotes), is published under the Creative Commons Attribution-ShareAlike 4.0 International License. To view a copy of this license, visit this page.

In statistical surveys where the researcher intends to keep only aggregated data, a threshold should be set below which subjects are deemed identifiable.

If an element could single out a subject within a given category of people, even a large one (e.g. only one lawyer volunteers in a town of 5’000 inhabitants), either this criterion must be removed or other criteria must be added to dilute the results.

To analyse tourism trends, which is the topic of the research, a researcher collects data from open social media profiles about the cities and places that users visited at a certain time of the year.

After collecting the data, the researcher selects the relevant items and anonymizes the dataset by removing all information that permits the identification of the person it refers to, and by aggregating all the remaining information. The researcher therefore only keeps, for example, “City 1: number of tourists from country X, number of tourists from country Z, number of tourists from country Y”, etc.

No names are kept, and if only a small number of tourists come from a certain country, that information is removed.
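The aggregation and small-count removal described above can be sketched as follows. This is only an illustration under stated assumptions: the record fields (`user`, `city`, `country`), the function name `aggregate_visits` and the threshold value of 5 are hypothetical, since the text does not prescribe a data layout or a specific threshold.

```python
from collections import Counter

# Hypothetical threshold: the text does not prescribe a value.
MIN_GROUP_SIZE = 5


def aggregate_visits(records, min_group_size=MIN_GROUP_SIZE):
    """Count tourists per (city, country) and drop groups that are too small.

    Only the 'city' and 'country' fields are read, so names, user IDs or
    profile links never reach the aggregated output.
    """
    counts = Counter((r["city"], r["country"]) for r in records)
    result = {}
    for (city, country), n in counts.items():
        # Suppress small groups, which could single out a person.
        if n >= min_group_size:
            result.setdefault(city, {})[country] = n
    return result


# Six visitors from country X and a single visitor from country Z.
records = [
    {"user": f"u{i}", "city": "City 1", "country": "X"} for i in range(6)
] + [{"user": "u99", "city": "City 1", "country": "Z"}]

print(aggregate_visits(records))  # {'City 1': {'X': 6}}
```

The single visitor from country Z is dropped because the group falls below the assumed threshold. The appropriate threshold depends on the dataset and context and has to be chosen by the researcher.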

This example is about researchers working with UGC (User Generated Content) on social media, such as Facebook comments under an article published by a newspaper.

The researchers need the comments’ content for discourse analysis (in order to understand the “mood” of the population about a topic). They are interested only in the content, and the identity of the authors is not relevant. How should they proceed with the anonymization? By paraphrasing the content, not disclosing under which article the comment was found, not sharing the primary sources in the publication…? It is even possible to avoid personal data altogether by keeping only the comments related to the article, without any information that refers to the writer of the comment. For example, a statement such as “40% of the comments are in favour of the opinion expressed in the article” contains no personal data.
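As a minimal sketch of this approach, the snippet below discards every field that refers to the writer and then reports only an aggregate share. The field names (`author`, `text`, `stance`) and the stance labels are hypothetical; the labels are assumed to have been assigned by the researchers during their analysis.

```python
def strip_author_fields(comments):
    """Keep only the comment text and the researcher-assigned stance label."""
    return [{"text": c["text"], "stance": c["stance"]} for c in comments]


def share_in_favour(comments):
    """Return the percentage of comments labelled 'favour'."""
    if not comments:
        return 0.0
    favour = sum(1 for c in comments if c["stance"] == "favour")
    return 100.0 * favour / len(comments)


# Hypothetical raw data as collected, including author names.
raw_comments = [
    {"author": "A", "text": "I agree.", "stance": "favour"},
    {"author": "B", "text": "Good point.", "stance": "favour"},
    {"author": "C", "text": "I disagree.", "stance": "against"},
    {"author": "D", "text": "Not convinced.", "stance": "against"},
    {"author": "E", "text": "Wrong.", "stance": "against"},
]

cleaned = strip_author_fields(raw_comments)
print(f"{share_in_favour(cleaned):.0f}% of the comments are in favour")
# prints "40% of the comments are in favour"
```

Note that a comment's text can itself be identifying, which is why options such as paraphrasing are raised above. Only the aggregate figure is free of personal data.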

If the researcher takes screenshots of the comments, there are two possibilities: either the screenshots are kept but names, profile photos and any other personal information are thoroughly blurred, or the researcher transcribes only the non-personal information needed for the research and then deletes the screenshots containing personal data. However, if these comments are publicly accessible (for example on TripAdvisor), data protection requirements may be applied less strictly, depending on the specific circumstances.

A picture of a group of people wearing masks, common sunglasses, hats and clothes is probably anonymous. If one or a few of these people wear a distinctive object, a religious symbol, a specific tattoo, or any other distinctive element that may single out the subject, the image contains personal data. To anonymize it, all the personal elements must be removed by cropping out these parts or, if that is not possible, by blurring all of them. The same applies to videos.

An interview can be anonymized by removing all information that identifies the interviewee. If this is not possible, consent is required.

Most judgements of the Swiss federal courts are accessible online. As stated in the relevant regulation (in French), only anonymized judgements should be published. Yet this is not fully guaranteed, and the researcher should still pay attention when using such judgements for research. During the collection and analysis of the data, which happen only within the research team (or are done by the researcher alone), there should be no problems.

But if the researcher needs to re-publish the judgements, for example in a publication, more precautions are required: if the subjects are still identifiable, the researcher must try to anonymize the judgements by removing the information that identifies the case. Swiss data protection law is more permissive where anonymization is not reasonably possible.

Two Cases of Identifiable Persons in Switzerland

1 – Google Street View Case (2012)

In 2012 the Swiss Federal Supreme Court issued a ruling establishing that Google Maps Street View images of sensitive facilities, particularly women’s refuges, nursing homes, prisons, schools, courts and hospitals, may be personal data, because a person may be recognizable through the combination of several pieces of information and these specific places. Moreover, the Federal Court held that a person can also be identifiable by other characteristics such as skin color, clothing, aids used by physically disabled persons, etc. To learn more about this ruling, read more here.

2 – Blick Interview Case (2019)

A person is considered identifiable when a newspaper article mentions details of the person’s activity and place of residence that allow their family members and friends to recognize them, even if they are referred to only by a pseudonym and their face is partially covered in the picture.