Student projects

This page lists the projects currently open at the Laboratory for the History of Science and Technology (LHST). If you are interested in working on one of the projects listed below, please write directly to the contact person, with Prof. Baudry cc’ed in your email.

The project descriptions are only brief outlines, and we are generally flexible about the details.

This project, carried out in collaboration with the Institute of Psychology (IP) of the University of Lausanne (UNIL), is the final part of the SNSF research project MICE led by Prof. Rémy Amouroux. The aim of MICE is to produce the first transnational historical study of the reception and indigenization of behavior therapy in the Francophone context between the early 1960s and the 1990s.

This student project addresses the circulation and reception of psychotherapy in French-speaking Europe through a distant reading of two magazines: Psychology Today and its French counterpart, Psychologie. The goal is to trace the role played by the specialized press in disseminating behavior therapy to a wide audience.

The corpus consists of two magazines, one in English and one in French, available as bulk downloads of varying OCR quality: Psychology Today (1967-1971) and Psychologie (1970-1980). The project will examine whether new psychotherapeutic tools rose progressively in prominence, and whether this rise coincided with, or was related to, growing criticism of psychoanalysis. It will also attempt to establish whether coherent clusters can be observed around the critique of psychoanalysis on the one hand, and psychotherapy and new psychotherapeutic tools (e.g., cognitive behavioral therapy) on the other.
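As a rough illustration of what a first distant-reading pass might look like, the Python sketch below computes, for each year, the share of articles that mention at least one term from a given keyword group. The keyword lists, the group names, and the `(year, text)` input format are hypothetical assumptions, not part of the project specification; a real analysis would need tokenization that is robust to OCR errors and would likely move on to topic modelling or clustering rather than fixed word lists.

```python
import re
from collections import defaultdict

# Hypothetical keyword groups (English and French); real lists would be
# built and validated together with the historians on the project.
KEYWORD_GROUPS = {
    "psychoanalysis": {"freud", "psychanalyse", "psychoanalysis"},
    "behavior_therapy": {"behavior", "behaviour", "comportementale", "conditioning"},
}

# Lowercase Latin letters, including accented characters used in French.
TOKEN_RE = re.compile(r"[a-zà-ÿ]+")


def yearly_group_share(docs):
    """Return {group: {year: share of that year's documents mentioning the group}}.

    `docs` is an iterable of (year, text) pairs, e.g. one pair per OCRed article.
    """
    totals = defaultdict(int)
    hits = {group: defaultdict(int) for group in KEYWORD_GROUPS}
    for year, text in docs:
        tokens = set(TOKEN_RE.findall(text.lower()))
        totals[year] += 1
        for group, keywords in KEYWORD_GROUPS.items():
            if tokens & keywords:  # at least one keyword of the group occurs
                hits[group][year] += 1
    return {
        group: {year: hits[group][year] / totals[year] for year in sorted(totals)}
        for group in KEYWORD_GROUPS
    }
```

Plotting these shares year by year is one simple way to check whether mentions of new psychotherapeutic tools grow alongside discussions of psychoanalysis, before turning to more robust clustering methods.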

This project will be done in collaboration with Prof. Jérôme Baudry (EPFL), Prof. Rémy Amouroux and Dr. Elsa Forner (UNIL).

Project type: semester project or master’s thesis.

Prerequisites: prior experience in text mining; solid data analysis skills; knowledge of NLP and computational linguistics; proficiency in both English and French; an interest in history and the social sciences is a plus.

Contact: [email protected].

The project investigates how digital tools can be used to study the dynamics of innovation in science and technology from the eighteenth century to today. Innovation, the production of the new, is often said to be radical and path-breaking; yet what can history teach us about the actual rhythms of innovation? Looking at past centuries of technological development, do we see continuity or discontinuity? Can we identify waves of innovation (and imitation)? How new is the new, and how did people strategically describe and draw technology to present it as new? How did inventors in the past address the dangers and potential negative consequences of their activity? Working with an existing corpus of patents of invention, or building a new one, you will choose and apply state-of-the-art NLP and machine learning methods and/or draw on your statistical and data science knowledge.

Below are some more detailed examples of themes for semester projects and master’s theses, but you can also propose your own questions. You will work with an interdisciplinary team of historians and computer scientists.

– Tracing international patent flows: Since the 19th century, as economic relationships became more and more globalized, individuals and corporations have increasingly patented their inventions in many countries simultaneously. Available statistics do not allow us to answer questions such as: Which patents had counterparts in other countries? In which countries were patents covering the same technology to be found? Did inventors usually patent in their country of residence first, or did they choose other countries? A computational analysis of digitized patent documents might help answer such questions. While the textual descriptions of inventions had to be translated into the language of each country and adapted to its legal system, the drawings tended to be reused. Using computer vision techniques to match patents from different countries that feature the same drawings would shed light on the historical dynamics of international patenting and technology flows.

– Classifying patents and technology: Categorizing innovation is difficult. Categories are static; innovation is dynamic, and it can and does happen in between established categories. Yet it is very important to categorize patents, because the dynamics of innovation and the logic of taking out patents differ by technology and industry. However, such labels are usually missing from the datasets. The availability of the patents' textual descriptions offers a great opportunity to address this classification challenge.

– Extending the geographical scope of possible investigations: Most studies based on the full text of historical patents use those issued by the United States of America, because they are easily available in digitized form. To investigate similar questions for other countries, the available scanned material would need to be processed into clean digital full text, and off-the-shelf optical character recognition (OCR) software produces results of varying and sometimes questionable quality. This is an interesting challenge for anyone interested in computer vision and OCR, and in bringing about a less one-sided view of innovation.
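For the first theme above, tracing international patent flows, one simple way to flag candidate drawing matches is perceptual hashing. The Python sketch below (using NumPy) implements a basic average hash and compares hashes by Hamming distance; the technique, the function names, and the `max_distance` threshold are illustrative assumptions rather than a method prescribed by the project. Scanned patent drawings often differ in cropping, rotation, and annotation, so a real pipeline would more likely rely on local feature matching or learned image embeddings, with hashing at most as a cheap pre-filter.

```python
import numpy as np


def average_hash(image, hash_size=8):
    """Compute a simple perceptual "average hash" of a grayscale image.

    `image` is a 2-D array of pixel intensities, at least hash_size pixels
    in each dimension. It is reduced to a hash_size x hash_size grid by
    block-averaging; each cell becomes True if brighter than the grid mean.
    """
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    # Crop so both dimensions divide evenly into hash_size blocks.
    img = img[: h - h % hash_size, : w - w % hash_size]
    bh, bw = img.shape[0] // hash_size, img.shape[1] // hash_size
    small = img.reshape(hash_size, bh, hash_size, bw).mean(axis=(1, 3))
    return (small > small.mean()).flatten()


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return int(np.count_nonzero(a != b))


def match_drawings(hashes_a, hashes_b, max_distance=10):
    """Pair each drawing in collection A with its closest drawing in B.

    Returns (index_in_a, index_in_b, distance) for pairs within max_distance.
    """
    matches = []
    for i, ha in enumerate(hashes_a):
        dists = [hamming(ha, hb) for hb in hashes_b]
        j = int(np.argmin(dists))
        if dists[j] <= max_distance:
            matches.append((i, j, dists[j]))
    return matches
```

Because the hash compares each cell to the image's own mean, it is insensitive to uniform changes in brightness and contrast between scans, which is one reason it is sometimes used as a first filter before more expensive comparisons.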

Project type: semester project or master’s thesis.

Prerequisites: prior experience in text mining, NLP, or computer vision; solid skills in data analysis and Python; prior experience with large datasets and/or working remotely on a server is a plus.

Contact: Jérôme Baudry ([email protected])

This project will investigate how machine vision algorithms could be applied by historians to automate the analysis of historical images. Are the available datasets and algorithms suitable for historians' research goals and methods? What kinds of historical questions are best suited to machine vision algorithms? You will be responsible for constructing a dataset of historical scientific images suitable for machine learning by automating a process that extracts images and their textual descriptions from PDF scans of 19th-century science textbooks. You will then use the dataset to test algorithms and extract visual patterns and features.

Project type: semester project or master’s thesis.

Prerequisites: prior experience in machine learning and pattern recognition.

Contact: [email protected].

This project will investigate the role of scientific images by analyzing the textual descriptions of images used in nineteenth-century science textbooks. Were such images supposed to take the place of the actual objects or experiments by enabling a form of “virtual witnessing”, or were they meant to be used alongside physical objects to guide hands-on experiments? Can one identify changes in the way images were used (for example, to think with) between the beginning and the end of the nineteenth century? Is there a difference between the way readers were supposed to use (or think with) images printed on plates at the end of the book and images printed in line with the text? The project will explore various methods for tackling these questions using text mining and NLP techniques.

Project type: semester project or master’s thesis.

Prerequisites: prior experience with NLP and text mining techniques.

Contact: Ion Mihailescu ([email protected])

Open Science is an international movement that aims to make all scientific research outputs (publications, data, software, methods) freely accessible to everyone in society: researchers, amateurs, policy makers, industry, as well as artists, journalists, and activists. Open Science across the world relies heavily on the design and development of dedicated infrastructures, mostly platforms: digital libraries, data repositories (“as open as possible, as closed as necessary”), directories, online journals, web archives, computational services, MOOCs, content management systems (CMS), collaborative version control, etc.

These platforms are loosely bound together as a network. For example, a publication in an online journal may refer, via a persistent identifier (such as a DOI), to a dataset hosted on a given repository. Similarly, an open science search engine may harvest the metadata of libraries and directories to index available publications.

The shape of the open science ecosystem online and the nature of the links that tie platforms together are not yet well understood. The aim of this project is to crawl the web to identify the links between platforms, characterize their nature, and generate an interactive map of the open science network.
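A minimal sketch of the link-extraction step, assuming that a platform can be approximated by the host name in its URLs. This is a simplification: one platform may span several hosts, and DOI links only reach their target repository after redirection. The sketch uses only the Python standard library and takes an already downloaded HTML page; fetching pages, following redirects, and respecting robots.txt are left out, and the function and class names are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href targets of <a> tags in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:  # value is None for bare attributes
                    self.hrefs.append(value)


def outbound_platform_links(page_url, html):
    """Count links from the page's host to other hosts.

    Returns a Counter mapping (source_host, target_host) to a number of links.
    """
    parser = LinkCollector()
    parser.feed(html)
    source = urlparse(page_url).netloc.lower()
    edges = Counter()
    for href in parser.hrefs:
        parsed = urlparse(urljoin(page_url, href))  # resolve relative links
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        target = parsed.netloc.lower()
        if target and target != source:  # keep only links to other hosts
            edges[(source, target)] += 1
    return edges
```

Aggregating these host-to-host counts across all crawled pages yields a weighted, directed graph whose edges could then be characterized and visualized as a map of the network.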

Level
Master (research project, optional research project, or master’s project).

Assistant responsible for the project
Simon Dumas Primbault ([email protected])

Possibility to work in a group?
Yes.

Over the past decades, the development of online scientific platforms has radically changed the way researchers access, browse, and read scientific articles. Yet the computational study of digital research practices is still in its early days. This project aims to document the behavior of scientists on online platforms by making sense of the digital traces they generate while navigating.

Gallica is the digital library of the French national library (BnF). It hosts more than 10 million documents of a great diversity of types (printed books, periodicals, images, music, maps…) in up to eight different languages.

You will perform data mining on the Gallica browsing logs for the year 2022 (~60,000 users per day) and make sense of user experience by identifying patterns of research behaviour through cluster analysis. Building on previous research (see the LHST GitHub) and taking advantage of the format of the log files, you will be required to:

  1. Sessionize the logs by designing and testing models for individual “navigation paths”
  2. Enrich the users’ sessions with the metadata provided by the Archival Resource Keys of the documents (type of document, discipline, year, etc.) – additionally, other methods (e.g., topic modelling) could be applied to enrich the server logs
  3. Identify the sessions’ relevant features for clustering
  4. Deploy a mixed-method clustering algorithm (graph, word2vec, signatures…)
  5. Build a typology of users’ behaviour on the platform by clustering the previously identified users’ paths – a diachronic study could be conducted by applying the pipeline to rolling 30-day periods.
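Step 1 can be sketched with a simple inactivity-timeout rule: a user's consecutive requests belong to the same session unless they are separated by more than a fixed gap. The Python sketch below assumes the logs have already been parsed into `(user_id, timestamp, ark)` tuples; this record format and the 30-minute threshold are hypothetical. A timeout of this kind is a common heuristic in web usage mining, but designing and testing the sessionization model is precisely part of the project.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical inactivity threshold; choosing and validating it is part of the work.
SESSION_GAP = timedelta(minutes=30)


def sessionize(records, gap=SESSION_GAP):
    """Split each user's log records into sessions.

    `records` is an iterable of (user_id, timestamp, ark) tuples, where
    `timestamp` is a datetime and `ark` identifies the consulted document.
    A new session starts whenever two consecutive requests by the same user
    are more than `gap` apart. Returns {user_id: [[ark, ...], ...]}.
    """
    by_user = defaultdict(list)
    for user, ts, ark in records:
        by_user[user].append((ts, ark))

    sessions = {}
    for user, events in by_user.items():
        events.sort(key=lambda e: e[0])  # chronological order per user
        user_sessions = [[events[0][1]]]
        for (prev_ts, _), (ts, ark) in zip(events, events[1:]):
            if ts - prev_ts > gap:
                user_sessions.append([])  # inactivity gap: open a new session
            user_sessions[-1].append(ark)
        sessions[user] = user_sessions
    return sessions
```

The resulting ordered lists of document identifiers are the "navigation paths" that steps 2 to 5 would enrich with ARK metadata, turn into features, and cluster.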

Level
Master (research project, optional research project, or master’s project).

Assistant responsible for the project
Simon Dumas Primbault ([email protected])

Possibility to work in a group?
Yes.