This page lists the projects currently open at the Laboratory for the History of Science and Technology (LHST). If you are interested in working on one of the projects listed below, please write directly to the contact person, with Prof. Baudry cc’ed in your email.
The project descriptions are only brief outlines and we are in general flexible about the particulars.
This project, in collaboration with the Institute of Psychology (IP) of the University of Lausanne (UNIL), is the final part of the SNSF research project MICE conducted by Prof. Rémy Amouroux. The aim of MICE is to produce the first transnational historical study of the reception and indigenization of behavior therapy in the Francophone context between the early 1960s and the 1990s.
This student project addresses the circulation and reception of psychotherapy in French-speaking Europe through distant reading of two newsmagazines: Psychology Today and its French counterpart Psychologie. The goal is to trace the role played by the specialized press in disseminating behavior therapy to a large audience.
The corpus comprises two magazines, one in English and one in French, available as bulk downloads of varying OCR quality: Psychology Today (1967-1971) and Psychologie (1970-1980). The goal of the project is to observe whether new psychotherapeutic tools rise progressively, and whether this rise coincides with, or is related to, growing criticism of psychoanalysis. The project will attempt to establish whether it is possible to observe coherent clusters of: critique of psychoanalysis; psychotherapy and new psychotherapeutic tools (i.e., cognitive behavioral therapy).
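A first pass at this question could simply measure, year by year, how much of the text belongs to each theme. The sketch below is only an illustration under stated assumptions: the seed vocabularies in `THEMES`, the `yearly_share` helper, and the toy input are all hypothetical and are not part of the project's corpus or method.

```python
import re
from collections import Counter, defaultdict

# Hypothetical seed vocabularies; a real study would build and validate these.
THEMES = {
    "psychoanalysis": {"psychoanalysis", "freud", "unconscious"},
    "behavior_therapy": {"behavior", "conditioning", "desensitization"},
}

def tokenize(text):
    """Lowercase the text and return its alphabetic tokens (accents kept)."""
    return re.findall(r"[^\W\d_]+", text.lower())

def yearly_share(articles):
    """Return {year: {theme: share of tokens that belong to the theme}}.

    `articles` is an iterable of (year, text) pairs.
    """
    totals = Counter()
    hits = defaultdict(Counter)
    for year, text in articles:
        tokens = tokenize(text)
        totals[year] += len(tokens)
        for theme, vocab in THEMES.items():
            hits[year][theme] += sum(1 for t in tokens if t in vocab)
    return {
        year: {theme: hits[year][theme] / totals[year] for theme in THEMES}
        for year in totals
        if totals[year]
    }

# Toy input, invented for illustration only.
sample = [
    (1970, "Freud and the unconscious"),
    (1975, "Conditioning and behavior change"),
]
shares = yearly_share(sample)
```

Plotting these shares over time would give a rough view of whether one theme rises as the other is criticized; OCR noise and bilingual vocabularies would need separate handling.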
This project will be done in collaboration with Prof. Jérôme Baudry (EPFL), Prof. Rémy Amouroux and Dr. Elsa Forner (UNIL).
Project type: semester project or master’s thesis.
Prerequisites: Prior experience in text mining; solid data analysis skills; knowledge in NLP and computational linguistics; interest in history and social sciences a plus; language skills in both English and French.
Contact: [email protected].
The project investigates how digital tools can be used to study the dynamics of innovation in science and technology, from the eighteenth century to today. Innovation—the production of the new—is often said to be radical and path-breaking; yet, what can history teach us about the actual rhythms of innovation? Looking at past centuries of technological development, do we see continuity or discontinuity? Can we identify waves of innovation (and imitation)? How new is the new and how did people strategically describe and draw technology to present it as new? How did inventors in the past address the dangers and potential negative consequences of their activity? Analyzing and/or building a corpus of patents of invention, you will choose and apply state-of-the-art NLP/machine learning methods and/or use your statistical and data science knowledge.
Below are some more detailed examples of themes for semester projects and master’s theses, but you can also propose your own questions. You will work with an interdisciplinary team of historians and computer scientists.
– Classifying patents and technology: Categorizing innovation is difficult. Categories are static, innovation is dynamic. Innovation can and does happen in-between established categories. Yet, it is very important to categorize patents, because the dynamic of innovation and the logic of taking out patents differ according to technology and to industry. However, such labels are usually missing from the datasets. The availability of the textual description of patents presents a great opportunity to address the challenge of classification.
– Technological risk & safety: Sometimes, bridges collapse, furnaces and steam engines explode, electrical sparks cause fires or kill people. Did historical patents discuss the invention being potentially dangerous? What did they say about this issue? How did that evolve over time? Do these concerns cluster in certain technologies / industries?
– Irresponsible innovation: While many of the negative consequences of technology are classic examples of unintended consequences (say, climate change), this is not always the case. For instance, when it was decided in the early 1920s to add tetraethyl lead to gasoline to increase octane, the toxic effects of lead were well known. Do historical patents show traces of other examples of innovation that could demonstrably have been concerning at the time? How common or uncommon are those patents?
– Extending the geographical scope of possible investigations: Most studies relying on the full text of historical patents rely on those issued by the United States of America, because of their easy availability in digitized form. To investigate similar questions for other countries, the available scanned material would need to be prepared and processed to be turned into clean digital full text, and results from off-the-shelf OCR software are of varying and sometimes questionable quality. This would be an interesting challenge for people interested in computer vision and OCR, and in bringing about a less one-sided view of innovation.
Project type: semester project or master’s thesis.
Prerequisites: prior experience in text mining; solid data analysis skills; knowledge in NLP and Machine Learning is a plus.
Contact: Jérôme Baudry ([email protected])
Over the past decades, the development of online scientific platforms radically changed the way researchers access, browse, or read scientific articles. Yet, the computational study of digital research practices is only in its early days. This project aims at documenting the behavior of scientists on online platforms by making sense of the digital traces they generate while navigating.
You will perform data mining on the browsing logs of Gallica, the digital library of the Bibliothèque nationale de France, over one year (May 2016 to June 2017, with ~40,000 users per day) and make sense of user experience by identifying patterns through cluster analysis. Based on previous research (see LHST GitHub) and taking advantage of the format of the log files, you will be required to:
- Design and test models of “navigation paths” to extract relevant sessions
- Refine an ontology based on the metadata provided by the Archival Resource Keys of the documents (type of document, discipline, year, etc.)
– Additionally, other methods (e.g., topic modelling) could be applied to enrich the server logs
- Use the EPFL giotto-tda Python library of topological data analysis to identify the geometrical shapes of users’ paths through the ontology
- Build a typology of users’ behaviour on the platform by clustering the previously identified users’ paths
– A diachronic study could be carried out by applying the pipeline to rolling 30-day periods
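The first step in the list above, extracting sessions as navigation paths, could be sketched as follows. This is a minimal illustration, not the method used in the previous research: the `(user_id, timestamp, document_id)` record layout, the 30-minute inactivity cut-off, and the toy log entries are all assumptions.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Assumed cut-off: a gap longer than this starts a new session.
SESSION_GAP = timedelta(minutes=30)

def split_sessions(requests, gap=SESSION_GAP):
    """Group (user_id, timestamp, document_id) tuples into sessions.

    Returns a list of (user_id, [document_id, ...]) navigation paths,
    ordered by user and then by time.
    """
    sessions = []
    ordered = sorted(requests, key=lambda r: (r[0], r[1]))
    for user, rows in groupby(ordered, key=lambda r: r[0]):
        path, last_seen = [], None
        for _, stamp, doc in rows:
            # Close the current path when the user was inactive for too long.
            if last_seen is not None and stamp - last_seen > gap:
                sessions.append((user, path))
                path = []
            path.append(doc)
            last_seen = stamp
        sessions.append((user, path))
    return sessions

# Toy log entries, invented for illustration only.
log = [
    ("u1", datetime(2016, 5, 1, 9, 0), "doc_a"),
    ("u1", datetime(2016, 5, 1, 9, 10), "doc_b"),
    ("u1", datetime(2016, 5, 1, 11, 0), "doc_c"),
    ("u2", datetime(2016, 5, 1, 9, 5), "doc_a"),
]
paths = split_sessions(log)
```

Each resulting path could then be mapped onto the ontology categories of its documents before any clustering or topological analysis is attempted.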
Project type: semester project or master’s thesis.
Prerequisites: solid skills in Python: data cleansing and enriching; prior experience in unsupervised machine learning (notably non-Euclidean vector space models, PCA, and pattern recognition through cluster analysis); possible knowledge of topological data analysis (TDA).
Languages: English, French, Italian
Sections: IN DH
Contact: Simon Dumas Primbault ([email protected])
General Semantic Knowledge Bases (Wikidata, DBpedia) convert human knowledge (e.g. Wikipedia articles) into structured content which is machine readable and queryable. The goal of this project is to create a specialized Semantic Knowledge Base which is suited for the research methods of historians. Can one use the available semantic web technologies to encode not only simple statements of fact, but also complex claims arising from historical interpretations? For this project, you will design an ontology appropriate for encoding a heterogeneous set of information about a historical artefact (the Hipp Chronoscope). The data will be modelled as semantic triples (RDF) and queried through SPARQL. The project will give you the opportunity to learn about semantic web technologies even if you are not familiar with them.
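To make the idea of encoding claims rather than bare facts more concrete, here is a small sketch in plain Python. It does not use an RDF library or SPARQL; it only mimics a single triple pattern with wildcards. Every identifier (`ex:claim1`, `ex:assertedBy`, `ex:historianA`, and so on) is hypothetical and does not describe real statements about the Hipp Chronoscope.

```python
# Each triple is (subject, predicate, object); all identifiers are hypothetical.
triples = {
    ("ex:claim1", "rdf:type", "ex:Claim"),
    ("ex:claim1", "ex:aboutObject", "ex:HippChronoscope"),
    ("ex:claim1", "ex:assertedBy", "ex:historianA"),
    ("ex:claim2", "rdf:type", "ex:Claim"),
    ("ex:claim2", "ex:aboutObject", "ex:HippChronoscope"),
    ("ex:claim2", "ex:assertedBy", "ex:historianB"),
    ("ex:claim2", "ex:disputes", "ex:claim1"),
}

def match(pattern, store):
    """Return the sorted triples matching a pattern; None is a wildcard."""
    return sorted(
        t for t in store
        if all(p is None or p == v for p, v in zip(pattern, t))
    )

# Which claims concern the chronoscope, and who asserted them?
claims = [s for s, _, _ in match((None, "ex:aboutObject", "ex:HippChronoscope"), triples)]
authors = sorted(
    o for c in claims for _, _, o in match((c, "ex:assertedBy", None), triples)
)
disputed = [o for _, _, o in match((None, "ex:disputes", None), triples)]
```

Treating each interpretation as its own node, with an author and possible disputes attached, is one way to keep conflicting historical readings side by side instead of overwriting one with another.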
Project type: semester project or master’s thesis.
Prerequisites: knowledge about database concepts and SQL.
Contact: Ion-Gabriel Mihailescu ([email protected])
This project will investigate how digital tools can be used to recreate the historical setting in which scientific objects and instruments were used. Can we build narratives and displays of science that go beyond selecting and contemplating individual objects in isolation? Can we visualize the role played by the setting in shaping the outcomes of an experiment? The project will map out the space of the experiment (e.g. the workbench, the lab-room etc.), the spatial relations between the objects and people involved in the experiment, and the temporal sequence of events. You will be responsible for designing the visualization of a past scientific experiment such that the role of the historical setting is made conspicuous. You are free to choose the graphical environment best suited for the project goals.
Project type: semester project or master’s thesis.
Prerequisites: prior experience in graphic design or 3D modelling.
Contact: Ion-Gabriel Mihailescu ([email protected])
This project will investigate how machine vision algorithms could be applied by historians to automate the analysis of historical images. Are the available datasets and algorithms suitable for the research goals and methods of historians? What kinds of historical questions are best suited to machine vision algorithms? You will be responsible for constructing a dataset of historical scientific images adequate for machine learning by automating a process which extracts images and their textual descriptions from PDF scans of 19th-century science textbooks. You will then use the dataset to test algorithms and extract visual patterns and features.
Project type: semester project or master’s thesis.
Prerequisites: prior experience in machine learning and pattern recognition.
Contact: Ion-Gabriel Mihailescu ([email protected])
This project will investigate how the main newspapers of different Eurocentric societies articulated public opinion about technology during the Second Industrial Revolution. The corpus will include a variety of multilingual newspapers available as bulk downloads of varying OCR quality. These are: Spain: El Imparcial (1867-1933); France: Le Figaro (1830-1926); Germany: Neue Hamburger Zeitung (Hamburg, 1896-1922) and Berliner Tageblatt (Berlin, 1878-1928); Austria: Wiener Zeitung (1876-1945); United States: The New York Herald (1840-1920); UK: The Times (1840-1920); Italy: La Stampa (1880-1920). The methodology will include word embeddings and word collocations, topic modelling, and sentiment analysis. The goal of the project will be to observe whether views regarding technology become progressively more homogeneous as the Second Industrial Revolution advances, or whether consistent geographic clusters of public opinion can be observed.
This project will be done in collaboration with Prof. Jérôme Baudry and UZH postdoctoral researcher Elena Fernández (https://elenafernandezfern.wixsite.com/elena-fernandez).
Project type: semester project or master’s thesis.
Prerequisites: prior experience in text mining; solid data analysis skills; knowledge of NLP and computational linguistics a plus. Multilingual candidates are especially welcome, but this is not a requirement.
Contact: Elena Fernández ([email protected])
This project will analyse the growing social anxiety about the future that arose during the second half of the nineteenth century, coinciding with the Second Industrial Revolution. The aggressive wave of technological innovations reaching society brought unprecedented change to the daily life of Western citizens, who within forty years witnessed the massive construction of railroads worldwide and the widespread use of electric light, the telegraph, and the steamship, along with many other innovations. As a result, uncertainty about the future grew, driven by lifestyle changes faster than any experienced before, with well-known historical figures such as Goethe, Karl Marx, or Charles Chaplin expressing worries about a hard-to-imagine twentieth-century future. This project will quantify this social anxiety about the future using Named Entity Recognition. The corpus will include a variety of multilingual newspapers available as bulk downloads of varying OCR quality. These are: Spain: El Imparcial (1867-1933); France: Le Figaro (1830-1926); Germany: Neue Hamburger Zeitung (Hamburg, 1896-1922) and Berliner Tageblatt (Berlin, 1878-1928); Austria: Wiener Zeitung (1876-1945); United States: The New York Herald (1840-1920); UK: The Times (1840-1920); Italy: La Stampa (1880-1920). The goal of the project will be to observe whether anxieties about the future become progressively more homogeneous as the Second Industrial Revolution advances, or whether consistent geographic clusters of fear can be observed.
This project will be done in collaboration with Prof. Jérôme Baudry and UZH postdoctoral researcher Elena Fernández (https://elenafernandezfern.wixsite.com/elena-fernandez).
Project type: semester project or master’s thesis.
Prerequisites: prior experience in text mining; solid data analysis skills; knowledge of NLP and computational linguistics a plus. Multilingual candidates are especially welcome, but this is not a requirement.
Contact: Elena Fernández ([email protected])