The Digital Humanities Laboratory (DHLAB) was the first Digital Humanities laboratory created in Switzerland and, building on Frederic Kaplan’s previous research in artificial intelligence, robotics and human-machine interfaces, it was arguably the first laboratory in the world to conduct Digital Humanities research involving intensive computational methods, machine learning and robotics.
During the period 2012-2019, the DHLAB addressed challenges linked to Big Cultural Datasets by launching a series of interdisciplinary projects in collaboration with other institutions. These served as demonstrators and proofs of feasibility for the larger-scale projects planned for the period 2020-2028.
From Venice Time Machine to Time Machine Europe
A large part of the lab’s effort was dedicated to the launch of the Venice Time Machine, an international scientific programme conducted in collaboration with the University Ca’Foscari and the Venice State Archives. The project aimed to build a multidimensional model of Venice and its evolution covering a period of more than 1000 years, with the ambition of reconstructing a large open-access database that could be used for research and education. The DHLAB created prototypes of a processing pipeline enabling the digitization, transcription and indexation of archival series, laying the foundation of the largest database ever created on Venetian documents. To complement these primary sources, the content of thousands of monographs has been indexed and made searchable. The information extracted from these sources is organized in a semantic graph of linked data and unfolded in space and time in a historical geographical information system. The project is now continuing as part of a larger European initiative: Time Machine Europe.
From Swisspress to Impresso
The second-largest project conducted by the DHLAB during the period 2012-2019 was Swisspress, developed in partnership with the newspaper Le Temps and the Swiss National Library. For this project, the DHLAB developed an innovative interface for navigating an archive composed of the digitised versions of “Le Journal de Genève”, “La Gazette de Lausanne” and the “Nouveau Quotidien”: 4 million articles covering 200 years of newspapers. Using natural language processing techniques and deep learning approaches, the lab extracted information about 50 million named entities (persons, places, institutions). This database is now the subject of several courses at EPFL. The project was steered by a scientific board composed of historians and journalists. It is now continuing as part of the Impresso project (2017-2020), based on the analysis of more than 300 newspapers in French and German.
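The named-entity extraction described above relied on deep-learning NLP models. As a purely illustrative sketch of the underlying task (tagging persons, places and institutions in digitised articles), a small gazetteer lookup can stand in for the trained model; all names and entries below are invented for the example and are not from the project:

```python
import re

# Toy gazetteer standing in for a trained NER model (illustrative only).
GAZETTEER = {
    "Genève": "PLACE",
    "Lausanne": "PLACE",
    "Le Temps": "INSTITUTION",
}

def tag_entities(article: str):
    """Return (surface form, entity type) pairs found in an article."""
    hits = []
    for name, etype in GAZETTEER.items():
        for _ in re.finditer(re.escape(name), article):
            hits.append((name, etype))
    return hits

# Example on an invented sentence:
# tag_entities("Un article publié à Lausanne par Le Temps")
# → [("Lausanne", "PLACE"), ("Le Temps", "INSTITUTION")]
```

A real system replaces the dictionary lookup with a sequence-labelling model, which is what allows entities absent from any gazetteer to be recognized at the scale of millions of articles.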
A change of scale
These two large-scale projects were at the origin of several “applied” contributions (technology for massive digitization, algorithms to process Big Cultural Datasets, new exploration interfaces), but they also structured more fundamental reflections on the impact of this change of scale for the humanities. One central question was whether data science methodologies open a new “epistemological regime” for many disciplines in the humanities, characterized by the use of large-scale open research platforms and the identification of transdisciplinary research concepts. During the past six years, the DHLAB team identified a first series of concepts and methods applying to domains as diverse as linguistic evolution, art history, urban history, historiography and narrative composition. The identification of these pivotal concepts, at the intersection of different disciplines, potentially has an impact not only on the way research should be conducted in the different subdomains of the Digital Humanities but also on the way Digital Humanities should be taught. It is precisely this transdisciplinary approach that Frederic Kaplan has tried to implement in his more recent teaching activities and that will be tested at a larger scale in the near future.
The seven steps
DHLAB research is organised as a pipeline of seven steps. The challenges and achievements of each step are detailed on the corresponding pages.
- Step 1: Massive digitization devices produce large collections of images.
- Step 2: The recurring morphological patterns in these images are discovered and indexed. When possible, texts are transcribed.
- Step 3: Structured information is extracted through the analysis of regulated representations such as administrative forms, cadastres and maps.
- Step 4: Extracted texts and symbols are interpreted as part of continuously evolving linguistic systems.
- Step 5: Extracted information networks are represented in formalisms enabling the coexistence of multiple realities, while negotiation processes push towards convergence on shared reconstructions of the past.
- Step 6: Information is realigned in space and time using 4D formalisms.
- Step 7: New interfaces are developed to experience the resulting reconstructions.
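The seven steps above can be sketched as a minimal data-flow pipeline. The record structure, function names and stubbed logic below are illustrative assumptions, not the DHLAB’s actual codebase:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Record:
    """One archival item as it moves through the seven steps (illustrative)."""
    image_id: str                                    # step 1: digitized image
    text: str = ""                                   # step 2: transcription
    fields: dict = field(default_factory=dict)       # step 3: structured information
    tokens: List[str] = field(default_factory=list)  # step 4: linguistic units
    links: List[Tuple[str, str]] = field(default_factory=list)  # step 5: graph edges
    place_time: Optional[Tuple[float, float, int]] = None       # step 6: lon, lat, year

def digitize(ids: List[str]) -> List[Record]:
    """Step 1: massive digitization produces a collection of images (stubbed)."""
    return [Record(image_id=i) for i in ids]

def transcribe(records: List[Record]) -> List[Record]:
    """Step 2: discover morphological patterns and transcribe text (stubbed)."""
    for r in records:
        r.text = f"transcription of {r.image_id}"
    return records

def extract_structure(records: List[Record]) -> List[Record]:
    """Step 3: extract structured fields from regulated representations."""
    for r in records:
        r.fields = {"source": r.image_id, "length": len(r.text)}
    return records

def interpret(records: List[Record]) -> List[Record]:
    """Step 4: interpret tokens as part of an evolving linguistic system."""
    for r in records:
        r.tokens = r.text.split()
    return records

def link(records: List[Record]) -> List[Record]:
    """Step 5: connect records into a semantic graph of linked data."""
    for a, b in zip(records, records[1:]):
        a.links.append((a.image_id, b.image_id))
    return records

def realign(records: List[Record]) -> List[Record]:
    """Step 6: place each record in space and time (dummy 4D coordinates)."""
    for year, r in enumerate(records, start=1000):
        r.place_time = (12.33, 45.44, year)  # illustrative: Venice, arbitrary years
    return records

def render(records: List[Record]) -> List[str]:
    """Step 7: expose the reconstruction through an interface (here: plain text)."""
    return [f"{r.image_id} @ {r.place_time}" for r in records]

# Chain the seven steps over a toy archive of two scanned pages.
pipeline = render(realign(link(interpret(extract_structure(transcribe(
    digitize(["ASVe_001", "ASVe_002"])))))))
```

Each real step is, of course, a research problem in itself (handwritten text recognition, entity linking, geo-temporal alignment); the sketch only shows how the outputs of one step become the inputs of the next.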