2024
Mapping Bibliotheca Hertziana
The project introduces an innovative visual method for analysing libraries and archives, with a focus on Bibliotheca Hertziana’s library collection. This collection, which dates back over a century, is examined by integrating user loan data with deep mapping techniques to reveal usage patterns and thematic clusters. To achieve this, dimensionality reduction is employed to visualise the catalogue, mapping books based on their loans, and prompt engineering with large language models helps to identify loan clusters with detailed descriptions and titles. This approach not only paves the way for cultural analytics but also provides the basis for dynamic classification and the development of a recommendation system. This project offers alternative insights into the art historical research conducted at Bibliotheca Hertziana, capturing the collection’s evolution and usage. The method established here provides a flexible framework for visually mapping cultural and academic collections in the digital humanities.
2024-03-20
Advisor(s): F. Kaplan; D. Rodighiero; A. Adamou
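As an illustration of the pipeline sketched in the abstract above, here is a minimal Python sketch, assuming loan records have already been aggregated into one count vector per book; the PCA/KMeans choices, the toy data and the prompt wording are assumptions, not the project’s actual configuration.

```python
# Sketch: project books into 2D from their loan profiles, cluster them,
# and build a labelling prompt for an LLM. Libraries and prompt wording
# are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical data: one row per book, one column per borrower (loan counts).
loan_matrix = rng.poisson(0.3, size=(500, 120))
titles = [f"Book {i}" for i in range(500)]  # placeholder titles

# 1. Dimensionality reduction for the visual map of the catalogue.
coords = PCA(n_components=2).fit_transform(loan_matrix)

# 2. Group books into loan clusters.
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(coords)

# 3. Prompt an LLM to describe each cluster from a sample of its titles.
def cluster_prompt(cluster_id: int, sample_size: int = 15) -> str:
    members = [t for t, l in zip(titles, labels) if l == cluster_id][:sample_size]
    return (
        "These library books are frequently borrowed together:\n- "
        + "\n- ".join(members)
        + "\nSuggest a short thematic title and a one-sentence description."
    )

print(cluster_prompt(0))
```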
A fragment-based approach for computing the long-term visual evolution of historical maps
Cartography, as a strategic technology, is a historical marker. Maps are tightly connected to the cultural construction of the environment. The increasing availability of digital collections of historical map images provides an unprecedented opportunity to study large map corpora. Corpus linguistics has led to significant advances in understanding how languages change. Research on large map corpora could in turn significantly contribute to understanding cultural and historical changes. We develop a methodology for cartographic stylometry, with an approach inspired by structuralist linguistics, considering maps as visual language systems. As a case study, we focus on a corpus of 10,000 French and Swiss maps, published between 1600 and 1950. Our method is based on the fragmentation of the map image into elementary map units. A fully interpretable feature representation of these units is computed by contrasting maps from different, coherent cartographic series, based on a set of candidate visual features (texture, morphology, graphical load). The resulting representation effectively distinguishes between map series, enabling the elementary units to be grouped into types, whose distribution can be examined over 350 years. The results show that the analyzed maps underwent a steady abstraction process during the 17th and 18th centuries. The 19th century brought a lasting scission between small- and large-scale maps. Macroscopic trends are also highlighted, such as a surge in the production of fine lines, and an increase in map load, that reveal cultural fashion processes and shifts in mapping practices. This initial research demonstrates how cartographic stylometry can be used for exploratory research on visual languages and cultural evolution in large map corpora, opening an effective dialogue with the history of cartography. It also deepens the understanding of cartography by revealing macroscopic phenomena over the long term.
Humanities & Social Sciences Communications
2024-03-04
Vol. 11, num. 1, p. 363. DOI : 10.1057/s41599-024-02840-w
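The fragment-based idea lends itself to a small illustration: the sketch below cuts a grayscale map scan into elementary units and computes two simple interpretable features per unit; the tile size and the feature definitions (graphical load, a fine-line proxy) are illustrative assumptions, not the paper’s feature set.

```python
# Sketch of the fragment-based approach: cut a map image into elementary
# units and compute simple, interpretable features per unit.
import numpy as np

def map_fragments(image: np.ndarray, tile: int = 64):
    """Yield (row, col, tile_array) for non-overlapping tiles of a grayscale map."""
    h, w = image.shape
    for r in range(0, h - tile + 1, tile):
        for c in range(0, w - tile + 1, tile):
            yield r, c, image[r:r + tile, c:c + tile]

def unit_features(tile: np.ndarray) -> dict:
    ink = 1.0 - tile / 255.0                     # dark pixels carry the drawing
    load = float(ink.mean())                     # graphical load
    gy, gx = np.gradient(ink)
    fine_lines = float(np.hypot(gx, gy).mean())  # crude proxy for fine linework
    return {"load": load, "fine_lines": fine_lines}

# Hypothetical grayscale map scan (0 = black ink, 255 = white paper).
scan = np.full((512, 512), 255, dtype=np.uint8)
scan[100:105, :] = 0  # a drawn line
features = [unit_features(t) for _, _, t in map_fragments(scan)]
print(features[0])
```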
Post-correction of Historical Text Transcripts with Large Language Models: An Exploratory Study
The quality of automatic transcription of heritage documents, whether from printed, manuscripts or audio sources, has a decisive impact on the ability to search and process historical texts. Although significant progress has been made in text recognition (OCR, HTR, ASR), textual materials derived from library and archive collections remain largely erroneous and noisy. Effective post-transcription correction methods are therefore necessary and have been intensively researched for many years. As large language models (LLMs) have recently shown exceptional performances in a variety of text-related tasks, we investigate their ability to amend poor historical transcriptions. We evaluate fourteen foundation language models against various post-correction benchmarks comprising different languages, time periods and document types, as well as different transcription quality and origins. We compare the performance of different model sizes and different prompts of increasing complexity in zero and few-shot settings. Our evaluation shows that LLMs are anything but efficient at this task. Quantitative and qualitative analyses of results allow us to share valuable insights for future work on post-correcting historical texts with LLMs.
Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)
2024-02-18
The 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, St Julian’s, Malta, March 22, 2024. p. 133-159
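To make the experimental setting concrete, here is a minimal sketch of a few-shot post-correction prompt of the kind such a study evaluates; the wording, the example pair and the `call_llm` placeholder are assumptions, not the prompts used in the paper.

```python
# Sketch of a few-shot prompt for post-correcting a noisy OCR transcript.
FEW_SHOT_EXAMPLES = [
    ("Tbe wcather was fiue to-day.", "The weather was fine to-day."),
]

def build_prompt(noisy_transcript: str) -> str:
    lines = ["Correct the OCR errors in the text, changing nothing else."]
    for noisy, clean in FEW_SHOT_EXAMPLES:
        lines += [f"Input: {noisy}", f"Output: {clean}"]
    lines += [f"Input: {noisy_transcript}", "Output:"]
    return "\n".join(lines)

def call_llm(prompt: str) -> str:
    # Placeholder for whichever model API is being benchmarked.
    raise NotImplementedError

print(build_prompt("Parjs, le 3 jnin 1887."))
```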
Digital Guardianship: Innovative Strategies in Preserving Armenian’s Epigraphic Legacy
In the face of geopolitical threats in Artsakh, the preservation of Armenia’s epigraphic heritage has become a mission of both historical and cultural urgency. This project delves deep into Armenian inscriptions, employing advanced digital tools and strategies like the Oxygen text editor and EpiDoc guidelines to efficiently catalogue, analyze, and present these historical treasures. Amidst the adversities posed by Azerbaijan’s stance towards Armenian heritage in Artsakh, the digital documentation and preservation of these inscriptions have become a beacon of cultural resilience. The XML-based database ensures consistent data, promoting scholarly research and broadening accessibility. Integrating the Grabar Armenian dictionary addressed linguistic challenges, enhancing data accuracy. This initiative goes beyond merely preserving stone and text; it is a testament to the stories, hopes, and enduring spirit of the Armenian people in the face of external threats. Through a harmonious blend of technology and traditional knowledge, the project stands as a vanguard in the fight to ensure that Armenia’s rich epigraphic legacy, and the narratives it enshrines, remain undiminished for future generations.
Heritage
2024
Vol. 7, num. 5. DOI : 10.3390/heritage7050109
Towards Chapterisation of Podcasts: Detection of Host and Structuring Questions in Radio Transcripts
This Master thesis investigates the application of Bidirectional Encoder Representations from Transformers (BERT) to podcasts, in order to identify the host and detect structuring questions within each episode. The research is conducted on an annotated dataset of automatic transcriptions of 38 French podcasts from Radio France and 37 English TV shows from France 24. A variety of BERT models, with different language orientations, are tested and compared on two classification tasks: the detection of host sentences and the classification of structuring questions. The latter is first performed as a three-label classification task; a reduction to a binary classifier is then proposed, with two new configurations. Initially, BERT models are fine-tuned separately on the French and English datasets, as well as on the joint dataset. Subsequently, a multilingual approach is implemented with an automatic translation of the original dataset into a total of twenty languages. The translated datasets are used for multilingual fine-tuning, and German is included as an evaluation language. BERT models, as well as a proposed rule-based comparison method, demonstrate adequate performance in host detection, pinpointing the actual host of the show within the list of speakers. For structuring question detection, the three-label classification appears too subtle a task, at least given the size of the fine-tuning data, while one binary classification configuration yields promising results. The multilingual experiment shows that automatic translation has potential as a source of fine-tuning data and highlights the need for original testing data in these languages.
2024-03-28
Advisor(s): M. Ehrmann
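A minimal sketch of the host-sentence classification setup described above, using the Hugging Face transformers library; the checkpoint name, toy sentences and hyper-parameters are assumptions, not the thesis configuration.

```python
# Sketch: fine-tune a BERT-style model to label transcript sentences
# as host vs. non-host.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-multilingual-cased"          # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

sentences = ["Bonjour et bienvenue dans notre émission.", "Merci de m'avoir invité."]
labels = torch.tensor([1, 0])                        # 1 = host sentence

batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):                                   # toy training loop
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

print(model(**batch).logits.argmax(dim=-1))          # predicted labels
```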
2023
Where Did the News Come From? Detection of News Agency Releases in Historical Newspapers
Since their beginnings in the 1830s and 1840s, news agencies have played an important role in the national and international news market, aiming to deliver news as fast and as reliably as possible. While we know that newspapers have long been using agency content to produce their stories, the extent to which the agencies shape our news often remains unclear. Although researchers have already addressed this question, recently by using computational methods to assess the influence of news agencies at present, large-scale studies on the role of news agencies in the past remain rare. This thesis aims to bridge this gap by detecting news agencies in a large corpus of Swiss and Luxembourgish newspaper articles (the impresso corpus) for the years 1840-2000 using deep learning methods. For this, we first build and annotate a multilingual dataset with news agency mentions, which we then use to train and evaluate several BERT-based agency detection and classification models. Based on these experiments, we choose two models (for French and German) for the inference on the impresso corpus. Results show that ca. 10% of the articles explicitly reference news agencies, with the greatest share of agency content after 1940, although systematic citation of agencies already started slowly in the 1910s. Differences in the usage of agency content across time, countries and languages as well as between newspapers reveal a complex network of news flows, whose exploration provides many opportunities for future work.
2023-08-18
Advisor(s): M. Ehrmann; E. Boros; M. Duering; F. Kaplan
Digitization of the Inscriptions on the Monuments of Armenian Cultural Heritage in Nagorno-Karabakh Region
This article discusses the efforts of the DH LAB (EPFL) to preserve Armenian cultural heritage in Artsakh, particularly inscriptions on monuments in the Nagorno-Karabakh region. Using Digital Humanities (DH) technology, the project aims to collect, systematize, and digitize these inscriptions, creating 3D models where possible. The initiative addresses the lack of digital epigraphy applied to Armenian inscriptions, previously underrepresented in digital heritage efforts. The project has established a network of partners in Armenia and Artsakh, and has begun creating a searchable inscriptions database, integrating essential information and translations. Challenges include editing texts, transcription accuracy, and developing a critical apparatus. The project also focuses on creating Linked Open Data and expanding to include Armenian epigraphic heritage in Ukraine, aiming to build a comprehensive corpus of Armenian inscriptions globally.
2023-07-01
Digital Humanities 2023: Collaboration as Opportunity (DH2023), Graz, Austria, July 10-14, 2023. DOI : 10.5281/zenodo.8108026
The Skin of Venice: Automatic Facade Extraction from Point Clouds
We propose a method to extract orthogonal views of facades from photogrammetric models of cities. This method was applied to extract all facades of the city of Venice. The resulting images open up new areas of research in architectural history.
Digital Humanities 2023 : Book of Abstracts
2023-06-30
ADHO Digital Humanities Conference 2023 (DH2023), Graz, Austria, July 10-14, 2023. DOI : 10.5281/zenodo.8107943
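The core operation can be illustrated with a short sketch: points lying near a facade plane are projected onto two in-plane axes and rasterised into an orthogonal view; the plane parametrisation, resolution and toy points are assumptions, not the paper’s actual pipeline.

```python
# Sketch: rasterise coloured 3D points into an orthogonal facade image.
import numpy as np

def facade_orthoimage(points, colors, origin, u_dir, v_dir, size_m, res=0.05):
    """points: (N,3) xyz; colors: (N,3); u_dir/v_dir: unit vectors spanning the facade plane."""
    rel = points - origin
    u = rel @ u_dir                                   # horizontal coordinate on the facade
    v = rel @ v_dir                                   # vertical coordinate on the facade
    w, h = int(size_m[0] / res), int(size_m[1] / res)
    image = np.zeros((h, w, 3), dtype=np.uint8)
    inside = (u >= 0) & (u < size_m[0]) & (v >= 0) & (v < size_m[1])
    cols = (u[inside] / res).astype(int)
    rows = h - 1 - (v[inside] / res).astype(int)      # image origin at top-left
    image[rows, cols] = colors[inside]
    return image

pts = np.array([[1.0, 0.0, 2.0], [3.0, 0.0, 5.0]])
cols = np.array([[200, 180, 160], [90, 80, 70]], dtype=np.uint8)
img = facade_orthoimage(pts, cols, origin=np.zeros(3),
                        u_dir=np.array([1.0, 0.0, 0.0]),
                        v_dir=np.array([0.0, 0.0, 1.0]),
                        size_m=(10.0, 8.0))
print(img.shape)
```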
Effective annotation for the automatic vectorization of cadastral maps
The great potential brought by large-scale data in the humanities is still hindered by the time and technicality required for making documents digitally intelligible. Within urban studies, historical cadasters have hitherto been largely under-explored despite their informative value. Powerful and generic technologies, based on neural networks, to automate the vectorization of historical maps have recently become available. However, the transfer of these technologies is hampered by the scarcity of interdisciplinary exchanges and a lack of practical literature intended for humanities scholars, especially on the key step of the pipeline: annotation. In this article, we propose a set of practical recommendations, based on empirical findings, on document annotation and automatic vectorization, focusing on the example case of historical cadasters. Our recommendations are generic and easily applicable, drawing on solid experience with concrete and diverse projects.
Digital Scholarship In The Humanities
2023-03-09
DOI : 10.1093/llc/fqad006
From Archival Sources to Structured Historical Information: Annotating and Exploring the “Accordi dei Garzoni”
While automatic document processing techniques have achieved a certain maturity for present-day documents, the transformation of hand-written documents into well-represented, structured and connected data which can satisfactorily be used for historical study purposes is not straightforward and still presents major challenges. Transitioning from documents to structured data was one of the key challenges faced by the Garzoni project, and this chapter details the techniques and the steps taken to represent, extract, enhance and exploit the information contained in the archival material.
Apprenticeship, Work, Society in Early Modern Venice; Abingdon: Routledge, Taylor & Francis Group, 2023-02-10. p. 304. ISBN : 978-1-003197-19-5
DOI : 10.4324/9781003197195-6
Transhistorical Urban Landscape as Hypermap
This article explores the conception, design, and implementation of a hypertextual map that we call a hypermap. Using Giovanni Nolli’s 1748 map of the city of Rome as a backbone, we conducted an experiment based on one of the routes defined by Giuseppe Vasi’s Grand Tour of Rome to collect various types of urban and environmental information, thus aiming to connect a multiplicity of data of different natures and time periods to enhance the serendipitous elaboration of new narratives, interpretations, and data (namely “unfolding”) not implicitly enacted by the purely analytical and mechanistic overlapping of gathered data (“folding”). This experiment is part of the research project entitled Datathink, conducted at the Bibliotheca Hertziana – Max Planck Institute for Art History in Rome, and serves as a proof of concept for an augmented database of the urban landscape of the city of Rome and for new ways to facilitate access to and enhancement of cultural artifacts and knowledge.
34th ACM Conference on Hypertext and Social Media, HT 2023
2023-01-01
34th ACM Conference on Hypertext and Social Media (HT), Rome, Italy, September 4-8, 2023. DOI : 10.1145/3603163.3609083
Yes but.. Can ChatGPT Identify Entities in Historical Documents?
Large language models (LLMs) have been leveraged for several years now, obtaining state-of-the-art performance in recognizing entities from modern documents. For the last few months, the conversational agent ChatGPT has “prompted” a lot of interest in the scientific community and the public due to its capacity to generate plausible-sounding answers. In this paper, we explore this ability by probing it on the named entity recognition and classification (NERC) task in primary sources (e.g., historical newspapers and classical commentaries) in a zero-shot manner and by comparing it with state-of-the-art LM-based systems. Our findings indicate several shortcomings in identifying entities in historical text, ranging from the consistency of entity annotation guidelines, entity complexity, and code-switching to the specificity of prompting. Moreover, as expected, the inaccessibility of historical archives to the public (and thus their absence from the Internet) also impacts its performance.
2023 ACM/IEEE Joint Conference on Digital Libraries, JCDL
2023-01-01
23rd ACM/IEEE Joint Conference on Digital Libraries (JCDL), Santa Fe, NM, June 26-30, 2023. p. 184-189
DOI : 10.1109/JCDL57899.2023.00034
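A minimal sketch of a zero-shot NERC probe of the kind described above; the tag set, prompt wording and expected output format are illustrative assumptions, and the chat-model call itself is not shown.

```python
# Sketch: build a zero-shot NER prompt and parse the model's answer.
TAGS = ["PERSON", "LOCATION", "ORGANISATION"]

def ner_prompt(sentence: str) -> str:
    return (
        f"Identify entities of type {', '.join(TAGS)} in the sentence below. "
        "Answer with one 'entity -> type' pair per line, nothing else.\n"
        f"Sentence: {sentence}"
    )

def parse_answer(answer: str) -> list[tuple[str, str]]:
    pairs = []
    for line in answer.splitlines():
        if "->" in line:
            entity, tag = (part.strip() for part in line.split("->", 1))
            if tag in TAGS:
                pairs.append((entity, tag))
    return pairs

print(ner_prompt("M. Thiers est arrivé hier à Genève."))
print(parse_answer("M. Thiers -> PERSON\nGenève -> LOCATION"))
```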
impresso Text Reuse at Scale. An interface for the exploration of text reuse data in semantically enriched historical newspapers
Text reuse reveals meaningful reiterations of text in large corpora. Humanities researchers use text reuse to study, e.g., the posterior reception of influential texts or to reveal evolving publication practices of historical media. This research is often supported by interactive visualizations which highlight relations and differences between text segments. In this paper, we build on earlier work in this domain. We present impresso Text Reuse at Scale, to our knowledge the first interface which integrates text reuse data with other forms of semantic enrichment to enable a versatile and scalable exploration of intertextual relations in historical newspaper corpora. The Text Reuse at Scale interface was developed as part of the impresso project and combines powerful search and filter operations with close and distant reading perspectives. We integrate text reuse data with enrichments derived from topic modeling, named entity recognition and classification, language and document type detection, as well as a rich set of newspaper metadata. We report on historical research objectives and common user tasks for the analysis of historical text reuse data and present the prototype interface together with the results of a user evaluation.
Frontiers in Big Data
2023-11-03
Vol. 6, num. Visualizing Big Culture and History Data, p. 1-16. DOI : 10.3389/fdata.2023.1249469
From Automated Bootstrapping to Collaborative Editing: A Framework for 4D City Reconstruction
We propose a framework for the construction of 4D scientific models of cities of the past. It leverages both the use of computational methods for bootstrapping and a collaborative interface for visualisation and editing. This makes it possible to continuously enrich the model and to dynamically update it through procedural modelling.
Digital Humanities 2023: Book of Abstracts
2023
Digital Humanities 2023: Collaboration as Opportunity (DH2023), Graz, Austria, July 10-14, 2023. p. 310-313
DOI : 10.5281/zenodo.8107906
Machine-Learning-Enhanced Procedural Modeling for 4D Historical Cities Reconstruction
The generation of 3D models depicting cities in the past holds great potential for documentation and educational purposes. However, it is often hindered by incomplete historical data and the specialized expertise required. To address these challenges, we propose a framework for historical city reconstruction. By integrating procedural modeling techniques and machine learning models within a Geographic Information System (GIS) framework, our pipeline allows for effective management of spatial data and the generation of detailed 3D models. We developed an open-source Python module that fills gaps in 2D GIS datasets and directly generates 3D models up to LOD 2.1 from GIS files. The use of the CityJSON format ensures interoperability and accommodates the specific needs of historical models. A practical case study using footprints of the Old City of Jerusalem between 1840 and 1940 demonstrates the creation, completion, and 3D representation of the dataset, highlighting the versatility and effectiveness of our approach. This research contributes to the accessibility and accuracy of historical city models, providing tools for the generation of informative 3D models. By incorporating machine learning models and maintaining the dynamic nature of the models, we ensure the possibility of supporting ongoing updates and refinement based on newly acquired data. Our procedural modeling methodology offers a streamlined and open-source solution for historical city reconstruction, eliminating the need for additional software and increasing the usability and practicality of the process.
Remote Sensing
2023
Vol. 15, num. 13, p. 3352. DOI : 10.3390/rs15133352
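The simplest procedural-modelling step in such a pipeline, extruding a 2D footprint into a prism, can be sketched as follows; real LOD 2.1 generation with roof shapes and CityJSON export is considerably richer, so this only illustrates the geometric principle.

```python
# Sketch: extrude a building footprint to a simple block model.
def extrude_footprint(footprint, height):
    """footprint: list of (x, y) vertices, counter-clockwise. Returns faces as vertex lists."""
    n = len(footprint)
    bottom = [(x, y, 0.0) for x, y in footprint]
    top = [(x, y, height) for x, y in footprint]
    faces = [list(reversed(bottom)), top]            # ground (facing down) and roof
    for i in range(n):                               # one wall per footprint edge
        j = (i + 1) % n
        faces.append([bottom[i], bottom[j], top[j], top[i]])
    return faces

faces = extrude_footprint([(0, 0), (10, 0), (10, 6), (0, 6)], height=9.0)
print(len(faces), "faces")                           # 2 caps + 4 walls
```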
Ce que les machines ont vu et que nous ne savons pas encore
This article conceptualises the idea that there exists a “dark matter” made up of the latent structures identified by the machine gaze on large heritage photographic collections. The photographic campaigns of twentieth-century art history had the implicit ambition of turning every work of art into a document that could be studied more easily. Over time, the creation of these visual collections produced a corpus of information potentially denser and richer than its creators had initially imagined. Indeed, the digital conversion of these immense visual corpora now makes it possible to re-analyse these images with computer vision techniques, artificial intelligence thus opening the way to research perspectives very different from those conceivable in the last century. We could therefore say that these images hold an immense latent potential of knowledge, a dense network of relations that has not yet been brought to light. What have machines seen, or what will they be able to see, in these image collections that humans have not yet identified? How far does human visual knowledge extend compared with what the machine has been able to analyse? New techniques for indexing images and the motifs that compose them bring us closer to a Copernican revolution of the visual, in which humans can, thanks to the machine-prosthesis, analyse far more images than they could through simple mnemonic activity, and select specific perspectives by comparing sets of motifs against one another. This augmented vision is founded on a prior analysis conducted by the machine over the whole of these visual corpora, a training that makes it possible to recover the underlying structure of the image system. Human vision is thus extended by the machine’s prior artificial gaze. To understand what is at stake in this new alliance, we must study the nature of this artificial gaze, understand its potential for discovering hitherto unknown structures, and anticipate the new forms of human knowledge to which it may give rise. The challenge for the coming years will therefore be to understand what machines have seen and what we do not yet know.
Sociétés & Représentations
2023
num. 1, p. 249-267. DOI : 10.3917/sr.055.0249
Computational Approaches to Digitised Historical Newspapers (Dagstuhl Seminar 22292)
Historical newspapers are mirrors of past societies, keeping track of the small and great history and reflecting the political, moral, and economic environments in which they were produced. Highly valued as primary sources by historians and humanities scholars, newspaper archives have been massively digitised in libraries, resulting in large collections of machine-readable documents and, over the past half-decade, in numerous academic research initiatives on their automatic processing. The Dagstuhl Seminar 22292 “Computational Approaches to Digitised Historical Newspapers” gathered researchers and practitioners with backgrounds in natural language processing, computer vision, digital history and digital libraries involved in computational approaches to historical newspapers, with the objectives to share experiences, analyse successes and shortcomings, deepen our understanding of the interplay between computational aspects and digital scholarship, and discuss future challenges. This report documents the program and the outcomes of the seminar. DagRep, Volume 12, Issue 7, pages 112-179.
2023
p. 69.
2022
Page Layout Analysis of Text-heavy Historical Documents: a Comparison of Textual and Visual Approaches
Page layout analysis is a fundamental step in document processing which enables a page to be segmented into regions of interest. With highly complex layouts and mixed scripts, scholarly commentaries are text-heavy documents which remain challenging for state-of-the-art models. Their layout varies considerably across editions, and their most important regions are mainly defined by semantic rather than graphical characteristics such as position or appearance. This setting calls for a comparison between textual, visual and hybrid approaches. We therefore assess the performances of two transformers (LayoutLMv3 and RoBERTa) and an object-detection network (YOLOv5). While results show a clear advantage in favor of the latter, we also list several caveats to this finding. In addition to our experiments, we release a dataset of ca. 300 annotated pages sampled from 19th century commentaries.
Proceedings of the Computational Humanities Research Conference 2022 Antwerp, Belgium, December 12-14, 2022
2022-12-12
Third Conference on Computational Humanities Research (CHR 2022), Antwerp, Belgium, December 12-14, 2022. p. 36-54
DOI : 10.48550/arXiv.2212.13924
A data structure for scientific models of historical cities: extending the CityJSON format
In the field of the 3D reconstruction of cities of the past, there is a rising interest in the creation of models that are not just geometrical, but also informative, semantic and georeferenced. Despite the advances made in the historical reconstruction of architecture and archaeology, the solutions designed for larger-scale models are still very limited. On the other hand, research on the digitisation of current-day cities provides useful instruments. In particular, CityJSON – a JSON encoding of CityGML – represents an easy-to-use and lightweight solution for storing 3D models of cities that are geolocated, semantic and that contain additional information in the form of attributes. This contribution proposes (1) to extend the schema to the needs of a historical representation; and (2) to incorporate the newly created model in a continuous-flow pipeline, in which the geometry is dynamically updated each time an attribute is changed, as a means to foster collaboration.
Proceedings of the 6th ACM SIGSPATIAL International Workshop on Geospatial Humanities
2022-11-11
6th ACM SIGSPATIAL International Workshop on Geospatial Humanities, Seattle, Washington, November 1, 2022. p. 20-23
DOI : 10.1145/3557919.3565813
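A minimal sketch of the kind of extension discussed above: a CityJSON-style city object carrying extra historical attributes; the attribute names (existence interval, source, confidence) are assumptions for illustration, not the published schema extension.

```python
# Sketch: a CityJSON-like record extended with historical attributes.
import json

city_object = {
    "type": "Building",
    "attributes": {
        "+existence_start": 1748,        # extension attributes, "+"-prefixed
        "+existence_end": 1840,
        "+source": "historical map, sheet 12",
        "+confidence": 0.7,
    },
    "geometry": [{
        "type": "Solid",
        "lod": "1",
        "boundaries": [],                # filled by the procedural step
    }],
}

print(json.dumps({"type": "CityJSON", "version": "1.1",
                  "CityObjects": {"building_001": city_object},
                  "vertices": []}, indent=2))
```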
Searching for visual patterns in a children’s drawings collection
The success of large-scale digitization projects at museums, archives, and libraries is pushing other cultural institutions to embrace digitization to preserve their collections. By juxtaposing digital tools with digitized collections, it is now possible to study these cultural objects at a previously unknown scale. This thesis is the first attempt to explore a recently digitized children’s drawings collection and to develop a system that identifies patterns in the drawings linked with popular cultural objects. Artists, as young as three and as old as 25, created nearly 90,000 drawings in the span of three decades from most countries in the world. The preliminary examination reveals that these drawings mirror a solid cultural ethos through specific iconographic subjects, objects, and colors, and that the distinction between children from different parts of the globe is visible in their works. These factors not only make the dataset distinct from other sketch datasets but also set it far apart from them in terms of size and the multifariousness of its creations and creators. Another essential dimension of the project is matching the drawings with the popular cultural objects they represent. A deep learning model that learns a metric to rank the visual similarity between images is used to identify the drawing-artwork pairs. Though networks developed for image classification perform inadequately on the matching task, networks used for pattern matching in paintings show good performance, and fine-tuning the models increases performance drastically. The primary outcomes of this work are (1) systems trained with a few methodically chosen examples perform comparably to systems trained on thousands of generic samples, and (2) using drawings enriched with generic effects of watercolor, oil painting, pencil sketch, and texturizing mitigates the network’s tendency to learn examples by heart.
2022-07-08
Advisor(s): F. Kaplan; J. R. Fageot
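The retrieval step can be sketched compactly: rank catalogue artworks by cosine similarity to a drawing in a shared embedding space; the random embeddings below are placeholders for the features produced by the fine-tuned networks described in the thesis.

```python
# Sketch: cosine-similarity ranking of artworks against a drawing embedding.
import numpy as np

rng = np.random.default_rng(1)
drawing_embedding = rng.normal(size=256)
artwork_embeddings = rng.normal(size=(1000, 256))   # hypothetical catalogue
artwork_ids = [f"artwork_{i}" for i in range(1000)]

def rank_matches(query, candidates, ids, top_k=5):
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    scores = c @ q
    best = np.argsort(scores)[::-1][:top_k]
    return [(ids[i], float(scores[i])) for i in best]

print(rank_matches(drawing_embedding, artwork_embeddings, artwork_ids))
```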
Automatic table detection and classification in large-scale newspaper archives
In recent decades, major efforts to digitize historical documents have led to the creation of large machine-readable corpora, including newspapers, which are waiting to be processed and analyzed. Newspapers are a valuable historical source, notably because of the plurality of subjects and points of view they cover; however, their heterogeneity, due to their diachronic properties and their visual richness, makes them difficult to deal with. Certain recurring elements, such as tables, which are powerful layout objects because of their ability to easily convey a large amount of information through their logical visual arrangement, contribute to the difficulty of processing them. This thesis focuses on automatic table processing in large-scale newspaper archives. Starting from a large corpus of Luxembourgish newspapers annotated with tables, we propose a statistical exploration of this dataset as well as strategies to address its annotation inconsistencies and to automatically bootstrap a training dataset for table classification. We also explore the ability of deep learning methods to detect and semantically classify tables. The performance of image segmentation models is compared in a series of experiments around their ability to learn under challenging conditions, while classifiers based on different combinations of data modalities are evaluated on the task of table classification. Results show that visual models are able to detect tables by learning on an inconsistent ground truth, and that adding further modalities increases classification performance.
2022-02-08
Advisor(s): M. Ehrmann; S. Clematide; F. Kaplan
Boosting named entity recognition in domain-specific and low-resource settings
Recent research in natural language processing has leveraged attention-based models to produce state-of-the-art results in a wide variety of tasks. Using transfer learning, generic models like BERT can be fine-tuned for domain-specific tasks using little annotated data. In the field of digital humanities and classics, bibliographical reference extraction counts among the domain-specific tasks for which few annotated datasets have been made available. It therefore remains a highly challenging Named Entity Recognition (NER) problem which has not been addressed by the aforementioned approaches yet. In this study, we try to boost bibliographical reference extraction with various transfer learning strategies. We compare three transformers to a Conditional Random Fields (CRF) model developed by Romanello, using both generic and domain-specific pre-training. Experiments show that transformers consistently improve on CRF baselines. However, domain-specific pre-training yields no significant benefits. We discuss and compare these results in light of comparable research in domain-specific NER.
2022-01-13
p. 21.
Overview of HIPE-2022: Named Entity Recognition and Linking in Multilingual Historical Documents
This paper presents an overview of the second edition of HIPE (Identifying Historical People, Places and other Entities), a shared task on named entity recognition and linking in multilingual historical documents. Following the success of the first CLEF-HIPE-2020 evaluation lab, HIPE-2022 confronts systems with the challenges of dealing with more languages, learning domain-specific entities, and adapting to diverse annotation tag sets. This shared task is part of the ongoing efforts of the natural language processing and digital humanities communities to adapt and develop appropriate technologies to efficiently retrieve and explore information from historical texts. On such material, however, named entity processing techniques face the challenges of domain heterogeneity, input noisiness, dynamics of language, and lack of resources. In this context, the main objective of HIPE-2022, run as an evaluation lab of the CLEF 2022 conference, is to gain new insights into the transferability of named entity processing approaches across languages, time periods, document types, and annotation tag sets. Tasks, corpora, and results of participating teams are presented.
Experimental IR Meets Multilinguality, Multimodality, and Interaction. 13th International Conference of the CLEF Association, CLEF 2022, Bologna, Italy, September 5–8, 2022, Proceedings
2022
13th Conference and Labs of the Evaluation Forum (CLEF 2022), Bologna, Italy, 5-8 September 2022. p. 423-446
DOI : 10.1007/978-3-031-13643-6_26
Digitised Historical Newspapers: A Changing Research Landscape (Introduction)
The application of digital technologies to newspaper archives is transforming the way historians engage with these sources. The digital evolution not only affects how scholars access historical newspapers, but also, increasingly, how they search, explore and study them. Two developments have been driving this transformation: massive digitisation, which facilitates access to remote holdings and, more recently, improved search capabilities, which alleviate the tedious exploration of vast collections, opens up new prospects and transforms research practices. The volume “Digitised newspapers – A New Eldorado for Historians?” brings together the contributions of a workshop held in 2020 on tools, methods and epistemological reflections on the use of digitised newspapers and offers three perspectives: how digitisation is transforming access to and exploration of historical newspaper collections; how automatic content processing allows for the creation of new layers of information; and, finally, what analyses this enhanced material opens up. This introductory chapter reviews recent developments that have influenced the research landscape of digitized newspapers in recent years and introduces the eighteen articles that comprise this volume.
Digitised Newspapers – A New Eldorado for Historians?; Berlin, Boston: De Gruyter Oldenbourg, 2022-12-31. p. 439. ISBN : 978-3-110729-21-4
Digitised Newspapers – A New Eldorado for Historians? Reflections on Tools, Methods and Epistemology
The application of digital technologies to historical newspapers has changed the research landscape historians were used to. An Eldorado? Despite undeniable advantages, the new digital affordance of historical newspapers also transforms research practices and confronts historians with new challenges. Drawing on a growing community of practices, the impresso project invited scholars experienced with digitised newspaper collections with the aim of encouraging a discussion on heuristics, source criticism and interpretation of digitized newspapers. This volume provides a snapshot of current research on the subject and offers three perspectives: how digitisation is transforming access to and exploration of historical newspaper collections; how automatic content processing allows for the creation of new layers of information; and, finally, what analyses this enhanced material opens up. ‘impresso – Media Monitoring of the Past’ is an interdisciplinary research project that applies text mining tools to digitised historical newspapers and integrates the resulting data into historical research workflows by means of a newly developed user interface. The question of how best to adapt text mining tools and their use by humanities researchers is at the heart of the impresso enterprise.
Berlin: De Gruyter, 2022-12-31. ISBN : 978-3-110729-21-4
DOI : 10.1515/9783110729214
Conditional Synthetic Financial Time Series with Generative Adversarial Networks
The creation of high-fidelity synthetic data has long been an important goal in machine learning, particularly in fields like finance where the lack of available training and test data makes it impossible to utilize many of the deep learning techniques which have proven so powerful in other domains. Despite ample research into different types of synthetic generation techniques, which in recent years have largely focused on generative adversarial networks, key holes remain in many of the architectures and techniques being utilized. In particular, there are currently no techniques available which can generate multiple series concurrently while capturing the specific stylized facts of financial time series and which incorporate extra information that affects the series, such as macroeconomic factors. In this thesis, we propose the Conditional Market Transformer-Encoder Generative Adversarial Network (C-MTE-GAN), a novel generative adversarial neural network architecture that addresses the aforementioned challenges. C-MTE-GAN is able to capture the relevant univariate stylized facts such as lack of autocorrelation of returns, volatility clustering, fat tails, and the leverage effect. It is also able to capture the multivariate interactions between multiple concurrently generated series, such as correlation and tail dependence. Lastly, we are able to condition the generated series both on a prior series of returns and on different types of relevant information that typically affect both the characteristics of the market and asset allocation decision making. Furthermore, we demonstrate the effectiveness of data generated by C-MTE-GAN in augmenting the training of a statistical arbitrage model and improving its performance in realistic portfolio allocation scenarios. The abilities of this architecture represent a substantial step forward in financial time series generation which will hopefully unlock many new applications of synthetic data within the realm of finance.
2022
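A deliberately simplified sketch of the conditioning idea, a generator and discriminator that both see the conditioning variables; the feed-forward architectures, sizes and toy data are placeholders and bear no relation to the actual C-MTE-GAN transformer design.

```python
# Sketch: conditional GAN training loop for toy return series.
import torch
import torch.nn as nn

SERIES_LEN, COND_DIM, NOISE_DIM = 64, 4, 32

generator = nn.Sequential(
    nn.Linear(NOISE_DIM + COND_DIM, 128), nn.ReLU(),
    nn.Linear(128, SERIES_LEN),
)
discriminator = nn.Sequential(
    nn.Linear(SERIES_LEN + COND_DIM, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1),
)
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

real_returns = torch.randn(256, SERIES_LEN) * 0.01     # toy "real" data
conditions = torch.randn(256, COND_DIM)                # e.g. macro factors

for _ in range(10):                                    # toy training loop
    noise = torch.randn(256, NOISE_DIM)
    fake = generator(torch.cat([noise, conditions], dim=1))

    # Discriminator step: real vs. generated, both seen with their condition.
    d_real = discriminator(torch.cat([real_returns, conditions], dim=1))
    d_fake = discriminator(torch.cat([fake.detach(), conditions], dim=1))
    loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: fool the discriminator under the same condition.
    d_fake = discriminator(torch.cat([fake, conditions], dim=1))
    loss_g = bce(d_fake, torch.ones_like(d_fake))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

print(fake.shape)
```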
Extended Overview of HIPE-2022: Named Entity Recognition and Linking in Multilingual Historical Documents
This paper presents an overview of the second edition of HIPE (Identifying Historical People, Places and other Entities), a shared task on named entity recognition and linking in multilingual historical documents. Following the success of the first CLEF-HIPE-2020 evaluation lab, HIPE-2022 confronts systems with the challenges of dealing with more languages, learning domain-specific entities, and adapting to diverse annotation tag sets. This shared task is part of the ongoing efforts of the natural language processing and digital humanities communities to adapt and develop appropriate technologies to efficiently retrieve and explore information from historical texts. On such material, however, named entity processing techniques face the challenges of domain heterogeneity, input noisiness, dynamics of language, and lack of resources. In this context, the main objective of HIPE-2022, run as an evaluation lab of the CLEF 2022 conference, is to gain new insights into the transferability of named entity processing approaches across languages, time periods, document types, and annotation tag sets. Tasks, corpora, and results of participating teams are presented. Compared to the condensed overview, this paper contains more refined statistics on the datasets, a break down of the results per type of entity, and a discussion of the ‘challenges’ proposed in the shared task.
Proceedings of the Working Notes of CLEF 2022 – Conference and Labs of the Evaluation Forum
2022
13th Conference and Labs of the Evaluation Forum (CLEF 2022), Bologna, Italy, 5-8 September 2022. DOI : 10.5281/zenodo.6979577
Opacité et transparence dans le design d’un dispositif de surveillance urbain : le cas de l’IMSI catcher
This thesis assesses the surveillance operated on the mobile phone network by governmental actors (intelligence agencies, police, army) and the relationship between monitored spaces and their users. Indeed, some new surveillance devices used by intelligence services redefine the spatiality of surveillance, raising new questions in this field of research. More specifically, this research focuses on one specific object: the IMSI catcher, a monitoring apparatus of the cellular network that intercepts cellphones’ identity and some communications in a given area by mimicking the activity of a cell tower. While this kind of device seems to offer a tactical and security interest in the fight against terrorism and crime, many civil liberties organisations such as the Electronic Frontier Foundation, Privacy International and La Commission nationale de l’informatique et des libertés are concerned about the potential for uncontrolled surveillance; indeed, the controversial nature of the device could endanger certain individual and public rights. What is this technical object, and which new issues come with its use in surveillance? How, and from which perspective, is it problematic? What does the IMSI catcher teach us about the potential future of surveillance regimes? I look into this specific device in a research framework at the intersection of design research practices, science and technology studies (STS) and surveillance studies. First, I deal with this surveillance apparatus as a technical object, from a perspective fed by the theoretical framework of “concretization” and “technical lines” proposed by Gilbert Simondon and Yves Deforge, through the analysis of a visual and technical documentation. Second, I use a research-by-design approach to explore certain assumptions regarding the nature of the object itself, its functioning and its “concrete” aspect (or rather “non-concrete” in the present case), with the help of approaches borrowed from reverse engineering and reconstitution, close to media archaeology. Then, I explore possible opposition and protest trajectories with the help of prototypes designed with critical design and speculative design methods. Finally, through the writing of prospective scenarios, I build a design fiction that offers a synthesis, potentially subject to debate, around the IMSI catcher’s uses, present and to come, and more broadly on the potential future of surveillance regimes.
Lausanne: EPFL, 2022
p. 230. DOI : 10.5075/epfl-thesis-8838
ECCE: Entity-centric Corpus Exploration Using Contextual Implicit Networks
In the Digital Age, the analysis and exploration of unstructured document collections is of central importance to members of investigative professions, whether they might be scholars, journalists, paralegals, or analysts. In many of their domains, entities play a key role in the discovery of implicit relations between the contents of documents and thus serve as natural entry points to a detailed manual analysis, such as the prototypical 5Ws in journalism or stock symbols in finance. To assist in these analyses, entity-centric networks have been proposed as a language model that represents document collections as a co-occurrence graph of entities and terms, and thereby enables the visual exploration of corpora. Here, we present ECCE, a web-based application that implements entity-centric networks, augments them with contextual language models, and provides users with the ability to upload, manage, and explore document collections. Our application is available as a web-based service at http://dimtools.uni.kn/ecce.
WWW ’22 Companion
2022
The Web Conference (WWW’22), Lyon, France, April 25-29, 2022.p. 1-4
DOI : 10.1145/3487553.3524237
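The underlying entity-centric network can be sketched in a few lines: entities and terms become nodes, and co-occurrence within a sentence becomes a weighted edge; the toy sentences and the sentence-level window are assumptions, and the contextual-language-model augmentation is not shown.

```python
# Sketch: build a weighted co-occurrence graph of entities and terms.
from itertools import combinations
import networkx as nx

sentences = [
    ["Apple", "Cupertino", "quarterly", "results"],
    ["Apple", "Tim Cook", "results"],
]

graph = nx.Graph()
for tokens in sentences:
    for a, b in combinations(set(tokens), 2):
        if graph.has_edge(a, b):
            graph[a][b]["weight"] += 1
        else:
            graph.add_edge(a, b, weight=1)

# Entry point for exploration: neighbours of an entity, ranked by edge weight.
neighbours = sorted(graph["Apple"].items(), key=lambda kv: -kv[1]["weight"])
print(neighbours)
```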
Introducing the HIPE 2022 Shared Task: Named Entity Recognition and Linking in Multilingual Historical Documents
We present the HIPE-2022 shared task on named entity processing in multilingual historical documents. Following the success of the first CLEF-HIPE-2020 evaluation lab, this edition confronts systems with the challenges of dealing with more languages, learning domain-specific entities, and adapting to diverse annotation tag sets. HIPE-2022 is part of the ongoing efforts of the natural language processing and digital humanities communities to adapt and develop appropriate technologies to efficiently retrieve and explore information from historical texts. On such material, however, named entity processing techniques face the challenges of domain heterogeneity, input noisiness, dynamics of language, and lack of resources. In this context, the main objective of the evaluation lab is to gain new insights into the transferability of named entity processing approaches across languages, time periods, document types, and annotation tag sets.
Advances in Information Retrieval
2022-04-05
44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10-14, 2022. p. 347-354
DOI : 10.1007/978-3-030-99739-7_44
HIPE-2022 Shared Task Named Entity Datasets
HIPE-2022 datasets used for the HIPE 2022 shared task on named entity recognition and classification (NERC) and entity linking (EL) in multilingual historical documents. The HIPE-2022 datasets are based on six primary datasets assembled and prepared for the shared task. Primary datasets are composed of historical newspapers and classical commentaries covering ca. 200 years; they feature several languages and different entity tag sets and annotation schemes. They originate from several European cultural heritage projects, from the HIPE organizers’ previous research project, and from the previous HIPE-2020 campaign. Some are already published, others are released for the first time for HIPE-2022. The HIPE-2022 shared task assembles and prepares these primary datasets in HIPE-2022 release(s), which correspond to a single package composed of neatly structured and homogeneously formatted files.
2022
2021
Automatic Content Curation of Visual Heritage
The digitization and preservation of large heritage collections induce high maintenance costs to keep up with technical standards and ensure sustainable access. Creating impactful usage is instrumental to justify the resources for long-term preservation. The Museum für Gestaltung of Zurich holds one of the biggest poster collections in the world, of which 52'000 items were digitised. In the process of building a digital installation to valorize the collection, one objective was to develop an algorithm capable of predicting the next poster to show according to the ones already displayed. The work presented here describes the steps to build an algorithm able to automatically create sequences of posters reflecting associations performed by curators and professional designers. The challenge has similarities with the domain of song playlist algorithms. Recently, artificial intelligence techniques, and more specifically deep-learning algorithms, have been used to facilitate their generation. Promising results were obtained with Recurrent Neural Networks (RNN) trained on manually generated playlists and paired with clusters of features extracted from songs. We applied the same principles to create the proposed algorithm, but to a more challenging medium: posters. First, a convolutional autoencoder was trained to extract features of the posters, using the 52'000 digital posters as a training set. Poster features were then clustered. Next, an RNN learned to predict the next cluster according to the previous ones. The RNN training set was composed of poster sequences extracted from a collection of books from the Gestaltung Museum of Zurich dedicated to displaying posters. Finally, within the predicted cluster, the poster closest to the previous poster is selected, using the mean square distance between poster features to compute the proximity. To validate the predictive model, we compared sequences of 15 posters produced by our model to randomly and manually generated sequences. Manual sequences were created by a professional graphic designer. We asked 21 participants working as professional graphic designers to sort the sequences from the one with the strongest graphic line to the one with the weakest and to motivate their answer with a short description. The sequences produced by the designer were ranked first 60%, second 25% and third 15% of the time. The sequences produced by our predictive model were ranked first 25%, second 45% and third 30% of the time. The sequences produced randomly were ranked first 15%, second 29% and third 55% of the time. Compared to designer sequences, and as reported by participants, model and random sequences lacked thematic continuity. According to the results, the proposed model is able to generate better poster sequencing than random sampling, and occasionally even outperforms a professional designer. As a next step, the proposed algorithm should include the possibility to create sequences according to a selected theme. To conclude, this work shows the potential of artificial intelligence techniques to learn from existing content and to provide a tool for curating large sets of data, with a permanent renewal of the presented content.
World Academy of Science, Engineering and Technology International Journal of Humanities and Social Sciences
2021-11-18
ICDH 2021 : International Conference on Digital Heritage, London, United Kingdom, November 18-19, 2021.
Generic Semantic Segmentation of Historical Maps
Research in automatic map processing is largely focused on homogeneous corpora or even individual maps, leading to inflexible models. Based on two new corpora, the first one centered on maps of Paris and the second one gathering maps of cities from all over the world, we present a method for computing the figurative diversity of cartographic collections. In a second step, we discuss the current opportunities for CNN-based semantic segmentation of historical city maps. Through several experiments, we analyze the impact of figurative and cultural diversity on segmentation performance. Finally, we highlight the potential for large-scale and generic algorithms. Training data and code of the described algorithms are made open source and published with this article.
CEUR Workshop Proceedings
2021-11-17
CHR 2021: Computational Humanities Research Conference, Amsterdam, The Netherlands, November 17-19, 2021. p. 228-248
Optical Character Recognition of 19th Century Classical Commentaries: the Current State of Affairs
Together with critical editions and translations, commentaries are one of the main genres of publication in literary and textual scholarship, and have a century-long tradition. Yet, the exploitation of thousands of digitized historical commentaries has hitherto been hindered by the poor quality of Optical Character Recognition (OCR), especially on commentaries to Greek texts. In this paper, we evaluate the performance of two pipelines suitable for the OCR of historical classical commentaries. Our results show that Kraken + Ciaconna reaches a substantially lower character error rate (CER) than Tesseract/OCR-D on commentary sections with a high density of polytonic Greek text (average CER 7% vs. 13%), while Tesseract/OCR-D is slightly more accurate than Kraken + Ciaconna on text sections written predominantly in Latin script (average CER 8.2% vs. 8.4%). As part of this paper, we also release GT4HistComment, a small dataset with OCR ground truth for 19th century classical commentaries, and Pogretra, a large collection of training data and pre-trained models for a wide variety of ancient Greek typefaces.
HIP ’21: The 6th International Workshop on Historical Document Imaging and Processing
2021-10-31
HIP ’21: The 6th International Workshop on Historical Document Imaging and Processing, Lausanne, Switzerland, September 5-6, 2021. p. 1-6
DOI : 10.1145/3476887.3476911
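The character error rate (CER) used in this evaluation can be computed as the Levenshtein distance between OCR output and ground truth divided by the ground-truth length, as in this small sketch; the example strings are placeholders.

```python
# Sketch: character error rate between an OCR output and its ground truth.
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(ocr_output: str, ground_truth: str) -> float:
    return levenshtein(ocr_output, ground_truth) / len(ground_truth)

print(cer("Ἀχιλλεὐς sang", "Ἀχιλλεύς sang"))   # one wrong character
```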
Optical Character Recognition of 19th Century Classical Commentaries: the Current State of Affairs
Together with critical editions and translations, commentaries are one of the main genres of publication in literary and textual scholarship, and have a century-long tradition. Yet, the exploitation of thousands of digitized historical commentaries has hitherto been hindered by the poor quality of Optical Character Recognition (OCR), especially on commentaries to Greek texts. In this paper, we evaluate the performance of two pipelines suitable for the OCR of historical classical commentaries. Our results show that Kraken + Ciaconna reaches a substantially lower character error rate (CER) than Tesseract/OCR-D on commentary sections with a high density of polytonic Greek text (average CER 7% vs. 13%), while Tesseract/OCR-D is slightly more accurate than Kraken + Ciaconna on text sections written predominantly in Latin script (average CER 8.2% vs. 8.4%). As part of this paper, we also release GT4HistComment, a small dataset with OCR ground truth for 19th century classical commentaries, and Pogretra, a large collection of training data and pre-trained models for a wide variety of ancient Greek typefaces.
2021-10-13
DOI : 10.48550/arXiv.2110.06817
Aux portes du monde miroir
The Mirror World is no longer an imaginary device, a mirage in a distant future; it is a reality under construction. In Europe, Asia and on the American continent, large companies and the best universities are working to build the infrastructures, to define their functionalities, to specify their logistics. The Mirror World, in its asymptotic form, presents a quasi-continuous representation of the world in motion, integrating, virtually, all photographic perspectives. It is a new giant computational object, opening the way to new research methods or even, probably, to a new type of science. The economic and cultural stakes of this third platform are immense. If the Mirror World transforms access to knowledge for new generations, as the Web and Social Networks did in their time, it is our responsibility to understand and, if need be, bend its technological trajectory to make this new platform an environment for the critical knowledge of the past and the creative imagination of the future.
Revue Histoire de l’art : Humanités numériques
2021-06-29
Vol. 87.
Une approche computationnelle du cadastre napoléonien de Venise
At the beginning of the 19th century, the Napoleonic administration introduced a new standardised description system to give an objective account of the form and functions of the city of Venice. The cadastre, deployed on a European scale, offered for the first time an articulated and precise view of the structure of the city and its activities, through a methodical approach and standardised categories. With the use of digital techniques, based in particular on deep learning, it is now possible to extract from these documents an accurate and dense representation of the city and its inhabitants. By systematically checking the consistency of the extracted information, these techniques also evaluate the precision and systematicity of the surveyors’ work and therefore indirectly qualify the trust to be placed in the extracted information. This article reviews the history of this computational protosystem and describes how digital techniques offer not only systematic documentation, but also prospects for extracting latent information that is as yet uncharted but implicitly present in this information system of the past.
Humanités numériques
2021-05-01
Vol. 3/2021, num. 3. DOI : 10.4000/revuehn.1786
Les vingt premières années du capitalisme linguistique : Enjeux globaux de la médiation algorithmique des langues
The mediation of linguistic flows by a handful of global actors has enabled the construction of language models whose performance is now unprecedented. This text revisits the main theses explaining the economic and technological dynamics behind this phenomenon, extending earlier reflections (theses 1 and 2), then develops the question of the development and use of new models based on the accumulated linguistic capital (theses 3 and 4) and the consequences for the new forms of algorithmic mediation that follow from them (theses 5 and 6). The issue is to understand the shift taking place in the economy of expression when algorithms produce writing of higher performativity than ordinary language and are therefore naturally used to produce text through prostheses. We are witnessing what could be called a “second regime of linguistic capitalism”, going beyond the auctioning of words to additionally offer linguistic services that substitute for writing itself. It then becomes possible, as with machine translation services, but this time for one’s own language, to express oneself more effectively and to produce writing adapted to the various demands of professional or private life. We discuss how the generalisation of these linguistic prostheses could lead to an atrophy of expressive capacity, first for writing and perhaps, eventually, for orality.
Prendre soin de l’informatique et des générations; Limoges: FYP Editions, 2021.ISBN : 978-2-36405-212-3
Named Entity Recognition and Classification in Historical Documents: A Survey
After decades of massive digitisation, an unprecedented number of historical documents are available in digital format, along with their machine-readable texts. While this represents a major step forward with respect to preservation and accessibility, it also opens up new opportunities in terms of content mining, and the next fundamental challenge is to develop appropriate technologies to efficiently search, retrieve and explore information from this ‘big data of the past’. Among semantic indexing opportunities, the recognition and classification of named entities are in great demand among humanities scholars. Yet, named entity recognition (NER) systems are heavily challenged by diverse, historical and noisy inputs. In this survey, we present the array of challenges posed by historical documents to NER, inventory existing resources, describe the main approaches deployed so far, and identify key priorities for future developments.
ACM Computing Surveys
2021-09-21
Vol. 56 , num. 2, p. 27.
Catch Me If You Can: Designing a Disobedient Object to Protest Against GSM Surveillance
In this article, I discuss the process of designing an object to protest against a specific surveillance device: the IMSI catcher, a controversial device used to monitor GSM networks. Because IMSI catchers are widely used at protests, I develop a tactical approach based on obfuscation, to be adopted collectively to counteract them. In this case study, (1) I present how remaking an IMSI catcher allows the technology to be re-appropriated and provides a basis for designing a disobedient object; (2) I introduce some examples of tactics to defeat surveillance based on obfuscation and the potential of inflatables; (3) I conceptualize a possible design of an object to defeat IMSI catchers and show the types of interactions it might generate in protests.
Extended Abstracts Of The 2021 Chi Conference On Human Factors In Computing Systems (Chi’21)
2021-01-01
CHI Conference on Human Factors in Computing Systems, [online], May 08-13, 2021.DOI : 10.1145/3411763.3450363
Explorer la presse numérisée : le projet Impresso
“Impresso – Media Monitoring of the Past” is an interdisciplinary research project in which a team of historians, computational linguists and designers collaborates on turning a corpus of digitised press archives into data. The main objectives of the project are to improve information extraction tools for historical texts, to semantically index historical newspapers, and to integrate the resulting enrichments into historians’ research practices by means of a newly developed interface.
Revue Historique Vaudoise
2021-11-27
Vol. 129/2021 , p. 159-173.
Method and system for generating a three-dimensional model based on spherical photogrammetry
A computer-implemented method is proposed for creating a three-dimensional model of the environment. The method comprises the steps of planning (105) a trajectory for a moving system carrying an omnidirectional camera comprising a first image sensor facing in a first direction for capturing first images, and a second image sensor facing in a second, different direction for capturing second images; advancing (107) the moving system along the trajectory; triggering (111) the omnidirectional camera at given time instants depending on the speed of the moving system along the trajectory to capture first and second images; obtaining (117) spherical images by selectively combining the first and second images; and creating (119) the three-dimensional model from the spherical images.
Patent number(s) :
- EP4165597 (A1)
- WO2021255495 (A1)
2021
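To make the triggering step of the pipeline described above concrete, the following minimal sketch shows distance-based camera triggering derived from the speed of the moving system. It is an illustration under assumptions, not the patented implementation; the names DistanceTrigger, spacing_m and capture are invented for this example.

```python
# Illustrative sketch (not the patented implementation): trigger an omnidirectional
# camera so that captures are spaced roughly evenly along the trajectory, with the
# trigger decision derived from the current speed of the moving system.
# All names (spacing_m, capture, speed_m_s) are hypothetical.
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DistanceTrigger:
    spacing_m: float                      # desired distance between captures
    capture: Callable[[float], None]      # callback that fires both image sensors
    timestamps: List[float] = field(default_factory=list)
    _since_last: float = 0.0

    def update(self, t: float, speed_m_s: float, dt: float) -> None:
        """Advance the odometry estimate and trigger a capture when the
        accumulated distance exceeds the configured spacing."""
        self._since_last += speed_m_s * dt
        if self._since_last >= self.spacing_m:
            self.capture(t)               # capture first and second images here
            self.timestamps.append(t)
            self._since_last = 0.0


if __name__ == "__main__":
    trigger = DistanceTrigger(spacing_m=2.0,
                              capture=lambda t: print(f"capture at t={t:.1f}s"))
    # Simulate a vehicle that accelerates from 1 m/s to 5 m/s over 20 seconds.
    for step in range(200):
        t = step * 0.1
        speed = 1.0 + 4.0 * min(t / 20.0, 1.0)
        trigger.update(t, speed, dt=0.1)
```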
Nouveau centre culturel et éducatif de La Chaux-de-Fonds
The city of La Chaux-de-Fonds was an important economic engine for the Swiss manufacturing industry, with the arrival of numerous watchmaking companies at the beginning of the 20th century. The city’s population peaked in the 1970s and has been declining since, as precision manufacturing has become obsolete over recent decades. As a result, although apartment prices have fallen, many apartments remain vacant owing to the lack of proximity and access to the other French-speaking cities. The project I propose aims to create a point of interest for the region through a complex of educational and cultural programmes close to the railway station. With the industrial sector having left the city centre, the land owned by the Swiss Federal Railways was privatised and sold to real-estate developers. The project is located on the site of the former depots and maintenance sheds of the rail freight operator CFF Cargo, to the south of the city. The programme reuses the existing buildings in order to preserve the local cultural identity. In addition, two new buildings house the educational spaces, which make use of the marshalling yard’s infrastructure. The project aims to create a new hub for the city, linking three central points: the station, currently under renovation, the historic Crêtets park and the Grand-Pont, which connects the northern and southern halves of the city.
2021
Advisor(s): M. Fröhlich; A. Fröhlich; F. Pardini
Navigation Improves the Survival Rate of Mobile-Bearing Total Knee Arthroplasty by Severe Preoperative Coronal Deformity: A Propensity Matched Case-Control Comparative Study
The primary hypothesis of this study was that the survival rate over 10 years of total knee arthroplasties (TKAs) implanted with a navigation system was superior to that of TKAs implanted with a conventional technique. The secondary hypothesis was that the severity of the initial coronal deformity had a negative influence on the survival rate. A national, multicentric, retrospective study was performed in France, including eight university or private centers with high volumes in knee surgery. Cases operated on with either a conventional (control group) or a navigated (study group) technique were matched after calculating the propensity score using the logistic regression technique. All patients were contacted after 10 years or more to determine the survival of the TKA. The date and cause of any revision were noted. The primary end point of the study was the occurrence of a revision for any mechanical reason. Survival curves were calculated using the Kaplan-Meier technique, with the primary criterion as end point. The influence of the implantation technique was analyzed by a log-rank test at a 5% level of significance. The influence of the severity of the preoperative coronal deformity was analyzed using the same technique. A total of 513 cases were included in each group. The survival rates after 13 years were 96.5% in the study group and 92.9% in the control group (not significant). There was no significant difference between the two groups in the survival rates after 13 years for small deformities (96.0 vs. 97.0%), but the difference was significant for large deformities (97.0 vs. 89.0%, p = 0.04). The results suggest that the use of a navigation system, by allowing a more consistent correction of the preoperative coronal deformity, leads to better long-term prosthetic survival in cases with a large initial coronal deformity. A navigation system should be routinely used in cases of initial coronal deformity greater than or equal to 10 degrees, as conventional techniques do not routinely provide satisfactory axial correction in these difficult cases.
Journal Of Knee Surgery
2021-08-01
Vol. 34 , num. 10, p. 1080-1084.DOI : 10.1055/s-0040-1701441
Detecting Text Reuse with Passim
In this lesson you will learn about text reuse detection – the automatic identification of reused passages in texts – and why you might want to use it in your research. Through a detailed installation guide and two case studies, this lesson will teach you the ropes of Passim, an open source and scalable tool for text reuse detection.
2021-05-16
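As a rough companion to the lesson above, the snippet below prepares a JSON-lines input file of the kind passim typically consumes, with one document per line carrying an identifier, a grouping field and the text. The field names and the command shown in the comment follow common passim examples but may differ between versions, so this should be read as an assumption-laden sketch rather than the lesson’s own material.

```python
# Hedged sketch: write documents as JSON lines for a text-reuse detection run.
# Field names ("id", "series", "text") follow common passim examples, but check
# the lesson / passim documentation for the exact schema of your version.
import json

documents = [
    {"id": "gazette-1871-04-12-a3", "series": "gazette", "text": "…article text…"},
    {"id": "courrier-1871-04-13-p2", "series": "courrier", "text": "…article text…"},
]

with open("passim_input.json", "w", encoding="utf-8") as f:
    for doc in documents:
        f.write(json.dumps(doc, ensure_ascii=False) + "\n")

# A typical run then points passim at this file and an output directory, e.g.:
#   passim passim_input.json reuse_output/
# (command-line form given for orientation only; consult the lesson for details)
```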
Datasets and Models for Historical Newspaper Article Segmentation
Dataset and models used and produced in the work described in the paper “Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers”: https://infoscience.epfl.ch/record/282863?ln=en
2021
Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers
The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration. Research efforts seeking to automatically process facsimiles and extract information from them are multiplying, with document layout analysis as a first essential step. Although the identification and categorization of segments of interest in document images have seen significant progress over the last years thanks to deep learning techniques, many challenges remain, among others the use of more fine-grained segmentation typologies and the consideration of complex, heterogeneous documents such as historical newspapers. Besides, most approaches consider visual features only, ignoring textual signal. We introduce a multimodal neural model for the semantic segmentation of historical newspapers that directly combines visual features at pixel level with text embedding maps derived from, potentially noisy, OCR output. Based on a series of experiments on diachronic Swiss and Luxembourgish newspapers, we investigate the predictive power of visual and textual features and their capacity to generalize across time and sources. Results show consistent improvement of multimodal models in comparison to a strong visual baseline, as well as better robustness to the wide variety of our material.
Journal of Data Mining & Digital Humanities
2021
Vol. 2021 , num. Special Issue on HistoInformatics: Computational Approaches to History, p. 1-26.DOI : 10.5281/zenodo.4065271
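The core idea of the model above, combining pixel-level visual features with text embedding maps, can be illustrated with a minimal sketch: OCR tokens are rasterised into a pixel-aligned embedding map that is concatenated with the image channels. Everything here (token boxes, the tiny embedding function, array sizes) is a hypothetical stand-in, not the paper’s pipeline.

```python
# Hedged sketch: build a pixel-aligned text-embedding map from OCR tokens and
# concatenate it with the image channels, as a multimodal input for a
# segmentation network. Token boxes, embedding size and the downstream model
# are placeholders, not the paper's actual pipeline.
import numpy as np

H, W, EMB = 256, 256, 8          # image size and (tiny) embedding dimension
image = np.random.rand(H, W, 3)  # stand-in for a normalised newspaper facsimile

# OCR output: (token, bounding box in pixel coordinates), possibly noisy.
tokens = [("bourse", (10, 40, 120, 60)), ("météo", (130, 200, 220, 220))]

def embed(token: str, dim: int = EMB) -> np.ndarray:
    """Deterministic stand-in for a word-embedding lookup (e.g. fastText)."""
    rng = np.random.default_rng(abs(hash(token)) % (2**32))
    return rng.standard_normal(dim).astype(np.float32)

# Rasterise: paint each token's embedding over the pixels of its bounding box.
embedding_map = np.zeros((H, W, EMB), dtype=np.float32)
for token, (x0, y0, x1, y1) in tokens:
    embedding_map[y0:y1, x0:x1, :] = embed(token)

# Channel-wise concatenation gives an (H, W, 3 + EMB) multimodal input tensor.
multimodal_input = np.concatenate([image, embedding_map], axis=-1)
print(multimodal_input.shape)    # (256, 256, 11)
```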
2020
I sistemi di immagini nell’archivio digitale di Vico Magistretti
The online release of the digitised archive of Vico Magistretti, which brings together tens of thousands of preparatory drawings, technical drawings and photographs produced between 1946 and 2006, opens the way to a major renewal of research on the Italian designer and architect. The opening of such a special archive invites us to imagine the different perspectives that can be considered for exploring, visualising and studying such a body of documents.
Narrare con l’Archivio. Forum internazionale, Milan, Italy, November 19, 2020.
Swiss in motion : Analyser et visualiser les rythmes quotidiens. Une première approche à partir du dispositif Time-Machine.
Over the last 50 years, technological developments in transport and telecommunications have helped to reconfigure spatio-temporal behaviour (Kaufmann, 2008). Individuals now benefit from a wide universe of choices in terms of transport modes and accessible places for carrying out their activities. This configuration particularly influences daily mobility behaviour, which tends to become more complex in both its spatial and temporal dimensions, leading to the emergence of intense and complex daily rhythms (Drevon, Gwiazdzinski, & Klein, 2017; Gutiérrez & García-Palomares, 2007). Recent research on Switzerland (Drevon, Gumy, & Kaufmann, 2020) suggests that daily rhythms are marked by great diversity in their spatio-temporal configuration and activity density (Drevon, Gumy, Kaufmann, & Hausser, 2019). The share of daily rhythms corresponding to the commute-work-sleep pattern is ultimately rather modest. This diversity of daily rhythms ranges from very complex to fairly simple behaviours, which materialise at different spatial scales. Current analytical tools in the social sciences and in the socio-economics of transport still struggle to account for these complex forms of daily rhythms at the individual and territorial levels. Faced with this epistemological and methodological challenge, this paper proposes an innovative, interdisciplinary approach combining sociology, geography and computational science. Concretely, it offers a geo-visualisation tool for daily rhythms at individual and territorial scales, based on the spatio-temporal behaviour of Switzerland’s inhabitants. The objective of this approach is to put into perspective the differences in activity intensity between social situations and territories. The analyses draw on the Mobility and Transport Microcensus (MRMT), carried out every five years at the national level by the Federal Statistical Office and the Federal Office for Spatial Development, in its 2015 edition. This survey comprises a sample of 57,090 people who were questioned about all the trips they had made on the day before the survey (CATI survey protocol). The visualisation is produced with the Time Machine device (Kaplan, 2013; di Lenardo & Kaplan, 2015), which makes it possible to model a virtual 4D environment (Figure 1: https://youtu.be/41-klvXLCqM) and to simulate the deployment of daily activities and trips. The first simulations reveal contrasting rhythmic regimes at the individual scale, which differ according to pace, frequency of actions, spatial scale and social position. At the territorial level, the visualisations show significant differences in the intensity with which individuals use the territory, as well as spatial specificities constitutive of the activities carried out there. These first visualisations make it possible, first, to reveal social inequalities (gender, class) in the face of the pressure to be active (Viry, Ravalet, & Kaufmann, 2015; Drevon, 2019; Drevon & Kaufmann, 2020).
They also make it possible to revisit how territories are categorised (Rérat, 2008; Schuler et al., 2007) through a dynamic approach that reflects the reality of temporary activities, putting into perspective the principles of factorial urban ecology (Pruvot & Weber-Klein, 1984) and reinforcing the relevance of the presential economy (Lejoux, 2009).
Swiss Mobility Conference, Lausanne, October 29-30, 2020.
A digital reconstruction of the 1630–1631 large plague outbreak in Venice
The plague, an infectious disease caused by the bacterium Yersinia pestis, is widely considered to be responsible for the most devastating and deadly pandemics in human history. Starting with the infamous Black Death, plague outbreaks are estimated to have killed around 100 million people over multiple centuries, with local mortality rates as high as 60%. However, detailed pictures of the disease dynamics of these outbreaks centuries ago remain scarce, mainly due to the lack of high-quality historical data in digital form. Here, we present an analysis of the 1630–1631 plague outbreak in the city of Venice, using newly collected daily death records. We identify the presence of a two-peak pattern, for which we present two possible explanations based on computational models of disease dynamics. Systematically digitized historical records like the ones presented here promise to enrich our understanding of historical phenomena of enduring importance. This work contributes to the recently renewed interdisciplinary foray into the epidemiological and societal impact of pre-modern epidemics.
Scientific Reports
2020-10-20
Vol. 10 , num. 1, p. 17849.DOI : 10.1038/s41598-020-74775-6
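The paper’s disease-dynamics models are not reproduced here, but a purely illustrative discrete-time SIR-type simulation shows one of several mechanisms that can yield a two-peak mortality curve, namely a temporary reduction in transmission followed by a rebound. All parameter values below are arbitrary assumptions chosen only for illustration.

```python
# Purely illustrative sketch (not the paper's models): a discrete-time SIR-type
# simulation in which transmission is temporarily reduced and then rebounds,
# one of several mechanisms that can produce a two-peak death curve.
N = 140_000.0                       # stand-in for an early-modern urban population
gamma, cfr = 0.25, 0.4              # removal rate and case fatality ratio (arbitrary)
S, I, R = N - 10.0, 10.0, 0.0
deaths = []

for day in range(260):
    beta = 0.12 if 50 <= day < 90 else 0.35   # temporary containment, then rebound
    new_inf = beta * S * I / N
    new_rem = gamma * I
    deaths.append(cfr * new_rem)              # deaths among those removed today
    S -= new_inf
    I += new_inf - new_rem
    R += new_rem

peaks = [d for d in range(1, len(deaths) - 1)
         if deaths[d - 1] < deaths[d] > deaths[d + 1]]
print("local maxima of the simulated death curve (days):", peaks)
```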
The Advent of the 4D Mirror World
The 4D Mirror World is considered to be the next planetary-scale information platform. This commentary gives an overview of the history of the converging trends that have progressively shaped this concept. It retraces how large-scale photographic surveys served to build the first 3D models of buildings, cities, and territories, how these models got shaped into physical and virtual globes, and how eventually the temporal dimension was introduced as an additional way for navigating not only through space but also through time. The underlying assumption of the early large-scale photographic campaigns was that image archives held depths of latent knowledge still to be mined. The technology that currently permits the advent of the 4D World through new articulations of dense photographic material combining aerial imagery, historic photo archives, huge video libraries, and crowd-sourced photo documentation precisely exploits this latent potential. Through the automatic recognition of “homologous points,” the photographic material gets connected in time and space, enabling the geometrical computation of hypothetical reconstructions accounting for a perpetually evolving reality. The 4D world emerges as a series of sparse spatiotemporal zones that are progressively connected, forming a denser fabric of representations. On this 4D skeleton, information from cadastral maps, BIM data, or any other specific layers of a geographical information system can be easily articulated. Most of our future planning activities will use it as a way not only to have smooth access to the past but also to plan collectively shared scenarios for the future.
Urban Planning
2020-06-30
Vol. 5 , num. 2, p. 307.DOI : 10.17645/up.v5i2.3133
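The “homologous points” mentioned above are, in computer-vision terms, matched local features across photographs. The generic sketch below pairs such points between two synthetic images with ORB descriptors in OpenCV; it illustrates the principle only and is not the platform’s actual photogrammetry stack.

```python
# Generic sketch of "homologous point" detection between two photographs using
# local features (ORB + brute-force matching via OpenCV). Production photogrammetry
# pipelines use far more robust matching, geometric verification and bundle adjustment.
import cv2
import numpy as np

# Build a synthetic "photo" with a few structures, and a shifted second view.
img1 = np.zeros((240, 320), dtype=np.uint8)
cv2.rectangle(img1, (60, 60), (140, 160), 255, 2)
cv2.circle(img1, (220, 120), 40, 180, 3)
img2 = np.roll(img1, shift=(12, 25), axis=(0, 1))   # crude stand-in for a new viewpoint

orb = cv2.ORB_create(nfeatures=500)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

if des1 is not None and des2 is not None:
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    # Each match pairs a point in image 1 with its homologue in image 2; such pairs
    # feed the geometric computations (relative pose, triangulation) evoked above.
    for m in matches[:5]:
        p1, p2 = kp1[m.queryIdx].pt, kp2[m.trainIdx].pt
        print(f"({p1[0]:.0f},{p1[1]:.0f}) <-> ({p2[0]:.0f},{p2[1]:.0f})")
```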
Rhythmanalysis of Urban Events: Empirical Elements from the Montreux Jazz Festival
This article proposes an original approach to urban events mapping. At the theoretical level, the article is based on rhythmanalysis and recent research on urban rhythms. It contrasts with previous research by departing from everyday rhythms to tackle the specific rhythms of urban events. Drawing on this theoretical framework, the article analyses the rhythms of the Montreux Jazz Festival and proposes two main types of rhythmic scales, linked with the historical development of the Festival and its annual performance. The methodology is based on a mixed method of data collection and an original analysis framework. The analysis of the historical rhythm is based on the festival archives and interviews with experts. The analysis uses the Time Machine visualisation device, which reveals three processes of urban resonance: the spread, which shows how the festival is integrated into the existing urban fabric; the openness, which shows accessibility; and the grip, which seeks to evaluate the urban sphere of influence of the event. These different visualisations are enriched by the addition of other data, including ticket scanning and commercial transactions that show the alternation between high- and low-intensity periods. These allowed us not only to confirm the impact of programming on flows, but also to observe the effects of the wider organisation of the leisure system. The results of the analysis show that the intertwining of the two rhythmic scales produces a hyper-place that resonates both internationally and locally.
Urban Planning
2020-06-30
Vol. 5 , num. 2, p. 280-295.DOI : 10.17645/up.v5i2.2940
CLEF-HIPE-2020 Shared Task Named Entity Datasets
CLEF-HIPE-2020 (Identifying Historical People, Places and other Entities) is an evaluation campaign on named entity processing on historical newspapers in French, German and English, which was organized in the context of the impresso project and run as a CLEF 2020 Evaluation Lab. The data consist of manually annotated historical newspapers in French, German and English.
2020
Neural networks for semantic segmentation of historical city maps: Cross-cultural performance and the impact of figurative diversity
In this work, we present a new semantic segmentation model for historical city maps that surpasses the state of the art in terms of flexibility and performance. Research in automatic map processing is largely focused on homogeneous corpora or even individual maps, leading to inflexible algorithms. Recently, convolutional neural networks have opened new perspectives for the development of more generic tools. Based on two new map corpora, the first centered on Paris and the second gathering cities from all over the world, we propose a method for operationalizing figuration based on traditional computer vision algorithms, which allows large-scale quantitative analysis. In a second step, we propose a semantic segmentation model based on neural networks and implement several improvements. Finally, we analyze the impact of map figuration on segmentation performance and evaluate future ways to improve the representational flexibility of neural networks. To conclude, we show that these networks are able to semantically segment map data of very large figurative diversity with high efficiency.
2020
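As an illustrative stand-in for the kind of traditional computer-vision operationalisation mentioned in the abstract above, the sketch below computes a coarse “graphical load” proxy (ink ratio and edge density) for each tile of a map image. The thresholds and tile size are arbitrary assumptions, not the thesis’ actual features.

```python
# Illustrative stand-in (not the thesis' actual operationalisation): compute a
# simple "graphical load" proxy per map tile, here the fraction of dark pixels
# and the edge density after a crude gradient threshold.
import numpy as np

def tile_features(tile: np.ndarray, ink_threshold: float = 0.5) -> dict:
    """tile: 2-D array of grey levels in [0, 1]; returns coarse visual features."""
    ink = tile < ink_threshold                       # dark pixels = drawn content
    gy, gx = np.gradient(tile.astype(float))
    edge_density = float(np.mean(np.hypot(gx, gy) > 0.1))
    return {"ink_ratio": float(ink.mean()), "edge_density": edge_density}

def tile_map(image: np.ndarray, tile_size: int = 64):
    """Cut a map image into square tiles and compute features for each one."""
    h, w = image.shape
    for y in range(0, h - tile_size + 1, tile_size):
        for x in range(0, w - tile_size + 1, tile_size):
            yield (y, x), tile_features(image[y:y + tile_size, x:x + tile_size])

if __name__ == "__main__":
    fake_map = np.random.rand(256, 256)              # stand-in for a scanned map sheet
    for (y, x), feats in list(tile_map(fake_map))[:3]:
        print((y, x), feats)
```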
Historical Newspaper Content Mining: Revisiting the impresso Project’s Challenges in Text and Image Processing, Design and Historical Scholarship
Long abstract for a presentation at DH2020 (online).
DH2020 Book of Abstracts
2020
Digital Humanities Conference (DH), Ottawa, Canada, July 20-24, 2020.DOI : 10.5281/zenodo.4641894
The impresso system architecture in a nutshell
This post describes the impresso application architecture and processing in a nutshell. The text was published in October 2020 in issue number 16 of the EuropeanaTech Insights dedicated to digitized newspapers and edited by Gregory Markus and Clemens Neudecker: https://pro.europeana.eu/page/issue-16-newspapers#the-impresso-system-architecture-in-a-nutshell
2020
p. 10.
Impresso Named Entity Annotation Guidelines (CLEF-HIPE-2020)
Impresso annotation guidelines used in the context of corpus annotation for the HIPE shared task (CLEF 2020 Evaluation Lab). CLEF-HIPE-2020 shared task: https://impresso.github.io/CLEF-HIPE-2020/ Impresso project: https://impresso-project.ch
2020
p. 29.
CLEF-HIPE-2020 – Shared Task Participation Guidelines
This document summarizes instructions for participants to the CLEF-HIPE-2020 shared task. HIPE (Identifying Historical People, Places and other Entities) is a named entity processing evaluation campaign on historical newspapers in French, German and English, organized in the context of the impresso project and run as a CLEF 2020 Evaluation Lab. More information on the website: https://impresso.github.io/CLEF-HIPE-2020/
2020
p. 19.
Extended Overview of CLEF HIPE 2020: Named Entity Processing on Historical Newspapers
This paper presents an extended overview of the first edition of HIPE (Identifying Historical People, Places and other Entities), a pioneering shared task dedicated to the evaluation of named entity processing on historical newspapers in French, German and English. Since its introduction some twenty years ago, named entity (NE) processing has become an essential component of virtually any text mining application and has undergone major changes. Recently, two main trends characterise its developments: the adoption of deep learning architectures and the consideration of textual material originating from historical and cultural heritage collections. While the former opens up new opportunities, the latter introduces new challenges with heterogeneous, historical and noisy inputs. In this context, the objective of HIPE, run as part of the CLEF 2020 conference, is threefold: strengthening the robustness of existing approaches on non-standard inputs, enabling performance comparison of NE processing on historical texts, and, in the long run, fostering efficient semantic indexing of historical documents. Tasks, corpora, and results of 13 participating teams are presented. Compared to the condensed overview [31], this paper includes further details about data generation and statistics, additional information on participating systems, and the presentation of complementary results.
CLEF 2020 Working Notes. Conference and Labs of the Evaluation Forum
2020-10-21
11th Conference and Labs of the Evaluation Forum (CLEF 2020), [Online event], 22-25 September, 2020.DOI : 10.5281/zenodo.4117566
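For readers unfamiliar with how such shared tasks are scored, the sketch below shows strict entity-level precision, recall and F1 computed from gold and predicted entity spans. This is only a generic illustration of one matching regime; the official HIPE scorer also supports relaxed settings not shown here.

```python
# Minimal illustration of strict entity-level precision/recall/F1, one common
# setting in named entity evaluations (fuzzy-boundary or type-relaxed regimes
# used in such campaigns are not shown here).
from typing import List, Tuple

Entity = Tuple[int, int, str]   # (start token, end token, entity type)

def prf(gold: List[Entity], pred: List[Entity]) -> Tuple[float, float, float]:
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = [(0, 2, "pers"), (7, 8, "loc"), (12, 15, "org")]
pred = [(0, 2, "pers"), (7, 8, "org"), (12, 15, "org")]
print(prf(gold, pred))   # (0.666..., 0.666..., 0.666...) under strict matching
```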
Language Resources for Historical Newspapers: the Impresso Collection
Following decades of massive digitization, an unprecedented amount of historical document facsimiles can now be retrieved and accessed via cultural heritage online portals. If this represents a huge step forward in terms of preservation and accessibility, the next fundamental challenge – and real promise of digitization – is to exploit the contents of these digital assets, and therefore to adapt and develop appropriate language technologies to search and retrieve information from this ‘Big Data of the Past’. Yet, the application of text processing tools on historical documents in general, and historical newspapers in particular, poses new challenges, and crucially requires appropriate language resources. In this context, this paper presents a collection of historical newspaper data sets composed of text and image resources, curated and published within the context of the ‘impresso – Media Monitoring of the Past’ project. With corpora, benchmarks, semantic annotations and language models in French, German and Luxembourgish covering ca. 200 years, the objective of the impresso resource collection is to contribute to historical language resources, and thereby strengthen the robustness of approaches to non-standard inputs and foster efficient processing of historical documents.
Proceedings of the 12th Language Resources and Evaluation Conference
2020-05-11
12th International Conference on Language Resources and Evaluation (LREC), Marseille, France, May 11-16 2020.p. 958-968
DOI : 10.5281/zenodo.4641902
Overview of CLEF HIPE 2020: Named Entity Recognition and Linking on Historical Newspapers
This paper presents an overview of the first edition of HIPE (Identifying Historical People, Places and other Entities), a pioneering shared task dedicated to the evaluation of named entity processing on historical newspapers in French, German and English. Since its introduction some twenty years ago, named entity (NE) processing has become an essential component of virtually any text mining application and has undergone major changes. Recently, two main trends characterise its developments: the adoption of deep learning architectures and the consideration of textual material originating from historical and cultural heritage collections. While the former opens up new opportunities, the latter introduces new challenges with heterogeneous, historical and noisy inputs. In this context, the objective of HIPE, run as part of the CLEF 2020 conference, is threefold: strengthening the robustness of existing approaches on non-standard inputs, enabling performance comparison of NE processing on historical texts, and, in the long run, fostering efficient semantic indexing of historical documents. Tasks, corpora, and results of 13 participating teams are presented.
Experimental IR meets multilinguality, multimodality, and interaction. 11th International Conference of the CLEF Association, CLEF 2020, Thessaloniki, Greece, September 22–25, 2020, Proceedings
2020-09-15
11th International Conference of the CLEF Association – CLEF 2020, Thessaloniki, Greece, September 22–25, 2020.p. 288–310
DOI : 10.1007/978-3-030-58219-7_21
Building a Mirror World for Venice
Between 2012 and 2019, ‘The Venice Time Machine Project’ developed a new methodology for modelling the past, present, and future of a city. This methodology is based on two pillars: (a) the vast digitisation and processing of the selected city’s historical records, and (b) the digitisation of the city itself, another vast undertaking. The combination of these two processes has the potential to create a new kind of historical information system organised around a diachronic digital twin of a city.
The Aura in the Age of Digital Materiality : Rethinking Preservation in the Shadow of an Uncertain Future; Milan: Silvana Editoriale, 2020.ISBN : 9788836645480
Introducing the CLEF 2020 HIPE Shared Task: Named Entity Recognition and Linking on Historical Newspapers
Since its introduction some twenty years ago, named entity (NE) processing has become an essential component of virtually any text mining application and has undergone major changes. Recently, two main trends characterise its developments: the adoption of deep learning architectures and the consideration of textual material originating from historical and cultural heritage collections. While the former opens up new opportunities, the latter introduces new challenges with heterogeneous, historical and noisy inputs. If NE processing tools are increasingly being used in the context of historical documents, performance values are below the ones on contemporary data and are hardly comparable. In this context, this paper introduces the CLEF 2020 Evaluation Lab HIPE (Identifying Historical People, Places and other Entities) on named entity recognition and linking on diachronic historical newspaper material in French, German and English. Our objective is threefold: strengthening the robustness of existing approaches on non-standard inputs, enabling performance comparison of NE processing on historical texts, and, in the long run, fostering efficient semantic indexing of historical documents in order to support scholarship on digital cultural heritage collections.
Advances in Information Retrieval. ECIR 2020
2020-04-08
ECIR 2020 : 42nd European Conference on Information Retrieval, Lisbon, Portugal, April 14-17, 2020.p. 524-532
DOI : 10.1007/978-3-030-45442-5_68
The Hermeneutic Circle of Data Visualization: the Case Study of the Affinity Map
In this article, we show how postphenomenology can be used to analyze a visual method that reveals the hidden dynamics that exist between individuals within large organizations. We make use of the Affinity Map to expand the classic postphenomenology that privileges a ‘linear’ understanding of technological mediations, introducing the notions of ‘iterativity’ and ‘collectivity.’ In the first section, both classic and more recent descriptions of human-technology-world relations are discussed to transcendentally approach the discipline of data visualization. In the second section, the Affinity Map case study is used to stress three elements: 1) the collection of data and the design process; 2) the visual grammar of the data visualization; and 3) the process of self-recognition for the map ‘reader.’ In the third section, we introduce the hermeneutic circle of data visualization. Finally, in the concluding section, we put forth how the Affinity Map might be seen as the material encounter between postphenomenology, actor-network theory (ANT), and hermeneutics, through ethical and political multistability.
Techné: Research in Philosophy and Technology
2020
Vol. 24 , num. 3, p. 357-375.DOI : 10.5840/techne202081126
2019
Transforming scholarship in the archives through handwritten text recognition: Transkribus as a case study
Purpose – To provide an overview of the current use of handwritten text recognition (HTR) on archival manuscript material, as provided by the EU H2020 funded Transkribus platform. The paper explains HTR, demonstrates Transkribus, gives examples of use cases, highlights the effect HTR may have on scholarship, and evidences this turning point in the advanced use of digitised heritage content. Design/methodology/approach – This paper adopts a case study approach, using the development and delivery of the one openly available HTR platform for manuscript material. Findings – Transkribus has demonstrated that HTR is now a useable technology that can be employed in conjunction with mass digitisation to generate accurate transcripts of archival material. Use cases are demonstrated, and a cooperative model is suggested as a way to ensure sustainability and scaling of the platform. However, funding and resourcing issues are identified. Research limitations/implications – The paper presents results from projects; further user studies could be undertaken involving interviews, surveys, etc. Practical implications – Only HTR provided via Transkribus is covered; however, this is the only publicly available platform for HTR on individual collections of historical documents at time of writing and it represents the current state of the art in this field. Social implications – The increased access to information contained within historical texts has the potential to be transformational for both institutions and individuals. Originality/value – This is the first published overview of how HTR is used by a wide archival studies community, reporting and showcasing current application of handwriting technology in the cultural heritage sector.
Journal Of Documentation
2019-09-09
Vol. 75 , num. 5, p. 954-976.DOI : 10.1108/JD-07-2018-0114
A deep learning approach to Cadastral Computing
This article presents a fully automatic pipeline to transform the Napoleonic Cadastres into an information system. The cadastres established during the first years of the 19th century cover a large part of Europe. For many cities they provide one of the first geometrical surveys, linking precise parcels with identification numbers. These identification numbers point to registers in which the names of the proprietors are recorded. As the Napoleonic cadastres include millions of parcels, they offer a detailed snapshot of a large part of Europe’s population at the beginning of the 19th century. As many kinds of computation can be done on such a large object, we use the neologism “cadastral computing” to refer to the operations performed on such datasets. This approach is the first fully automatic pipeline to transform the Napoleonic Cadastres into an information system.
2019-07-11
Digital Humanities Conference, Utrecht, Netherlands, July 8-12, 2019.
Repopulating Paris: massive extraction of 4 Million addresses from city directories between 1839 and 1922
In 1839, in Paris, the Maison Didot bought the Bottin company. Sébastien Bottin, trained as a statistician, was the initiator of a high-impact yearly publication called “Almanachs”, containing the listing of residents, businesses and institutions, arranged geographically, alphabetically and by activity typologies (Fig. 1). These regular publications met with great success. In 1820, the Parisian Bottin Almanach contained more than 50,000 addresses, and until the end of the 20th century the word “Bottin” was the colloquial term for a city directory in France. The publication of the “Didot-Bottin” continued at an annual rhythm, mapping the evolution of the active population of Paris and other cities in France. The relevance of automatically mining city directories for historical reconstruction has already been argued by several authors (e.g. Osborne, Hamilton and Macdonald 2014; Berenbaum et al. 2016). This article reports on the extraction and analysis of the data contained in the “Didot-Bottin” covering the period 1839-1922 for Paris, digitized by the Bibliothèque nationale de France. We process more than 27,500 pages to create a database of 4.2 million entries linking addresses, person mentions and activities.
Abstracts and Posters from the Digital Humanities 2019 conference
2019-07-02
Digital Humanities Conference 2019 (DH2019), Utrecht , the Netherlands, July 9-12, 2019.DOI : 10.34894/MNF5VQ
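As a toy illustration of the target structure only (the project itself extracts entries from OCRed facsimiles with learned models), the sketch below splits an invented, already-cleaned directory line into name, activity and address fields with a regular expression. Both the sample lines and the pattern are made up for this example.

```python
# Toy illustration of the target structure only: the project extracts entries
# from OCRed facsimiles with learned models, not with a single regex.
# The sample lines and the pattern are invented for this sketch.
import re

LINE = re.compile(
    r"^(?P<name>[^,]+),\s*(?P<activity>[^,]+),\s*(?P<address>.+?)\s*,\s*(?P<number>\d+[a-z]?)\.?$"
)

samples = [
    "Durand (V.), horloger, r. du Temple, 45.",
    "Bottin fils, libraire, boul. Saint-Michel, 12.",
]

for line in samples:
    m = LINE.match(line)
    if m:
        print(m.groupdict())
    # e.g. {'name': 'Durand (V.)', 'activity': 'horloger',
    #       'address': 'r. du Temple', 'number': '45'}
```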
Search for a standard model-like Higgs boson in the mass range between 70 and 110 GeV in the diphoton final state in proton-proton collisions at $\sqrt{s}=$ 8 and 13 TeV
The results of a search for a standard model-like Higgs boson in the mass range between 70 and 110 GeV decaying into two photons are presented. The analysis uses the data set collected with the CMS experiment in proton-proton collisions during the 2012 and 2016 LHC running periods. The data sample corresponds to an integrated luminosity of 19.7 (35.9) fb$^{-1}$ at $\sqrt{s}=$ 8 (13) TeV. The expected and observed 95% confidence level upper limits on the product of the cross section and branching fraction into two photons are presented. The observed upper limit for the 2012 (2016) data set ranges from 129 (161) fb to 31 (26) fb. The statistical combination of the results from the analyses of the two data sets in the common mass range between 80 and 110 GeV yields an upper limit on the product of the cross section and branching fraction, normalized to that for a standard model-like Higgs boson, ranging from 0.7 to 0.2, with two notable exceptions: one in the region around the Z boson peak, where the limit rises to 1.1, which may be due to the presence of Drell–Yan dielectron production where electrons could be misidentified as isolated photons, and a second due to an observed excess with respect to the standard model prediction, which is maximal for a mass hypothesis of 95.3 GeV with a local (global) significance of 2.8 (1.3) standard deviations.
Physics Letters B
2019-06-10
Vol. 793 , p. 320-347.DOI : 10.1016/j.physletb.2019.03.064
Combinations of single-top-quark production cross-section measurements and |f$_{LV}$V$_{tb}$| determinations at $ \sqrt{s} $ = 7 and 8 TeV with the ATLAS and CMS experiments
This paper presents the combinations of single-top-quark production cross-section measurements by the ATLAS and CMS Collaborations, using data from LHC proton-proton collisions at $ \sqrt{s} $ = 7 and 8 TeV corresponding to integrated luminosities of 1.17 to 5.1 fb$^{−1}$ at $ \sqrt{s} $ = 7 TeV and 12.2 to 20.3 fb$^{−1}$ at $ \sqrt{s} $ = 8 TeV. These combinations are performed per centre-of-mass energy and for each production mode: t-channel, tW, and s-channel. The combined t-channel cross-sections are 67.5 ± 5.7 pb and 87.7 ± 5.8 pb at $ \sqrt{s} $ = 7 and 8 TeV respectively. The combined tW cross-sections are 16.3 ± 4.1 pb and 23.1 ± 3.6 pb at $ \sqrt{s} $ = 7 and 8 TeV respectively. For the s-channel cross-section, the combination yields 4.9 ± 1.4 pb at $ \sqrt{s} $ = 8 TeV. The square of the magnitude of the CKM matrix element V$_{tb}$ multiplied by a form factor f$_{LV}$ is determined for each production mode and centre-of-mass energy, using the ratio of the measured cross-section to its theoretical prediction. It is assumed that the top-quark-related CKM matrix elements obey the relation |V$_{td}$|, |V$_{ts}$| ≪ |V$_{tb}$|. All the |f$_{LV}$V$_{tb}$|$^{2}$ determinations, extracted from individual ratios at $ \sqrt{s} $ = 7 and 8 TeV, are combined, resulting in |f$_{LV}$V$_{tb}$| = 1.02 ± 0.04 (meas.) ± 0.02 (theo.). All combined measurements are consistent with their corresponding Standard Model predictions.
Journal of High Energy Physics
2019-05-16
p. 88.DOI : 10.1007/JHEP05(2019)088
Measurement of inclusive very forward jet cross sections in proton-lead collisions at $ \sqrt{s_{\mathrm{NN}}} $ = 5.02 TeV
Measurements of differential cross sections for inclusive very forward jet production in proton-lead collisions as a function of jet energy are presented. The data were collected with the CMS experiment at the LHC in the laboratory pseudorapidity range −6.6 < η < −5.2. Asymmetric beam energies of 4 TeV for protons and 1.58 TeV per nucleon for Pb nuclei were used, corresponding to a center-of-mass energy per nucleon pair of $ \sqrt{s_{\mathrm{NN}}} $ = 5.02 TeV. Collisions with either the proton (p+Pb) or the ion (Pb+p) traveling towards the negative η hemisphere are studied. The jet cross sections are unfolded to stable-particle level cross sections with p$_{T}$ ≳ 3 GeV, and compared to predictions from various Monte Carlo event generators. In addition, the cross section ratio of p+Pb and Pb+p data is presented. The results are discussed in terms of the saturation of gluon densities at low fractional parton momenta. None of the models under consideration describes all the data over the full jet-energy range and for all beam configurations. Discrepancies between the differential cross sections in data and model predictions of more than two orders of magnitude are observed.
Journal of High Energy Physics
2019-05-07
p. 43.DOI : 10.1007/JHEP05(2019)043
Measurement of exclusive $\Upsilon$ photoproduction from protons in pPb collisions at $\sqrt{s_\mathrm{NN}} =$ 5.02 TeV
The exclusive photoproduction of $\Upsilon\mathrm{(nS)}$ meson states from protons, $\gamma\mathrm{p} \rightarrow \Upsilon\mathrm{(nS)}\,\mathrm{p}$ (with $\mathrm{n}=1,2,3$), is studied in ultraperipheral pPb collisions at a centre-of-mass energy per nucleon pair of $\sqrt{s_{\mathrm{NN}}}=$ 5.02 TeV. The measurement is performed using the $\Upsilon\mathrm{(nS)} \rightarrow \mu^+\mu^-$ decay mode, with data collected by the CMS experiment corresponding to an integrated luminosity of 32.6 nb$^{-1}$. Differential cross sections as functions of the $\Upsilon\mathrm{(nS)}$ transverse momentum squared $p_{\mathrm{T}}^2$, and rapidity $y$, are presented. The $\Upsilon\mathrm{(1S)}$ photoproduction cross section is extracted in the rapidity range $|y| < 2.2$, which corresponds to photon–proton centre-of-mass energies in the range $91
The European Physical Journal C
2019-03-26
DOI : 10.1140/epjc/s10052-019-6774-8
Measurement of prompt $\psi$(2S) production cross sections in proton-lead and proton-proton collisions at $\sqrt{s_{_\mathrm{NN}}}=$ 5.02 TeV
Measurements of prompt ψ(2S) meson production cross sections in proton–lead (pPb) and proton–proton (pp) collisions at a nucleon–nucleon center-of-mass energy of $\sqrt{s_{\mathrm{NN}}}=$ 5.02 TeV are reported. The results are based on pPb and pp data collected by the CMS experiment at the LHC, corresponding to integrated luminosities of 34.6 nb$^{-1}$ and 28.0 pb$^{-1}$, respectively. The nuclear modification factor $R_{\mathrm{pPb}}$ is measured for prompt ψ(2S) in the transverse momentum range 4
Physics Letters B
2019-03-10
DOI : 10.1016/j.physletb.2019.01.058
Measurement of nuclear modification factors of $\Upsilon$(1S), $\Upsilon$(2S), and $\Upsilon$(3S) mesons in PbPb collisions at $\sqrt{s_{_\mathrm{NN}}} =$ 5.02 TeV
The cross sections for ϒ(1S), ϒ(2S), and ϒ(3S) production in lead–lead (PbPb) and proton–proton (pp) collisions at $\sqrt{s_{\mathrm{NN}}}=$ 5.02 TeV have been measured using the CMS detector at the LHC. The nuclear modification factors, $R_{\mathrm{AA}}$, derived from the PbPb-to-pp ratio of yields for each state, are studied as functions of meson rapidity and transverse momentum, as well as PbPb collision centrality. The yields of all three states are found to be significantly suppressed, and compatible with a sequential ordering of the suppression, $R_{\mathrm{AA}}(\Upsilon\mathrm{(1S)}) > R_{\mathrm{AA}}(\Upsilon\mathrm{(2S)}) > R_{\mathrm{AA}}(\Upsilon\mathrm{(3S)})$. The suppression of ϒ(1S) is larger than that seen at $\sqrt{s_{\mathrm{NN}}}=$ 2.76 TeV, although the two are compatible within uncertainties. The upper limit on the $R_{\mathrm{AA}}$ of ϒ(3S) integrated over $p_{\mathrm{T}}$, rapidity and centrality is 0.096 at 95% confidence level, which is the strongest suppression observed for a quarkonium state in heavy ion collisions to date.
Physics Letters B
2019-03-10
Vol. 790 , p. 270-293.DOI : 10.1016/j.physletb.2019.01.006
Search for Higgs boson pair production in the $\gamma\gamma\mathrm{b\overline{b}}$ final state in pp collisions at $\sqrt{s}=$ 13 TeV
A search is presented for the production of a pair of Higgs bosons, where one decays into two photons and the other one into a bottom quark–antiquark pair. The analysis is performed using proton–proton collision data at $\sqrt{s}=$ 13 TeV recorded in 2016 by the CMS detector at the LHC, corresponding to an integrated luminosity of 35.9 fb$^{-1}$. The results are in agreement with standard model (SM) predictions. In a search for resonant production, upper limits are set on the cross section for new spin-0 or spin-2 particles. For the SM-like nonresonant production hypothesis, the data exclude a product of cross section and branching fraction larger than 2.0 fb at 95% confidence level (CL), corresponding to about 24 times the SM prediction. Values of the effective Higgs boson self-coupling $\kappa_\lambda$ are constrained to be within the range $-11 < \kappa_\lambda < 17$ at 95% CL, assuming all other Higgs boson couplings are at their SM value. The constraints on $\kappa_\lambda$ are the most restrictive to date.
Physics Letters B
2019-01-10
Vol. 788 , p. 7-36.DOI : 10.1016/j.physletb.2018.10.056
Search for heavy resonances decaying into two Higgs bosons or into a Higgs boson and a W or Z boson in proton-proton collisions at 13 TeV
A search is presented for massive narrow resonances decaying either into two Higgs bosons, or into a Higgs boson and a W or Z boson. The decay channels considered are $ \mathrm{H}\mathrm{H}\to \mathrm{b}\overline{\mathrm{b}}{\tau}^{+}{\tau}^{-} $ and $ \mathrm{V}\mathrm{H}\to \mathrm{q}\overline{\mathrm{q}}{\tau}^{+}{\tau}^{-} $ , where H denotes the Higgs boson, and V denotes the W or Z boson. This analysis is based on a data sample of proton-proton collisions collected at a center-of-mass energy of 13 TeV by the CMS Collaboration, corresponding to an integrated luminosity of 35.9 fb$^{−1}$. For the TeV-scale mass resonances considered, substructure techniques provide ways to differentiate among the hadronization products from vector boson decays to quarks, Higgs boson decays to bottom quarks, and quark- or gluon-induced jets. Reconstruction techniques are used that have been specifically optimized to select events in which the tau lepton pair is highly boosted. The observed data are consistent with standard model expectations and upper limits are set at 95% confidence level on the product of cross section and branching fraction for resonance masses between 0.9 and 4.0 TeV. Exclusion limits are set in the context of bulk radion and graviton models:spin-0 radion resonances are excluded below a mass of 2.7 TeV at 95% confidence level. In the spin-1 heavy vector triplet framework, mass-degenerate W′ and Z′ resonances with dominant couplings to the standard model gauge bosons are excluded below a mass of 2.8 TeV at 95% confidence level. These are the first limits for massive resonances at the TeV scale with these decay channels at $ \sqrt{s}=13 $ TeV.
Journal of High Energy Physics
2019-01-07
p. 51.DOI : 10.1007/JHEP01(2019)051
Can Venice be saved?
Will Venice be inhabitable in 2100? What kinds of policies can we develop to navigate the best scenarios for this floating city? In 2012, the École Polytechnique Fédérale de Lausanne (EPFL) and the University Ca’Foscari launched a programme called the Venice Time Machine to create a large-scale digitisation project transforming Venice’s heritage into ‘big data’. Thanks to the support of the Lombard Odier Foundation, millions of pages and photographs have been scanned at the state archive in Venice and at the Fondazione Giorgio Cini. While commercial robotic scanners were used at the archives, a new typology of robotised circular table was developed by Adam Lowe and his team at Factum Arte to process the million photographs of Fondazione Giorgio Cini. The documents were analysed using deep-learning artificial-intelligence methods to extract their textual and iconographic content and to make the data accessible via a search engine. Also during this time, thousands of primary and secondary sources were compiled to create the first 4D model (3D + time) of the city, showing the evolution of its urban fabric. This model and the other data compiled by the Venice Time Machine were part of an exhibition at the Venice Pavilion of the Biennale of Architecture in 2018, shown side-by-side with potential projects for Venice’s future. Having reached an important milestone in convincing not only the Venetian stakeholders but also a growing number of partners around the world that care about Venice’s future, the Venice Time Machine is now raising funds for the most ambitious simulation of the city that has ever been developed. Its planned activities include a high-resolution digitisation campaign of the entire city at centimetre scale, a crucial step on which to base a future simulation of the city’s evolution, while also creating a digital model that can be used for preservation regardless of what occurs in the coming decades. On the island of San Giorgio Maggiore, a digitisation centre called ARCHiVe (Analysis and Recording of Cultural Heritage in Venice) opened in 2018 to process a large variety of Venetian artefacts. This is a joint effort of Factum Foundation, the École Polytechnique Fédérale de Lausanne and the Fondazione Giorgio Cini, along with philanthropic support from the Helen Hamlyn Trust. The centre aims to become a training centre for future cultural heritage professionals who would like to learn how they can use artificial intelligence and robotics to preserve documents, objects and sites. These operations will work together to create a multiscale digital model of Venice, combining the most precise 4D information on the evolution of the city and its population with all the available documentation of its past. The project aims to demonstrate how this ‘digital double’ can be achieved by using robotic technology to scan the city and its archives on a massive scale, using artificial intelligence techniques to process documents and collecting the efforts of thousands of enthusiastic Venetians. In a project called ‘Venice 2100’, the Venice Time Machine team’s ambition is to show how a collectively built information system can be used to build realistic future scenarios, blending ecological and social data into large-scale simulations. The Venice Time Machine’s ‘hypermodel’ will also create economic opportunities. 
If its hypotheses are valid, Venice could host the first incubators for start-ups using big data of the past to develop services for smart cities, creative industries, education, academic scholarship and policy making. This could be the beginning of a renewal of Venice’s economic life, encouraging younger generations to pursue activities in the historic city, at the heart of what may become one of the first AI-monitored cities of the world. Venice can reinvent itself as the city that put the most advanced information technology and cultural heritage at the core of its survival and its strategy for development. Artificial intelligence can not only save Venice, but Venice can be the place to invent a new form of artificial intelligence.
Apollo, The International Art Magazine
2019-01-02
Citation Mining of Humanities Journals: The Progress to Date and the Challenges Ahead
Even large citation indexes such as the Web of Science, Scopus or Google Scholar cover only a small fraction of the literature in the humanities. This coverage decreases markedly going backwards in time. Citation mining of humanities publications — defined as an instance of bibliometric data mining and as a means to the end of building comprehensive citation indexes — remains an open problem. In this contribution we discuss the results of two recent projects in this area: Cited Loci and Linked Books. The former focused on the domain of classics, using journal articles in JSTOR as a corpus; the latter considered the historiography on Venice and a novel corpus of journals and monographs. Both projects attempted to mine citations of all kinds — abbreviated and not, to all types of sources, including primary sources — and considered a wide time span (19th to 21st century). We first discuss the current state of research in citation mining of humanities publications. We then present the various steps involved in this process, from corpus selection to data publication, discussing the peculiarities of the humanities. The approaches taken by the two projects are compared, allowing us to highlight disciplinary differences and commonalities, as well as shared challenges between historiography and classics in this respect. The resulting picture portrays humanities citation mining as a field with a great, yet mostly untapped potential, and a few still open challenges. The potential lies in using citations as a means to interconnect digitized collections at a large scale, by making explicit the linking function of bibliographic citations. As for the open challenges, a key issue is the existing need for an integrated metadata infrastructure and an appropriate legal framework to facilitate citation mining in the humanities.
Journal of European Periodical Studies
2019-06-30
Vol. 4 , num. 1, p. 36-53.DOI : 10.21825/jeps.v4i1.10120
Survey of digitized newspaper interfaces (dataset and notebooks)
This record contains the datasets and Jupyter notebooks which support the analysis presented in the paper “Historical Newspaper User Interfaces: A Review”. Please refer to the paper or the GitHub repository for more information (see links below), or do not hesitate to contact us!
2019
Experiments in digital publishing: creating a digital compendium
This chapter introduces readers and users to the goals of the digitally provided index of the compendium Structures of Epic Poetry and the methods used for it. It also expands on the broader applicability of digital methods in view of electronic publishing, and on the problems involved. The chapter focuses on two aspects of my work for the compendium, where digital tools played a central role: the creation of the index locorum and the development of a digital compendium to the printed volumes.
Structures of Epic Poetry; Berlin, Boston: De Gruyter, 2019-12-15.ISBN : 9783110492590
DOI : 10.1515/9783110492590-074
Linked Books: un indice citazionale per la storia di Venezia
We present the outcomes of the Linked Books project, resulting in a prototype citation index interlinking the Italian national library catalog (Opac SBN) with the information system of the State Archive of Venice and international authority records or “metaengines” such as VIAF.org and Europeana. Our prototype includes 3,850,581 citations extracted from a corpus of 2,475 volumes, of which 1,905 monographs and 552 journal volumes, or 5,496 articles therein. The corpus is focused on the history of Venice. The Linked Books project allowed us to explore the feasibility and desirability of a citation index for the humanities, and to face and solve technical challenges including: the selection of a thematically representative corpus from bibliographic resources and expertise, the digitization of these materials within the bounds of copyright, the automatic extraction of citations and the development of public search interfaces.
DigItalia
2019-06-01
Vol. 14 , num. 1, p. 132-146.
The Past, Present and Future of Digital Scholarship with Newspaper Collections
Historical newspapers are of interest to many humanities scholars as sources of information and language closely tied to a particular time, social context and place. Digitised newspapers are also of interest to many data-driven researchers who seek large bodies of text on which they can try new methods and tools. Recently, large consortia projects applying data science and computational methods to historical newspapers at scale have emerged, including NewsEye, impresso, Oceanic Exchanges and Living with Machines. This multi-paper panel draws on the work of a range of interdisciplinary newspaper-based digital humanities and/or data science projects, alongside ‘provocations’ from two senior scholars who will provide context for current ambitions. As a unique opportunity for stakeholders to engage in dialogue, for the DH2019 community to ask their own questions of newspaper-based projects, and for researchers to map methodological similarities between projects, it aims to have a significant impact on the field.
DH 2019 Book of Abstracts
2019-07-09
Digital Humanities Conference, Utrecht, July 2019.
Historical newspaper semantic segmentation using visual and textual features
Mass digitization and the opening of digital libraries have given access to a huge number of historical newspapers. In order to bring structure into these documents, current techniques generally proceed in two distinct steps. First, they segment the digitized images into generic articles and then classify the text of the articles into finer-grained categories. Unfortunately, by losing the link between layout and text, these two steps are not able to account for the fact that newspaper content items have distinctive visual features. This project proposes two main novelties. Firstly, it introduces the idea of merging the segmentation and classification steps, resulting in a fine-grained semantic segmentation of newspaper images. Secondly, it proposes to use textual features in the form of embedding maps at the segmentation step. The semantic segmentation with four categories (feuilleton, weather forecast, obituary, and stock exchange table) is done using a fully convolutional neural network and reaches a mIoU of 79.3%. The introduction of embedding maps improves the overall performance by 3% and the generalization across time and newspapers by 8% and 12%, respectively. This shows a strong potential to consider the semantic aspect in the segmentation of newspapers and to use textual features to improve generalization.
2019-06-21
Advisor(s): M. Ehrmann; S. Ares Oliveira; S. Clematide
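The entry above reports results only; as a minimal sketch of the general idea (not the project's actual architecture), the following PyTorch fragment shows how precomputed word-embedding maps can be concatenated to the image channels of a fully convolutional segmenter that predicts fine-grained classes per pixel. The channel counts, layer depths and the five-class output (four categories plus background) are assumptions made for this example.

import torch
import torch.nn as nn

class FCNWithEmbeddings(nn.Module):
    # Toy fully convolutional segmenter: image channels plus embedding-map channels in,
    # per-pixel class logits out. Sizes are illustrative, not the published architecture.
    def __init__(self, img_channels=1, emb_channels=32, num_classes=5):
        super().__init__()
        in_ch = img_channels + emb_channels  # fuse visual and textual features at the input
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2), nn.ReLU(),
            nn.Conv2d(64, num_classes, kernel_size=1),  # one logit map per category
        )

    def forward(self, image, embedding_map):
        x = torch.cat([image, embedding_map], dim=1)  # stack channels pixel by pixel
        return self.decoder(self.encoder(x))

model = FCNWithEmbeddings()
page = torch.rand(1, 1, 256, 256)        # one grayscale page
emb = torch.rand(1, 32, 256, 256)        # its precomputed embedding map
pred = model(page, emb).argmax(dim=1)    # per-pixel class map, shape (1, 256, 256)

Fusing the textual channels at the input is only one option; later fusion points are equally plausible and are not specified by the abstract.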
Beyond Keyword Search: Semantic Indexing and Exploration of Large Collections of Historical Newspapers
Long held on library and archive shelves, historical newspapers are currently undergoing mass digitization, and millions of facsimiles, along with their machine-readable content acquired via Optical Character Recognition, are becoming accessible via a variety of online portals. While this represents a major step forward in terms of preservation of and access to documents, much remains to be done to provide extensive and sophisticated access to the content of these digital resources. We believe that the promise of newspaper digitization lies in their semantic indexation, closely tied to the development of co-designed interfaces that accommodate text analysis research tools and their usage by humanities scholars. How do we go beyond keyword search? How do we explore complex and vast amounts of data? Based on the ongoing project ‘impresso – Media Monitoring of the Past’, in this talk I will present our interdisciplinary approach and share hands-on experience in going from facsimiles to enhanced search and visualization capacities supporting historical research.
Digital Humanities in the Nordic Countries, Copenhagen, Denmark, March 2019.
Index-Driven Digitization and Indexation of Historical Archives
The promise of digitization of historical archives lies in their indexation at the level of contents. Unfortunately, this kind of indexation does not scale if done manually. In this article we present a method to bootstrap the deployment of a content-based information system for digitized historical archives, relying on historical indexing tools. Commonly prepared to enable searching within homogeneous records while the archive was still current, such indexes were as widespread as they were disconnected, that is to say, situated within the very records they were meant to index. We first present a conceptual model to describe and manipulate historical indexing tools. We then introduce a methodological framework for their use, in order to guide digitization campaigns and index digitized historical records. Finally, we exemplify the approach with a case study on the indexation system of the X Savi alle Decime in Rialto, a Venetian magistracy in charge of the exaction (and the related record keeping) of a tax on real estate in early modern Venice.
Frontiers in Digital Humanities
2019-03-11
Vol. 6 , num. 1-16, p. 1-16.DOI : 10.3389/fdigh.2019.00004
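The conceptual model itself is not reproduced in the entry above; as a loose, hypothetical illustration of what describing and manipulating a historical indexing tool can look like in code, one might represent an index as a set of entries whose transcribed references are progressively resolved to digitized archival units. All class and field names below are invented for this sketch and do not come from the article.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ArchivalUnit:
    # A physical or logical unit of the archive, e.g. a register or a folder.
    identifier: str
    title: str

@dataclass
class IndexEntry:
    # One line of a historical index: a heading (a name, a place, ...) plus the
    # reference it gives into the records, kept as transcribed from the source.
    heading: str
    reference: str
    target: Optional[ArchivalUnit] = None  # resolved link, filled in during indexation

@dataclass
class HistoricalIndex:
    # An indexing tool prepared while the archive was still current.
    name: str
    indexed_fonds: str
    entries: List[IndexEntry] = field(default_factory=list)

    def unresolved(self) -> List[IndexEntry]:
        # Entries not yet linked to a digitized unit: candidates for the
        # next digitization campaign.
        return [e for e in self.entries if e.target is None]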
Historical Newspaper User Interfaces: A Review
After decades of large-scale digitization, many historical newspaper collections are just one click away via online portals developed and supported by various public or private stakeholders. Initially offering access to full-text search and facsimile visualization only, historical newspaper user interfaces are increasingly integrating advanced exploration features based on the application of text mining tools to digitized sources. As gateways to enriched material, such interfaces are however not neutral and play a fundamental role in how users perceive historical sources, understand potential biases of upstream processes, and benefit from the opportunities of datafication. What features can be found in current interfaces, and to what degree do interfaces adopt novel technologies? This paper presents a survey of interfaces for digitized historical newspapers with the aim of mapping the current state of the art and identifying recent trends with regard to content presentation, enrichment and user interaction. We devised six interface assessment criteria and reviewed 24 interfaces based on ca. 140 predefined features.
[Proceedings of the 85th IFLA General Conference and Assembly]
2019-09-02
85th IFLA General Conference and Assembly, Athens, Greece, 24-30 August 2019. p. 1-24
DOI : 10.5281/zenodo.3404155
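The survey's criteria and feature list are not reproduced here; purely as an illustration of how such a review grid can be tabulated, the following pandas sketch computes feature adoption rates per assessment criterion from a toy boolean matrix (interface, criterion and feature names are invented).

import pandas as pd

# Toy review grid: rows are interfaces, columns are (criterion, feature) flags.
# The real survey covers six criteria and ca. 140 features; all names here are invented.
grid = pd.DataFrame(
    {
        ("enrichment", "named entities"): [True, False, True],
        ("enrichment", "topic browsing"): [False, False, True],
        ("interaction", "faceted search"): [True, True, True],
    },
    index=["portal_a", "portal_b", "portal_c"],
)
grid.columns = grid.columns.set_names(["criterion", "feature"])

feature_adoption = grid.mean(axis=0)   # share of reviewed interfaces offering each feature
criterion_adoption = feature_adoption.groupby(level="criterion").mean()
print(criterion_adoption)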
Self-Recognition in Data Visualization: How Individuals See Themselves in Visual Representations
This article explores how readers recognize their personal identities represented through data visualizations. Recognition is investigated starting from three definitions captured by the philosopher Paul Ricoeur: the identification with the visualization, the recognition of someone in the visualization, and the mutual recognition that happens between readers. Whereas these notions were initially applied to study the role of the book reader, two further concepts complete the shift to data visualization: the digital identity stands for the present-day passport of human actions, and the promise is the intimate reflection that projects readers towards their own future. This article reflects on the delicate meaning of digital identity and the way of representing it, according to the following structure: From Personal Identity to Media is a historical introduction to self-recognition, Data Visualization for Representing Identities moves the focus to visual representation, and The Course of Recognition breaks self-recognition down into the five concepts above, just before the Conclusion.
EspacesTemps.net
2019-08-08
DOI : 10.26151/espacestemps.net-wztp-cc46
Named Entity Processing for Historical Texts
Recognition and identification of real-world entities is at the core of virtually any text mining application. As a matter of fact, referential units such as names of persons, locations and organizations underlie the semantics of texts and guide their interpretation. Around since the seminal Message Understanding Conference (MUC) evaluation cycle in the 1990s, named entity-related tasks have undergone major evolutions, from entity recognition and classification to entity disambiguation and linking. Recently, NE processing has been called upon to contribute to the domain of digital humanities, where the massive digitization of historical documents is producing huge amounts of text. De facto, NE processing tools are increasingly being used in the context of historical documents. Research activities in this domain target texts of different natures (e.g., publications by cultural institutions, state-related documents, genealogical data, historical newspapers) and different tasks (NE recognition and classification, entity linking, or both). Experiments involve different time periods (from the 16th to the 20th century), focus on different domains, and use different typologies. This great variety demonstrates how many and varied the needs – and the challenges – are, but makes performance comparison difficult, not to say impossible. The objective of this tutorial is to provide the participants with essential knowledge with respect to (a) NE processing in general and in digital humanities, and (b) how to apply NE recognition approaches.
2019-07-17
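The tutorial materials are not included in the entry above; as a minimal, generic illustration of the basic recognition-and-classification step (not of the approaches taught in the tutorial), an off-the-shelf spaCy pipeline can be applied to a sample sentence as follows, assuming the en_core_web_sm model has been downloaded. Historical texts typically require domain adaptation beyond this baseline.

import spacy

# Assumes the small English model is installed: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

text = "In 1849 the Gazette de Lausanne reported on Giuseppe Verdi's visit to Paris."
doc = nlp(text)
for ent in doc.ents:
    # Prints each recognized entity span with its predicted class, e.g. ('Paris', 'GPE').
    print(ent.text, ent.label_)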
Spherical Network Visualizations
Data visualization is a recent domain whose roots lie in the 1980s, but its history goes back to more ancient forms of representation such as diagrams, drawings, and maps. Data visualization has, however, found it hard to take advantage of the heritage offered by cartography, a discipline with established theoretical and mathematical foundations. Over more than two thousand years, cartography has stimulated a dialogue between critical thinking and spatial projections, with a keen interest in orientation and decision-making. This article takes up the cartographic technique of globe projection and applies it to network visualization. If the primary interest of globe projection in cartography is the flattened representation of the earth, in data visualization the focus moves to spatial continuity. In world maps the left and right sides are usually connected, so the gaze can follow a trajectory that continues on the opposite side, whereas in data visualization the drawing space is framed in all directions. Network visualizations rely on a relational logic applied to a limited flatland. We suggest that drawing networks in a non-continuous space is a habit that can be changed. The hypothesis that drawing networks on a spherical surface is less reductive is supported by an example of travel distances between cities, mapped in two and three dimensions. Lastly, we argue that adopting a spherical projection eliminates the bias introduced by centrality in favor of a spatial measure based on density.
Challenges in Design: Imagination, Aesthetics, and New Technology, Porto, Portugal, 26 June 2019.
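The article's own figures are not reproduced here; as a small numerical sketch of the idea of measuring distances on a sphere rather than on a framed plane, the following NumPy fragment places a few cities on the unit sphere from approximate (latitude, longitude) pairs and computes their great-circle separations. The city selection and coordinates are illustrative assumptions.

import numpy as np

# Approximate (latitude, longitude) pairs; the city choice is arbitrary.
cities = {
    "Lisbon": (38.7, -9.1),
    "Moscow": (55.8, 37.6),
    "Tokyo": (35.7, 139.7),
}

def to_sphere(lat_deg, lon_deg):
    # Convert geographic coordinates to a 3-D point on the unit sphere.
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    return np.array([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)])

def great_circle(p, q):
    # Angular (great-circle) distance between two unit vectors, in radians.
    return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))

points = {name: to_sphere(*coord) for name, coord in cities.items()}
for a in cities:
    for b in cities:
        if a < b:  # print each unordered pair once
            print(a, b, round(great_circle(points[a], points[b]), 3))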
The Daily Design of the Quantified Self
This article argues that the digital traces collected day by day can be used to reshape and improve our personal selves. Whereas in the past we used to collect data through diaries, today the task of producing inscriptions is delegated to technological devices such as mobile phones. The article then discusses how technology can shape athletes, using the Sky Team as an example of personal and collective design.
Swiss Informatics Digital Magazine (SIDM)
2019
DOI : 10.5281/zenodo.3463586
Frederic Kaplan; Isabella di Lenardo
Apollo-The International Art Magazine
2019-01-01
Vol. 189 , num. 671, p. 21-21.
Traduire les données en images
Journée d’étude : Imagination, imaginaire et images des (big) data, Université de Lille, France, 24 January 2019.
Traduire les données en images
Séminaire d’écritures numériques et éditorialisation, CNAM Paris et Université de Montreal, January 17th 2019.
Translating Data into Images
Séminaire du médialab, Sciences Po, Paris, France, January 15th 2019.
2018
Observation of Medium-Induced Modifications of Jet Fragmentation in Pb-Pb Collisions at $\sqrt{s_{NN}}=$ 5.02 TeV Using Isolated Photon-Tagged Jets
Measurements of fragmentation functions for jets associated with an isolated photon are presented for the first time in pp and Pb-Pb collisions. The analysis uses data collected with the CMS detector at the CERN LHC at a nucleon-nucleon center-of-mass energy of 5.02 TeV. Fragmentation functions are obtained for jets with $p_{\mathrm{T}}^{\text{jet}}>30$ GeV/c in events containing an isolated photon with $p_{\mathrm{T}}^{\gamma}>60$ GeV/c, using charged tracks with transverse momentum $p_{\mathrm{T}}^{\text{trk}}>1$ GeV/c in a cone around the jet axis. The association with an isolated photon constrains the initial $p_{\mathrm{T}}$ and azimuthal angle of the parton whose shower produced the jet. For central Pb-Pb collisions, modifications of the jet fragmentation functions are observed when compared to those measured in pp collisions, while no significant differences are found in the 50% most peripheral collisions. Jets in central Pb-Pb events show an excess (depletion) of low (high) $p_{\mathrm{T}}$ particles, with a transition around 3 GeV/c. This measurement shows for the first time the in-medium shower modifications of partons (quark dominated) with well-defined initial kinematics. It constitutes a new well-controlled reference for testing theoretical models of the parton passage through the quark-gluon plasma.
Physical Review Letters
2018-12-15
Vol. 121 , num. 24, p. 242301.DOI : 10.1103/PhysRevLett.121.242301
Measurements of Higgs boson properties in the diphoton decay channel in proton-proton collisions at $\sqrt{s} =$ 13 TeV
Measurements of Higgs boson properties in the H → γγ decay channel are reported. The analysis is based on data collected by the CMS experiment in proton-proton collisions at $\sqrt{s}=13$ TeV during the 2016 LHC running period, corresponding to an integrated luminosity of 35.9 fb$^{-1}$. Allowing the Higgs mass to float, the measurement yields a signal strength relative to the standard model prediction of 1.18$^{+0.17}_{-0.14}$ = 1.18$^{+0.12}_{-0.11}$ (stat)$^{+0.09}_{-0.07}$ (syst)$^{+0.07}_{-0.06}$ (theo), which is largely insensitive to the exact Higgs mass around 125 GeV. Signal strengths associated with the different Higgs boson production mechanisms, couplings to bosons and fermions, and effective couplings to photons and gluons are also measured.
Journal of High Energy Physics
2018-11-29
Vol. 2018 , num. 11, p. 185.DOI : 10.1007/JHEP11(2018)185
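As a quick consistency check on the uncertainty breakdown quoted above (assuming, as is conventional, that the statistical, systematic and theoretical components combine in quadrature), the totals are recovered as $\sqrt{(0.12)^2 + (0.09)^2 + (0.07)^2} = \sqrt{0.0274} \approx 0.17$ and $\sqrt{(0.11)^2 + (0.07)^2 + (0.06)^2} = \sqrt{0.0206} \approx 0.14$, matching the quoted total uncertainty $^{+0.17}_{-0.14}$ on the signal strength of 1.18.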
Measurement of differential cross sections for Z boson production in association with jets in proton-proton collisions at $\sqrt{s} =$ 13 TeV
The production of a ${\text {Z}}$ boson, decaying to two charged leptons, in association with jets in proton-proton collisions at a centre-of-mass energy of 13 $\,\text {TeV}$ is measured. Data recorded with the CMS detector at the LHC are used that correspond to an integrated luminosity of 2.19 $\,\text {fb}^\text {-1}$ . The cross section is measured as a function of the jet multiplicity and its dependence on the transverse momentum of the ${\text {Z}}$ boson, the jet kinematic variables (transverse momentum and rapidity), the scalar sum of the jet momenta, which quantifies the hadronic activity, and the balance in transverse momentum between the reconstructed jet recoil and the ${\text {Z}}$ boson. The measurements are compared with predictions from four different calculations. The first two merge matrix elements with different parton multiplicities in the final state and parton showering, one of which includes one-loop corrections. The third is a fixed-order calculation with next-to-next-to-leading order accuracy for the process with a ${\text {Z}}$ boson and one parton in the final state. The fourth combines the fully differential next-to-next-to-leading order calculation of the process with no parton in the final state with next-to-next-to-leading logarithm resummation and parton showering.
The European Physical Journal C
2018-11-22
Vol. 78 , num. 11, p. 965.DOI : 10.1140/epjc/s10052-018-6373-0
Measurement of the top quark mass with lepton+jets final states using $\mathrm {p}$ $\mathrm {p}$ collisions at $\sqrt{s}=13\,\text {TeV} $
The mass of the top quark is measured using a sample of ${{\text {t}}\overline{{\text {t}}}}$ events collected by the CMS detector using proton-proton collisions at $\sqrt{s}=13$ $\,\text {TeV}$ at the CERN LHC. Events are selected with one isolated muon or electron and at least four jets from data corresponding to an integrated luminosity of 35.9 $\,\text {fb}^{-1}$ . For each event the mass is reconstructed from a kinematic fit of the decay products to a ${{\text {t}}\overline{{\text {t}}}}$ hypothesis. Using the ideogram method, the top quark mass is determined simultaneously with an overall jet energy scale factor (JSF), constrained by the mass of the W boson in ${\text {q}} \overline{{\text {q}}} ^\prime $ decays. The measurement is calibrated on samples simulated at next-to-leading order matched to a leading-order parton shower. The top quark mass is found to be $172.25 \pm 0.08\,\text {(stat+JSF)} \pm 0.62\,\text {(syst)} \,\text {GeV} $ . The dependence of this result on the kinematic properties of the event is studied and compared to predictions of different models of ${{\text {t}}\overline{{\text {t}}}}$ production, and no indications of a bias in the measurements are observed.
The European Physical Journal C
2018-11-02
Vol. 78 , num. 11, p. 891.DOI : 10.1140/epjc/s10052-018-6332-9
Measurement of the production cross section for single top quarks in association with W bosons in proton-proton collisions at $ \sqrt{s}=13 $ TeV
A measurement is presented of the associated production of a single top quark and a W boson in proton-proton collisions at $\sqrt{s}=13$ TeV by the CMS Collaboration at the CERN LHC. The data collected correspond to an integrated luminosity of 35.9 fb$^{-1}$. The measurement is performed using events with one electron and one muon in the final state, along with at least one jet originating from a bottom quark. A multivariate discriminant, exploiting the kinematic properties of the events, is used to separate the signal from the dominant $\mathrm{t}\overline{\mathrm{t}}$ background. The measured cross section of 63.1 ± 1.8 (stat) ± 6.4 (syst) ± 2.1 (lumi) pb is in agreement with the standard model expectation.
Journal of High Energy Physics
2018-10-18
Vol. 2018 , num. 10, p. 117.DOI : 10.1007/JHEP10(2018)117
Search for new physics in dijet angular distributions using proton–proton collisions at $\sqrt{s}=$ 13 TeV and constraints on dark matter and other models
A search is presented for physics beyond the standard model, based on measurements of dijet angular distributions in proton–proton collisions at $\sqrt{s}=13\,\text{TeV}$. The data collected with the CMS detector at the LHC correspond to an integrated luminosity of 35.9 $\text{fb}^{-1}$. The observed distributions, corrected to particle level, are found to be in agreement with predictions from perturbative quantum chromodynamics that include electroweak corrections. Constraints are placed on models containing quark contact interactions, extra spatial dimensions, quantum black holes, or dark matter, using the detector-level distributions. In a benchmark model where only left-handed quarks participate, contact interactions are excluded at the 95% confidence level up to a scale of 12.8 or 17.5 TeV, for destructive or constructive interference, respectively. The most stringent lower limits to date are set on the ultraviolet cutoff in the Arkani–Hamed–Dimopoulos–Dvali model of extra dimensions. In the Giudice–Rattazzi–Wells convention, the cutoff scale is excluded up to 10.1 TeV. The production of quantum black holes is excluded for masses below 5.9 and 8.2 TeV, depending on the model. For the first time, lower limits between 2.0 and 4.6 TeV are set on the mass of a dark matter mediator for (axial-)vector mediators, for the universal quark coupling $g_{\mathrm{q}}=1.0$.
The European Physical Journal C
2018-09-28
Vol. 78 , num. 9, p. 789.DOI : 10.1140/epjc/s10052-018-6242-x
Search for a heavy resonance decaying into a Z boson and a Z or W boson in 2ℓ2q final states at $ \sqrt{s}=13 $ TeV
A search has been performed for heavy resonances decaying to ZZ or ZW in 2ℓ2q final states, with two charged leptons (ℓ = e, μ) produced by the decay of a Z boson, and two quarks produced by the decay of a W or Z boson. The analysis is sensitive to resonances with masses in the range from 400 to 4500 GeV. Two categories are defined based on the merged or resolved reconstruction of the hadronically decaying vector boson, optimized for high- and low-mass resonances, respectively. The search is based on data collected during 2016 by the CMS experiment at the LHC in proton-proton collisions with a center-of-mass energy of $ \sqrt{s}=13 $ TeV, corresponding to an integrated luminosity of 35.9 fb$^{−1}$. No excess is observed in the data above the standard model background expectation. Upper limits on the production cross section of heavy, narrow spin-1 and spin-2 resonances are derived as a function of the resonance mass, and exclusion limits on the production of W$^{′}$ bosons and bulk graviton particles are calculated in the framework of the heavy vector triplet model and warped extra dimensions, respectively.
Journal of High Energy Physics
2018-09-18
Vol. 2018 , num. 9, p. 101.DOI : 10.1007/JHEP09(2018)101