LinkedBooks Citation Dataset

A dataset of citations extracted from monographs about the history of Venice, created in the context of the LinkedBooks project.

Contact: Giovanni Colavizza, Matteo Romanello

Related publications:

Annotated References in the Historiography on Venice: 19th–21st centuries

G. Colavizza; M. Romanello 

We publish a dataset containing more than 40,000 manually annotated references from a broad corpus of books and journal articles on the history of Venice. References were drawn from both reference lists and footnotes and include primary and secondary sources, in full or abbreviated form. The dataset comprises references from publications from the 19th to the 21st century. References were collected from a newly digitized corpus and manually annotated in all their constituent parts. The dataset is stored in a GitHub repository, persisted in Zenodo, and accompanied by code for training parsers to extract references from other publications. Two trained Conditional Random Fields models are provided along with their evaluation, to serve as a baseline for a parsing shared task. No comparable public dataset exists to support the task of reference parsing in the humanities. The dataset is of interest to anyone working on reference parsing and citation extraction in the humanities.

Journal of Open Humanities Data. 2017. Vol. 3, p. 2. DOI: 10.5334/johd.9.