Large-scale digitization projects
Digitization is much more than scanning. The DHLAB develops new methods and processes for conducting large-scale digitization projects. How to digitize large collections of historic documents remains an open problem, one that depends heavily on specific strategic choices. The DHLAB, in collaboration with other EPFL research labs, is working on innovative processes for archive selection, quality control, standardized metadata, and optical character recognition techniques for ancient documents. The DHLAB also conducts research on crowdsourcing approaches for such large-scale projects.
- Watch: “How to build a time machine”: Frederic Kaplan’s TED presentation about a large-scale digitization project focused on the archives of Venice.
Big Data for the humanities
The DHLAB works specifically on new approaches for storing large amounts of textual data using sustainable and energy-efficient methods adapted to long-term conservation plans. These methods include smart lossless compression techniques and biologically inspired “endosemantic” formats (formats that include the definition of their own semantics). We also investigate efficient storage strategies beyond standard hard-disk-based systems.
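To make the lossless-compression idea concrete, here is a minimal sketch (not the DHLAB’s actual format) showing why it suits long-term conservation: the compressed archive can always be expanded back to the exact original text, and redundant historic prose compresses very well. It uses Python’s standard `zlib`; the sample text is invented for illustration.

```python
import zlib

def compress_text(text: str, level: int = 9) -> bytes:
    """Losslessly compress a UTF-8 document at the highest compression level."""
    return zlib.compress(text.encode("utf-8"), level)

def decompress_text(blob: bytes) -> str:
    """Recover the original document exactly -- no information is lost."""
    return zlib.decompress(blob).decode("utf-8")

# Illustrative sample: historic archival text is highly redundant,
# so a lossless codec shrinks it dramatically.
page = "In the year of our Lord 1515, the Senate of Venice decreed... " * 200
blob = compress_text(page)

assert decompress_text(blob) == page            # exact round trip
print(len(page.encode("utf-8")), "->", len(blob))  # compressed size is far smaller
```

The key property for archival use is the round-trip guarantee: unlike lossy image codecs, nothing of the source text is sacrificed for the space savings.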
The glasses that digitize what you read
We also want to make digitization much simpler than it is now. The DHLAB is working on a low-cost solution targeted at humanities scholars: a pair of glasses with a high-definition camera that scans the page you read, dynamically combining a large set of views of the same page to reconstruct a good-quality digital document afterwards. With such glasses, any page you read can be digitally stored in your personal archive. Using OCR techniques, we can then index the content of this archive to turn it into an efficient personal memory system.
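The core idea of combining many views of the same page can be sketched in a few lines. The toy example below (an assumption for illustration, not the glasses’ actual pipeline) pixel-wise averages N noisy captures of an already-aligned page patch; averaging N independent views reduces sensor noise by roughly a factor of sqrt(N). A real system would first have to register the views to each other.

```python
import random

def combine_views(frames):
    """Fuse several noisy captures of the same page by pixel-wise averaging.

    frames: list of equally sized grayscale images (lists of rows of numbers),
    assumed already aligned with one another.
    """
    n = len(frames)
    h, w = len(frames[0]), len(frames[0][0])
    return [[sum(f[y][x] for f in frames) / n for x in range(w)]
            for y in range(h)]

# Simulate a "true" 8x8 page patch captured 25 times with sensor noise.
random.seed(0)
true_page = [[200 if (x + y) % 2 else 40 for x in range(8)] for y in range(8)]
captures = [[[px + random.gauss(0, 30) for px in row] for row in true_page]
            for _ in range(25)]

fused = combine_views(captures)

# Mean absolute error against the true page: one capture vs. the fused result.
single_err = sum(abs(captures[0][y][x] - true_page[y][x])
                 for y in range(8) for x in range(8)) / 64
fused_err = sum(abs(fused[y][x] - true_page[y][x])
                for y in range(8) for x in range(8)) / 64
print(round(single_err, 1), "->", round(fused_err, 1))  # fused error is far lower
```

Once a clean page image is reconstructed this way, OCR can extract its text and feed a searchable index of everything the wearer has read.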