Ground-Truth Production in the Transcriptorium Project
2014, pp. 237–241
Basilis Gatos, Georgios Louloudis, Tim Causer, Kris Grint, Verónica Romero, Joan Andreu Sánchez, Alejandro H. Toselli, Enrique Vidal
Abstract
tranScriptorium is a three-year project that aims to develop innovative, cost-effective solutions for the indexing, search and full transcription of historical handwritten document images, using Handwritten Text Recognition (HTR) technology. Producing ground truth (GT) for a dataset of handwritten document images is among its first tasks. We describe novel approaches for faster production of this GT, based on crowd-sourcing and on prior-knowledge methods. We also describe a novel low-cost semi-supervised procedure for obtaining correctly aligned line-level pairs of detected/extracted text line images and text line transcripts, especially suitable for training the HTR models employed in tranScriptorium.