Title :
Collecting historical font metrics from Google Books
Author :
LiVolsi, R. ; Zanibbi, Richard ; Bigelow, C.
Abstract :
A system is presented for extracting key metrics from fonts used in historical documents. The system identifies important landmarks on a page, such as margins, paragraphs, and lines, and applies frequency analysis techniques to identify the relevant sizes. The system was validated against measurements made by a human expert on randomly selected samples; on average, its measurements differed from the expert's by less than 5% for x-height, body size, and line spacing.
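The abstract does not specify which frequency analysis technique is used. As an illustration only, the sketch below shows one common way such an analysis can recover a size from a page: it treats the horizontal projection profile (ink pixels per row) of a binarized page as a signal and takes the period of its strongest non-DC DFT component as an estimate of line spacing in pixels. It assumes the page image is a list of rows with 1 for ink and 0 for background. The function names `row_profile`, `dominant_period`, and `mean_relative_difference` are hypothetical, and the last one is just one plausible reading of "differed on average by less than 5%". None of this is presented as the authors' actual method.

```python
import math
from typing import Sequence


def row_profile(image: Sequence[Sequence[int]]) -> list[int]:
    """Horizontal projection profile: count of ink pixels (value 1) in each row."""
    return [sum(row) for row in image]


def dominant_period(profile: Sequence[float], min_period: float = 2.0) -> float:
    """Estimate the dominant period of a 1-D signal from its DFT power spectrum.

    Illustrative assumption: the strongest periodic component of the row
    profile corresponds to the line spacing. The mean is subtracted so the
    DC term does not dominate, and only periods >= min_period are considered.
    A naive O(n^2) DFT is used to keep the sketch dependency-free.
    """
    n = len(profile)
    mean = sum(profile) / n
    centered = [v - mean for v in profile]
    best_k, best_power = 0, -1.0
    for k in range(1, n // 2 + 1):
        if n / k < min_period:
            break
        re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(centered))
        im = -sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_k, best_power = k, power
    if best_k == 0:
        raise ValueError("profile too short to estimate a period")
    # Frequency bin k completes k cycles over n rows, so its period is n / k rows.
    return n / best_k


def mean_relative_difference(system: Sequence[float], expert: Sequence[float]) -> float:
    """Average of |system - expert| / expert over all samples, as a percentage."""
    if len(system) != len(expert) or not expert:
        raise ValueError("need two equal-length, non-empty sequences")
    return 100 * sum(abs(s - e) / e for s, e in zip(system, expert)) / len(expert)


if __name__ == "__main__":
    # Synthetic page: 480 rows, a 10-row band of ink every 24 rows.
    page = [([1] * 60 + [0] * 40) if r % 24 < 10 else [0] * 100 for r in range(480)]
    print(dominant_period(row_profile(page)))  # 24.0
```

On the synthetic page the estimate is exactly 24 rows because the page height is a whole multiple of the line pitch. On real scans the spectral peak falls between bins, so a practical version would refine the peak or use autocorrelation, and would first restrict the profile to a text block found by the landmark step.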
Keywords :
document image processing; history; Google Books; frequency analysis techniques; historical documents; historical font metrics; line spacing metrics; randomly selected samples; Google; Humans; Image segmentation; Noise; Size measurement; Standards
Conference_Title :
Pattern Recognition (ICPR), 2012 21st International Conference on
Conference_Location :
Tsukuba
Print_ISBN :
978-1-4673-2216-4