Title :
Improving Formula Analysis with Line and Mathematics Identification
Author :
Alkalai, Mohamed ; Baker, Josef B. ; Sorge, Volker ; Lin, Xiaoyan
Author_Institution :
Sch. of Comput. Sci., Univ. of Birmingham, Birmingham, UK
Abstract :
The explosive growth of the internet and electronic publishing has made a huge number of scientific documents available to users. However, these documents are usually inaccessible to those with visual impairments and often only partially compatible with software and modern hardware such as tablets and e-readers. In this paper we revisit Maxtract, a tool for analysing and converting documents into accessible formats, and combine it with two advanced segmentation techniques: statistical line identification and machine learning formula identification. We show how these techniques improve the quality of both Maxtract's underlying document analysis and its output. We re-run and compare experimental results over a number of datasets, present a qualitative review of the improved output, and draw conclusions.
Keywords :
document image processing; electronic publishing; image segmentation; learning (artificial intelligence); statistical analysis; Internet; Maxtract document analysis; e-readers; electronic readers; formula analysis; machine learning formula identification; mathematics identification; segmentation techniques; statistical line identification; tablets; Accuracy; Feature extraction; Histograms; Layout; Portable document format; Text recognition; Math formula recognition; formula identification; line segmentation
Conference_Title :
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location :
Washington, DC
DOI :
10.1109/ICDAR.2013.74