Title :
Recursive text segmentation for Indonesian Automated Document Reader for people with visual impairment
Author :
Tjahja, T.V. ; Nugroho, Anto Satriyo ; Purnama, James ; Azis, Nur Aziza ; Maulidiyatul Hikmah, Rose ; Riandi, Oskar ; Prasetyo, B.
Author_Institution :
Fac. of Inf. Technol., Swiss German Univ., Tangerang, Indonesia
Abstract :
This research addresses the needs of visually impaired people through an intelligent system that reads textual information on paper and produces the corresponding speech. The Indonesian Automated Document Reader (I-ADR) is operated via a voice-based user interface to scan a document page. Textual information is then extracted from the scanned page using Optical Character Recognition (OCR) techniques. The user can then choose to have the system read the whole page or to listen to a summary of the information on the page. SIDoBI (Sistem Ikhtisar Dokumen untuk Bahasa Indonesia) is integrated into the system to provide the summarization feature. The result of either the whole-page reading or the summarization is converted to speech by a text-to-speech synthesizer. The whole system is developed under the Free Open Source Software policy and will be distributed openly, at no cost, to all users in need. This paper focuses on the text segmentation algorithm implemented in I-ADR to extract text from documents with complex layouts. We implemented the I-ADR text segmentation module using Enhanced CRLA and propose an improved algorithm for text extraction. Evaluation of the proposed system on various page layouts showed promising results.
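For readers unfamiliar with run-length based segmentation, the sketch below illustrates the basic run-length smoothing step on which constrained run-length approaches build: short white gaps between black pixels are filled so that characters merge into blocks. It is not the Enhanced CRLA variant or the improved recursive algorithm from the paper, whose details the abstract does not give. The function names (`smooth_runs`, `smooth_page`), the thresholds, and the binary encoding (1 = black, 0 = white) are assumptions made for illustration only.

```python
def smooth_runs(row, threshold):
    """Fill white runs (0s) no longer than `threshold` that lie between two black pixels (1s)."""
    out = list(row)
    last_black = None  # index of the most recent black pixel seen so far
    for i, pixel in enumerate(row):
        if pixel == 1:
            gap = i - last_black - 1 if last_black is not None else 0
            if 0 < gap <= threshold:
                # The white run is short enough: smear it into black.
                for j in range(last_black + 1, i):
                    out[j] = 1
            last_black = i
    return out


def smooth_page(image, h_threshold, v_threshold):
    """Smooth rows and columns independently, then AND the two results (hypothetical helper)."""
    horizontal = [smooth_runs(r, h_threshold) for r in image]
    columns = [smooth_runs(list(c), v_threshold) for c in zip(*image)]
    vertical = [list(r) for r in zip(*columns)]  # transpose back to row-major order
    return [[h & v for h, v in zip(hr, vr)] for hr, vr in zip(horizontal, vertical)]
```

With a threshold of 2, `smooth_runs([1, 0, 0, 1, 0, 0, 0, 0, 1], 2)` fills the first gap (length 2) but leaves the second (length 4) untouched. Connected blocks in the smoothed image would then be candidate text regions to pass to OCR.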
Keywords :
handicapped aids; human computer interaction; natural language processing; optical character recognition; public domain software; speech synthesis; speech-based user interfaces; text analysis; I-ADR text segmentation module; Indonesian automated document reader; OCR; SIDoBI; Sistem Ikhtisar Dokumen untuk Bahasa Indonesia; document page; enhanced CRLA; free open source software policy; intelligent system; optical character recognition techniques; recursive text segmentation; summarization feature; text extraction; text-to-speech synthesizer; visually impaired people; voice-based user interface; Character recognition; Databases; Histograms; Image segmentation; Optical character recognition software; Speech; Synthesizers; text segmentation; text summarization; visual impairment
Conference_Title :
Electrical Engineering and Informatics (ICEEI), 2011 International Conference on
Conference_Location :
Bandung
Print_ISBN :
978-1-4577-0753-7
DOI :
10.1109/ICEEI.2011.6021764