DocumentCode :
3338671
Title :
Recursive text segmentation for Indonesian Automated Document Reader for people with visual impairment
Author :
Tjahja, T.V. ; Nugroho, Anto Satriyo ; Purnama, James ; Azis, Nur Aziza ; Maulidiyatul Hikmah, Rose ; Riandi, Oskar ; Prasetyo, B.
Author_Institution :
Fac. of Inf. Technol., Swiss German Univ., Tangerang, Indonesia
fYear :
2011
fDate :
17-19 July 2011
Firstpage :
1
Lastpage :
6
Abstract :
This research aims to accommodate the needs of visually impaired people through an intelligent system that reads textual information on paper and produces the corresponding speech. The Indonesian Automated Document Reader (I-ADR) is operated via a voice-based user interface to scan a document page. Textual information from the scanned page is then extracted using Optical Character Recognition (OCR) techniques. The user can then choose to have the system read the whole page or to listen to a summary of the information on the page. SIDoBI (Sistem Ikhtisar Dokumen untuk Bahasa Indonesia) is integrated into the system to provide the summarization feature. The result of either the whole-page reading or the summarization is converted to speech through a text-to-speech synthesizer. The whole system is developed under the Free Open Source Software policy and will be distributed openly, at no cost, to all users in need. This paper focuses on the text segmentation algorithm implemented in I-ADR to extract text from documents with complex layouts. We implemented the I-ADR text segmentation module using Enhanced CRLA and propose an improved algorithm for text extraction. Evaluation of the proposed system on various page layouts showed promising results.
Keywords :
handicapped aids; human computer interaction; natural language processing; optical character recognition; public domain software; speech synthesis; speech-based user interfaces; text analysis; I-ADR text segmentation module; Indonesian automated document reader; OCR; SIDoBI; Sistem Ikhtisar Dokumen untuk Bahasa Indonesia; document page; enhanced CRLA; free open source software policy; intelligent system; optical character recognition techniques; recursive text segmentation; summarization feature; text extraction; text-to-speech synthesizer; visually impaired people; voice-based user interface; Character recognition; Databases; Histograms; Image segmentation; Optical character recognition software; Speech; Synthesizers; OCR; text segmentation; text summarization; text-to-speech synthesizer; visual impairment;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Electrical Engineering and Informatics (ICEEI), 2011 International Conference on
Conference_Location :
Bandung
ISSN :
2155-6822
Print_ISBN :
978-1-4577-0753-7
Type :
conf
DOI :
10.1109/ICEEI.2011.6021764
Filename :
6021764