DocumentCode :
1609917
Title :
Word classification in bilingual printed documents
Author :
Haboubi, Sofiene ; Maddouri, Samia ; Amiri, Hamid
Author_Institution :
Image & Inf. Technologies Lab., Nat. Eng. Sch. of Tunis, Tunis, Tunisia
fYear :
2012
Firstpage :
502
Lastpage :
506
Abstract :
In this paper we propose a method for identifying Arabic words in printed documents that mix Arabic and Latin scripts. The method uses statistical and geometric analysis to separate the words of a printed document. Structural features then describe the words extracted in the previous step; the features used include the jambs, the diacritical points, the connected components, and the hamps, among others. From these characteristics we construct the feature vector that describes each word. Neural networks are used to classify the extracted words into two classes, Arabic or Latin. We present the results of the classification step and discuss possible improvements.
Keywords :
geometry; natural language processing; neural nets; pattern classification; statistical analysis; text analysis; Arabic words; Latin scripts; bilingual printed documents; connected components; geometric analysis; neural networks; printed documents; statistical analysis; structural features; word classification; words extraction; Character recognition; Feature extraction; Gabor filters; Optical character recognition software; Text analysis; Writing; Language identification; structural features; word extraction;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), 2012 6th International Conference on
Conference_Location :
Sousse
Print_ISBN :
978-1-4673-1657-6
Type :
conf
DOI :
10.1109/SETIT.2012.6481963
Filename :
6481963