DocumentCode :
2631480
Title :
European language determination from image
Author :
Nakayama, Takehiro ; Spitz, A. Lawrence
Author_Institution :
Fuji Xerox Palo Alto Lab., CA, USA
fYear :
1993
fDate :
20-22 Oct 1993
Firstpage :
159
Lastpage :
162
Abstract :
The authors have developed a technique for determining the language from an image of text. This work is restricted to a small subset of European languages, but uses techniques which should be applicable across many more languages. The method first makes generalizations about images of characters, then performs gross classification of the isolated characters and agglomerates these class identities into spatially isolated (word) tokens. Analysis of corpora in English, French and German yields training data for a language classifier designed to codify the spatial relationships of the connected components which compose the letter-forms. Linear discriminant analysis provides classification criteria on which the test data are evaluated. The resulting process takes in images of text and produces a language classification based on image representations and generalizations about relative token shape frequency in the target languages.
Keywords :
character recognition; image classification; linguistics; natural languages; English; European languages; French; German; class identities; classification criteria; corpora; gross classification; image representations; isolated characters; language classifier; language determination; linear discriminant analysis; spatial relationships; spatially isolated tokens; token shape frequency; training data; word tokens; Character recognition; Frequency; Image representation; Laboratories; Linear discriminant analysis; Natural languages; Optical character recognition software; Shape; Testing; Training data;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition, 1993., Proceedings of the Second International Conference on
Conference_Location :
Tsukuba Science City
Print_ISBN :
0-8186-4960-7
Type :
conf
DOI :
10.1109/ICDAR.1993.395759
Filename :
395759