Conference record number:
2139
Article title:
Farsi Machine-printed Subwords Recognition Using Contour-based Fourier Descriptors
Title in another language:
Farsi Machine-printed Subwords Recognition Using Contour-based Fourier Descriptors
Authors:
Bahar Parnia (author), Mozaffari Saeed (author)
Keywords:
Farsi/Arabic word recognition, machine-printed documents, Fourier shape descriptors, large dataset
Conference title:
First International Conference on Persian Script and Language Processing
English abstract:
This paper presents a fast and simple method for recognizing Farsi/Arabic subwords in a large lexicon. By omitting the dots and complementary parts of machine-printed characters, a dataset of 9445 Farsi/Arabic subwords, printed in a single font and a single size, was obtained. Omitting these parts not only reduces the number of distinct subwords but also makes the dataset suitable for both Farsi and Arabic. After the boundary points of each subword are normalized, Fourier descriptor features are extracted. Experimental results on 30 plain texts show an accuracy of 82.1% at the subword level. Given the size and coverage of this dataset, the results are promising and could be improved in the future by using Farsi/Arabic grammar to connect subwords.
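The abstract names the technique (Fourier descriptors computed from normalized boundary points) but gives no implementation details. The following is a minimal, generic sketch of contour-based Fourier descriptors, not the authors' implementation. The function names `resample_contour` and `fourier_descriptors`, the 64 resampled points, the 16 retained frequencies, and normalization by the largest magnitude are all assumptions chosen for illustration.

```python
import numpy as np


def resample_contour(points, n_points=64):
    """Resample a closed contour to n_points spaced evenly by arc length.

    Hypothetical helper: one possible reading of "normalizing boundary points".
    """
    pts = np.asarray(points, dtype=float)
    closed = np.vstack([pts, pts[:1]])            # repeat first point to close the loop
    seg = np.hypot(*np.diff(closed, axis=0).T)    # length of each boundary segment
    cum = np.concatenate([[0.0], np.cumsum(seg)]) # cumulative arc length
    targets = np.linspace(0.0, cum[-1], n_points, endpoint=False)
    x = np.interp(targets, cum, closed[:, 0])
    y = np.interp(targets, cum, closed[:, 1])
    return np.column_stack([x, y])


def fourier_descriptors(points, n_points=64, n_coeffs=16):
    """Return a feature vector of 2 * n_coeffs Fourier descriptor magnitudes.

    Assumed invariance handling (not taken from the paper):
    - dropping the DC term removes dependence on translation,
    - taking magnitudes removes dependence on rotation and start point,
    - dividing by the largest magnitude removes dependence on scale.
    """
    contour = resample_contour(points, n_points)
    z = contour[:, 0] + 1j * contour[:, 1]        # boundary as a complex sequence
    spectrum = np.fft.fft(z)
    # Low positive and low negative frequencies, skipping index 0 (DC).
    idx = np.r_[1:n_coeffs + 1, -n_coeffs:0]
    mags = np.abs(spectrum[idx])
    return mags / mags.max()
```

With such features, a subword image could be matched against a lexicon by nearest-neighbor search over the descriptor vectors; the abstract does not state which classifier the paper uses.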
Conference document number:
4474716