Author_Institution :
Nat. Lab. of Pattern Recognition (NLPR), Inst. of Autom., Beijing, China
Abstract :
To support the evaluation of touching character segmentation algorithms, we present CASIA-HWDB-T, a database of touching characters collected from the Chinese handwriting database CASIA-HWDB. It contains 56,469 touching strings of two or more characters, among which 1,818 strings have multiple-touching characters. The touching strings are partitioned into 50,157 all-Chinese strings, 2,788 all-digit strings, 328 all-letter strings, and 3,196 mixed-character strings. Every string is annotated with its character classes, the locations of its touching points, and auxiliary values such as string height and average stroke width. Finally, we report the segmentation performance of three existing algorithms on this database as a reference.
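As an illustration of the annotation and partition described in the abstract, the following is a minimal Python sketch of what one annotated touching string might look like in memory. The class name `TouchingString`, its field names, the pixel units, and the helper `string_category` are hypothetical and do not reflect the database's actual file format. The CJK check uses only the basic Unified Ideographs block, which is a simplifying assumption. Only the four partition counts and the total are taken from the abstract.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TouchingString:
    """Hypothetical annotation record; field names are illustrative only."""
    char_classes: List[str]                 # one single-character label per character
    touching_points: List[Tuple[int, int]]  # (x, y) locations of touching points
    string_height: int                      # auxiliary value (pixels assumed)
    avg_stroke_width: float                 # auxiliary value (pixels assumed)


def string_category(sample: TouchingString) -> str:
    """Assign a non-empty string to one of the four partitions in the abstract."""
    labels = sample.char_classes
    if all(c.isascii() and c.isdigit() for c in labels):
        return "all-digit"
    if all(c.isascii() and c.isalpha() for c in labels):
        return "all-letter"
    # Assumption: "Chinese" means the basic CJK Unified Ideographs block.
    if all("\u4e00" <= c <= "\u9fff" for c in labels):
        return "all-Chinese"
    return "mixed"


# Partition sizes reported in the abstract; they sum to the stated total.
PARTITION_COUNTS = {
    "all-Chinese": 50157,
    "all-digit": 2788,
    "all-letter": 328,
    "mixed": 3196,
}
TOTAL_STRINGS = sum(PARTITION_COUNTS.values())
```

For example, a record with labels `["中", "国"]` would fall into the all-Chinese partition under these assumptions, while `["中", "3"]` would be counted as mixed.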
Keywords :
handwriting recognition; image segmentation; visual databases; CASIA-HWDB-T database; Chinese handwriting; multiple-touching character; string height; stroke width; touching character database; touching character segmentation algorithm; touching point location; touching string partition; Accuracy; Algorithm design and analysis; Character recognition; Databases; Handwriting recognition; Image segmentation; Text recognition