Title :
Computer aided cleaning of large databases for character recognition
Author :
N. Matic;I. Guyon;L. Bottou;J. Denker;V. Vapnik
Author_Institution :
AT&T Bell Lab., Holmdel, NJ, USA
fDate :
1992
Abstract :
A method has been developed for computer-aided removal of undesirable patterns from large training databases. The method uses the trainable classifier itself to point out suspicious patterns, which a human supervisor then checks. Suspicious patterns that are meaningless or mislabeled are considered garbage and removed from the database. The remaining suspicious patterns, such as ambiguous or atypical ones, are valid patterns that are hard to learn and should be kept. Combining pattern cleaning with an emphasizing scheme applied to the hard-to-learn patterns reduced the test-set error rate by half on a database of handwritten lowercase characters entered on a touch terminal. The classifier is based on a time delay neural network (TDNN).
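The abstract does not state how "suspicious" is measured or how emphasis is applied, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that a pattern's suspicion score is the negative log-probability the trained classifier assigns to the recorded label, that the top-k scores are flagged for review, that the human supervisor is represented by a callback (`is_garbage`), and that emphasis means replicating kept hard patterns. All function names (`suspicion_scores`, `flag_suspicious`, `clean_and_emphasize`) and the toy classifier are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical data layout: each pattern is (feature vector, recorded label).
Pattern = Tuple[Sequence[float], str]


def suspicion_scores(
    patterns: List[Pattern],
    predict_proba: Callable[[Sequence[float]], Dict[str, float]],
) -> List[float]:
    """Assumed criterion: -log P(recorded label) under the trained classifier.
    A high score means the classifier is 'surprised' by the label."""
    scores = []
    for x, y in patterns:
        p = predict_proba(x).get(y, 0.0)
        scores.append(-math.log(max(p, 1e-12)))  # clamp to avoid log(0)
    return scores


def flag_suspicious(scores: List[float], top_k: int) -> List[int]:
    """Return indices of the top_k most suspicious patterns, most suspicious first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]


def clean_and_emphasize(
    patterns: List[Pattern],
    flagged: List[int],
    is_garbage: Callable[[int], bool],
    emphasis: int = 2,
) -> List[Pattern]:
    """Drop flagged patterns the supervisor judges garbage; keep the other
    flagged (hard but valid) patterns and replicate them `emphasis` times
    in total (an assumed emphasizing scheme)."""
    flagged_set = set(flagged)
    cleaned: List[Pattern] = []
    for i, pat in enumerate(patterns):
        if i in flagged_set:
            if is_garbage(i):
                continue  # meaningless or mislabeled: remove from database
            cleaned.extend([pat] * emphasis)  # hard to learn: emphasize
        else:
            cleaned.append(pat)  # unflagged patterns are kept once
    return cleaned


if __name__ == "__main__":
    data = [([0.1], "a"), ([0.9], "b"), ([0.8], "a"), ([0.5], "b")]

    def toy_proba(x: Sequence[float]) -> Dict[str, float]:
        # Toy stand-in for a trained classifier: P("b") equals x[0].
        return {"a": 1.0 - x[0], "b": x[0]}

    scores = suspicion_scores(data, toy_proba)
    flagged = flag_suspicious(scores, top_k=2)
    # Stand-in for the human supervisor: pattern 2 is judged mislabeled.
    result = clean_and_emphasize(data, flagged, is_garbage=lambda i: i == 2)
    print(flagged, len(result))  # prints: [2, 3] 4
```

In this toy run, pattern 2 (labeled "a" but scored 0.8 toward "b") is the most suspicious and is removed as garbage, while pattern 3 (an ambiguous 0.5) is kept and duplicated. The key design point mirrored from the abstract is that the classifier only ranks patterns; the keep/remove decision stays with the human.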
Keywords :
"Cleaning","Databases","Character recognition","Pattern recognition","Humans","Error analysis","Testing","Neural networks","Degradation","Delay effects"
Conference_Title :
Proceedings of the 11th IAPR International Conference on Pattern Recognition, 1992, Vol. II, Conference B: Pattern Recognition Methodology and Systems
Print_ISBN :
0-8186-2915-0
DOI :
10.1109/ICPR.1992.201784