DocumentCode
3608447
Title
Comparison of Data Set Bias in Object Recognition Benchmarks
Author
Model, Ian ; Shamir, Lior
Author_Institution
Dept. of Comput. Sci., Lawrence Technol. Univ., Southfield, MI, USA
Volume
3
fYear
2015
fDate
2015
Firstpage
1953
Lastpage
1962
Abstract
Current research in the area of automatic visual object recognition relies heavily on testing the performance of new algorithms against benchmark data sets. Such data sets can be standardized collections acquired systematically in a controlled environment (e.g., COIL-20), or benchmarks compiled from images gathered from various sources, normally via the World Wide Web (e.g., Caltech 101). Here, we test for bias in benchmark data sets by extracting from each image a small area that is seemingly blank and too small to allow manual recognition of the object. The method can be used to detect data set bias in a single-object recognition data set and to compare that bias across data sets. The results show that all the tested data sets allowed classification accuracy higher than mere chance using the small images, although the sub-images did not contain any visually interpretable information. This shows that the consistency of images within the classes of object recognition data sets can allow the images to be classified even by algorithms that do not recognize objects. Among the tested data sets, PASCAL exhibits the lowest observed bias, while data sets acquired in a controlled environment, such as COIL-20, COIL-100, and NEC Animals, are more vulnerable to bias and can be classified from the sub-images with accuracy far higher than mere chance.
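The abstract's protocol can be illustrated with a minimal sketch: crop a small, seemingly featureless patch from each image and check whether a simple classifier still separates the classes above chance. The patch size, the nearest-centroid classifier, and the synthetic two-class data below are illustrative assumptions, not the paper's actual pipeline or benchmarks.

```python
import numpy as np

def extract_patch(image, size=10):
    """Crop a small corner patch -- too small to show the object itself."""
    return image[:size, :size].ravel()

def nearest_centroid_accuracy(train_X, train_y, test_X, test_y):
    """Classify patches by nearest class mean; above-chance accuracy hints at bias."""
    classes = np.unique(train_y)
    centroids = np.array([train_X[train_y == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(test_X[:, None, :] - centroids[None, :, :], axis=2)
    preds = classes[dists.argmin(axis=1)]
    return float((preds == test_y).mean())

# Synthetic demo: two "classes" whose backgrounds differ slightly in mean
# intensity, mimicking the kind of per-class consistency the paper measures.
rng = np.random.default_rng(0)
imgs_a = rng.normal(0.40, 0.05, size=(50, 64, 64))
imgs_b = rng.normal(0.60, 0.05, size=(50, 64, 64))
X = np.array([extract_patch(im) for im in np.concatenate([imgs_a, imgs_b])])
y = np.array([0] * 50 + [1] * 50)
train, test = np.arange(0, 100, 2), np.arange(1, 100, 2)
acc = nearest_centroid_accuracy(X[train], y[train], X[test], y[test])
print(f"patch-only accuracy: {acc:.2f}")  # far above the 0.50 chance level
```

On an unbiased data set the patch-only accuracy should hover near chance (0.50 here); accuracy well above that, as in this contrived example, is the signature of bias the paper reports for the controlled-environment benchmarks.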
Keywords
Internet; image classification; object recognition; COIL-100; COIL-20; NEC Animals; PASCAL; World Wide Web; automatic visual object recognition; benchmark data sets; classification accuracy; controlled environment; data set bias; manual object recognition; object recognition benchmarks; single-object recognition data set; standardized data sets; benchmark testing; pattern recognition; performance evaluation; validation; computer vision
fLanguage
English
Journal_Title
IEEE Access
Publisher
IEEE
ISSN
2169-3536
Type
jour
DOI
10.1109/ACCESS.2015.2491921
Filename
7299607
Link To Document