Abstract:
This article addresses the retrieval of distinctive regions of interest or patterns (DROP) in video surveillance datasets. DROP may include logos, tattoos, color regions, or any other distinctive features recorded on video. Such data pose specific difficulties: low image quality, multiple viewpoints, variable lighting conditions, and a lack of sufficient training samples. The task answers a real operational need, as the challenges derive from police practice. We present preliminary results on one such scenario from Scotland Yard, working within the constraints of a real-world use case. The proposed method combines two approaches: a dense SIFT-based descriptor (Pyramidal Histogram of Visual Words) and image segmentation (Mean-Shift) with feature extraction on each computed segment. Tested on real data, the method achieves promising results that we believe will contribute to the development of advanced methods to be applied and tested in real forensic investigations.
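
To illustrate the pyramidal histogram idea named in the abstract, the following is a minimal sketch and not the implementation used in this work. It assumes that dense descriptors have already been extracted and quantized into visual-word indices; the function name `pyramid_histogram`, its parameters, and the toy input are hypothetical. Per-level weighting, which some spatial-pyramid formulations apply, is omitted for brevity.

```python
def pyramid_histogram(points, words, width, height, vocab_size, levels=2):
    """Concatenate per-cell visual-word histograms over a spatial pyramid.

    points     : list of (x, y) positions of dense descriptors (assumed given)
    words      : visual-word index of each descriptor, 0 <= w < vocab_size
    levels     : pyramid level l splits the image into 2**l x 2**l cells
    Returns an L1-normalized list of length vocab_size * sum(4**l).
    """
    if len(points) != len(words):
        raise ValueError("points and words must have the same length")

    feature = []
    for level in range(levels):
        cells = 2 ** level
        # One histogram of visual words per grid cell at this level.
        hist = [[0] * vocab_size for _ in range(cells * cells)]
        for (x, y), w in zip(points, words):
            # Map the position to a cell index, clamping points on the border.
            cx = min(int(x * cells / width), cells - 1)
            cy = min(int(y * cells / height), cells - 1)
            hist[cy * cells + cx][w] += 1
        for cell in hist:
            feature.extend(cell)

    total = sum(feature)
    return [v / total for v in feature] if total else feature


# Toy input (hypothetical): four descriptors, one in each image quadrant.
toy_feature = pyramid_histogram(
    points=[(1, 1), (9, 1), (1, 9), (9, 9)],
    words=[0, 1, 0, 1],
    width=10, height=10, vocab_size=2, levels=2,
)
```

In a segment-based variant, the same histogram could be computed per Mean-Shift segment instead of per grid cell; that pairing is an assumption made here for illustration only.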