DocumentCode :
1702410
Title :
Multimodal and Multi-task Audio-Visual Vehicle Detection and Classification
Author :
Wang, Tao ; Zhu, Zhigang
fYear :
2012
Firstpage :
440
Lastpage :
446
Abstract :
Moving vehicle detection and classification using multimodal data is challenging in several respects: data collection, audio-visual alignment, feature selection, and effective vehicle classification in uncontrolled environments. In this work, we first present a systematic way to align the multimodal data based on multimodal temporal panorama generation. Various types of features are then extracted to represent diverse, multimodal information. These include global geometric features (aspect ratios, profiles), local structure features (HOGs), and various audio features in both spectral and perceptual representations. A flexible sequential forward selection algorithm with multi-branch searching is used to select a set of important features at different levels of feature combination. Finally, using the same datasets for two different classification tasks, we show that the roles of audio and visual features are task-specific. Furthermore, in both cases, combining some of the features that carry multimodal and complementary information improves accuracy over using the individual features alone. Finer and more accurate classification can therefore be achieved through two different levels of integration: the feature level and the decision level.
Keywords :
audio signal processing; feature extraction; image classification; object detection; spectral analysis; traffic engineering computing; HOG; aspect ratios; audio features; audio-visual alignment; data collection; decision level; feature extraction; feature level; feature selection; flexible sequential forward selection algorithm; global geometric features; local structure features; moving vehicle classification; multibranch searching; multimodal audio-visual vehicle detection; multimodal temporal panorama generation; multitask audio-visual vehicle detection; perceptual representations; profiles; spectral representations; Accuracy; Feature extraction; Image reconstruction; Mel frequency cepstral coefficient; Vehicle detection; Vehicles; Visualization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Advanced Video and Signal-Based Surveillance (AVSS), 2012 IEEE Ninth International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4673-2499-1
Type :
conf
DOI :
10.1109/AVSS.2012.47
Filename :
6328054