Regional Information Center for Science and Technology - Multimodal and Multi-task Audio-Visual Vehicle Detection and Classification

DocumentCode :

1702410

Title :

Multimodal and Multi-task Audio-Visual Vehicle Detection and Classification

Author :

Wang, Tao ; Zhu, Zhigang

Year :

2012

Firstpage :

440

Lastpage :

446

Abstract :

Moving vehicle detection and classification using multimodal data is challenging: it involves data collection, audio-visual alignment, feature selection, and effective vehicle classification in uncontrolled environments. In this work, we first present a systematic way to align the multimodal data based on multimodal temporal panorama generation. Then various types of features are extracted to represent diverse, multimodal information. These include global geometric features (aspect ratios, profiles), local structure features (HOGs), and various audio features in both spectral and perceptual representations. A flexible sequential forward selection algorithm with multi-branch searching is used to select a set of important features at different levels of feature combination. Finally, using the same datasets for two different classification tasks, we show that the roles of audio and visual features are task-specific. Furthermore, in both cases, combining some of the features that carry multimodal and complementary information improves accuracy over using the individual features alone. Therefore, finer and more accurate classification can be achieved through two different levels of integration: the feature level and the decision level.
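
The abstract names sequential forward selection with multi-branch searching but does not spell out the procedure. The sketch below is one possible reading, not the authors' implementation: it assumes that "multi-branch" means keeping the best few subsets at each size (a beam-style search) instead of only the single best. The function `multibranch_sfs`, the `branches` parameter, and the scores in `toy_score` are all hypothetical and chosen only for illustration; they are not results from the paper.

```python
from typing import Callable, FrozenSet, Sequence, Tuple


def multibranch_sfs(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    max_size: int,
    branches: int = 2,
) -> Tuple[FrozenSet[str], float]:
    """Grow feature subsets one feature at a time, keeping the top
    `branches` subsets at each size (branches=1 is plain greedy SFS)."""
    frontier = [frozenset()]
    best_subset, best_score = frozenset(), float("-inf")
    for _ in range(max_size):
        # Extend every kept subset by every feature it does not yet contain.
        candidates = {}
        for subset in frontier:
            for feature in features:
                if feature not in subset:
                    grown = subset | {feature}
                    if grown not in candidates:
                        candidates[grown] = score(grown)
        if not candidates:
            break
        ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
        frontier = [subset for subset, _ in ranked[:branches]]
        if ranked[0][1] > best_score:
            best_subset, best_score = ranked[0]
    return best_subset, best_score


# Hypothetical scores (think "accuracy in percent"); NOT taken from the paper.
FEATURES = ["aspect_ratio", "hog", "mfcc", "spectral"]
GAIN = {"aspect_ratio": 50, "hog": 45, "mfcc": 40, "spectral": 10}


def toy_score(subset: FrozenSet[str]) -> int:
    # Assumed complementarity bonus when a visual and an audio feature combine.
    bonus = 30 if {"hog", "mfcc"} <= subset else 0
    return sum(GAIN[f] for f in subset) + bonus


greedy = multibranch_sfs(FEATURES, toy_score, max_size=2, branches=1)
beam = multibranch_sfs(FEATURES, toy_score, max_size=2, branches=2)
print(sorted(greedy[0]), greedy[1])
print(sorted(beam[0]), beam[1])
```

With these made-up scores, the single-branch search commits to the strongest individual feature and misses the complementary audio-visual pair, while the two-branch search finds it. In practice `score` would be a cross-validated classifier accuracy on the candidate feature subset.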

Keywords :

audio signal processing; feature extraction; image classification; object detection; spectral analysis; traffic engineering computing; HOG; aspect ratios; audio features; audio-visual alignment; data collection; decision level; feature extraction; feature level; feature selection; flexible sequential forward selection algorithm; global geometric features; local structure features; moving vehicle classification; multibranch searching; multimodal audio-visual vehicle detection; multimodal temporal panorama generation; multitask audio-visual vehicle detection; perceptual representations; profiles; spectral representations; Accuracy; Feature extraction; Image reconstruction; Mel frequency cepstral coefficient; Vehicle detection; Vehicles; Visualization;

Language :

English

Publisher :

IEEE

Conference_Title :

Advanced Video and Signal-Based Surveillance (AVSS), 2012 IEEE Ninth International Conference on

Conference_Location :

Beijing

Print_ISBN :

978-1-4673-2499-1

Type :

conf

DOI :

10.1109/AVSS.2012.47

Filename :

6328054

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1702410