Abstract:
Moving vehicle detection and classification using multimodal data poses challenges in data collection, audio-visual alignment, feature selection, and effective classification in uncontrolled environments. In this work, we first present a systematic way to align the multimodal data based on multimodal temporal panorama generation. Various types of features are then extracted to represent the diverse multimodal information. These include global geometric features (aspect ratios, profiles), local structure features (HOGs), and various audio features in both spectral and perceptual representations. A flexible sequential forward selection algorithm with multi-branch searching is used to select a set of important features at different levels of feature combination. Finally, using the same datasets for two different classification tasks, we show that the roles of audio and visual features are task-specific. Furthermore, in both tasks, combining some of the features that carry multimodal and complementary information yields higher accuracy than using the individual features alone. Therefore, finer and more accurate classification can be achieved through two levels of integration: the feature level and the decision level.
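As an illustration of the feature selection step, the following is a minimal sketch of sequential forward selection with multi-branch searching. It is not the authors' implementation: the abstract does not specify the branching rule, so the sketch assumes a beam-style search that keeps the `branches` best subsets at each level. The function name `multibranch_sfs`, its parameters, and the toy scores are all hypothetical; in practice `score` would be something like cross-validated classification accuracy.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple


def multibranch_sfs(
    n_features: int,
    score: Callable[[FrozenSet[int]], float],
    max_size: int,
    branches: int = 2,
) -> Tuple[FrozenSet[int], float]:
    """Forward selection that keeps the `branches` best subsets per level.

    Assumption: "multi-branch" is interpreted here as a beam search.
    With branches=1 this reduces to plain greedy forward selection.
    """
    frontier: List[FrozenSet[int]] = [frozenset()]
    best_subset: FrozenSet[int] = frozenset()
    best_score = float("-inf")

    for _ in range(max_size):
        # Grow every subset on the frontier by one unused feature.
        candidates: Dict[FrozenSet[int], float] = {}
        for subset in frontier:
            for f in range(n_features):
                if f in subset:
                    continue
                grown = subset | {f}
                if grown not in candidates:
                    candidates[grown] = score(grown)
        if not candidates:
            break

        # Keep only the top-scoring subsets as the next frontier.
        ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
        frontier = [subset for subset, _ in ranked[:branches]]

        # Track the best subset seen at any level.
        if ranked[0][1] > best_score:
            best_subset, best_score = ranked[0]

    return best_subset, best_score


# Toy scores, invented for illustration only: feature 0 is the best
# single feature, but features 1 and 2 are complementary as a pair.
TOY_SCORES = {
    frozenset({0}): 0.60,
    frozenset({1}): 0.50,
    frozenset({2}): 0.40,
    frozenset({0, 1}): 0.70,
    frozenset({0, 2}): 0.65,
    frozenset({1, 2}): 0.90,
}


def toy_score(subset: FrozenSet[int]) -> float:
    return TOY_SCORES[subset]


greedy = multibranch_sfs(3, toy_score, max_size=2, branches=1)
beam = multibranch_sfs(3, toy_score, max_size=2, branches=2)
```

In this toy setting the single-branch search commits to feature 0 and ends with the subset {0, 1}, while the two-branch search also keeps feature 1 alive and finds the complementary pair {1, 2}. This is the kind of situation in which exploring several branches can help when features from different modalities are individually weak but informative together.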
Keywords:
audio signal processing; feature extraction; image classification; object detection; spectral analysis; traffic engineering computing; HOG; aspect ratios; audio features; audio-visual alignment; data collection; decision level; feature extraction; feature level; feature selection; flexible sequential forward selection algorithm; global geometric features; local structure features; moving vehicle classification; multibranch searching; multimodal audio-visual vehicle detection; multimodal temporal panorama generation; multitask audio-visual vehicle detection; perceptual representations; profiles; spectral representations; Accuracy; Feature extraction; Image reconstruction; Mel frequency cepstral coefficient; Vehicle detection; Vehicles; Visualization;