Title :
Marlin: Taming the big streaming data in large scale video similarity search
Author :
Nan Zhu;Wenbo He;Yu Hua;Yixin Chen
Author_Institution :
McGill University, Montreal, Quebec, Canada
Abstract :
The extreme volume and rapidly increasing rate of video data inevitably put unprecedented pressure on any large-scale video sharing and hosting system. Among the efforts to mitigate this pressure, content-based video similarity search is becoming more and more important as data size grows exponentially. Although various approaches have been proposed to address this problem, they focus mainly on retrieval accuracy and therefore rely on video features of high complexity. Because of this complexity, these systems assume that the features representing videos have been extracted offline and stored statically in a database. Previous work has ignored the effort required to move feature extraction and similarity search from offline to online. In this paper, we propose Marlin, a streaming data processing pipeline that efficiently extracts video features and retrieves video similarity information in a large-scale video data system. We design a streaming feature extractor to handle the videos streaming into the system, and we establish fine-grained resource allocation through a resource-aware data abstraction layer over the streaming data, which allocates computing resources among videos with varying resource demands. In addition, we pipeline the feature extraction and similarity search processes with a distributed feature index that supports real-time queries and incremental index updates. Experimental results and extensive simulations driven by real-world workloads show that the proposed stream processing architecture achieves a 25X speedup over the sequential feature extraction algorithm and a 23X speedup over the sequential similarity search, with subsecond similarity query latency for a single request.
Keywords :
"Feature extraction","Streaming media","Indexes","Complexity theory","Distributed databases","Resource management","Servers"
Conference_Title :
Big Data (Big Data), 2015 IEEE International Conference on
DOI :
10.1109/BigData.2015.7363947