Author_Institution :
Sch. of Comput. Sci., Harbin Inst. of Technol., Shenzhen, China
Abstract :
In this paper, a vertical video search engine is designed and implemented based on vertical search engine theory. First, we introduce vertical search engines and the state of research on them in China and abroad, analyze the principles of implementing a vertical search engine, and introduce the key technologies used in this paper, such as the subject information acquisition method, the Chinese segmentation algorithm, and search result re-sorting. We describe the video resource acquisition process, video resource storage, and the exclusion of duplicate video resources. Then, we analyze Lucene, an information retrieval tool library with an advanced design and superior performance. Based on this library, a Chinese segmentation algorithm and a result sorting method are added. Unlike other current studies, a variable-length matching strategy is adopted for Chinese word segmentation, with a bidirectional matching method for disambiguation. Compared with the latest open source word segmentation algorithm, the segmentation algorithm designed in this paper performs better. Using the video resources fetched from the Internet and VKAnalyzer, the Chinese word segmenter that extends Lucene and is designed and implemented in this paper, we design video re-sorting methods based on different criteria, such as length, times and comments, and implement sorting of search results according to users' various requirements. The experiments show that the recall rate of the search engine is 90% and the accuracy is 97%, which are satisfactory.
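The bidirectional matching idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the VKAnalyzer implementation, which the abstract does not detail; it is the textbook bidirectional maximum matching scheme, assuming a small hypothetical dictionary (`VOCAB`) and a simple tie-breaking heuristic (fewer tokens, then fewer single-character tokens). All function names here are illustrative.

```python
# Hypothetical toy dictionary; a real segmenter would load a large lexicon.
VOCAB = {"研究", "研究生", "生命", "起源"}


def forward_max_match(text, vocab, max_len):
    """Scan left to right, taking the longest dictionary word at each step."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, vocab, max_len):
    """Scan right to left, taking the longest dictionary word at each step."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


def bidirectional_match(text, vocab):
    """Run both scans and keep the segmentation the heuristic prefers."""
    max_len = max((len(word) for word in vocab), default=1)
    fwd = forward_max_match(text, vocab, max_len)
    bwd = backward_max_match(text, vocab, max_len)
    if fwd == bwd:
        return fwd

    # Assumed heuristic: prefer fewer tokens, then fewer single characters.
    def cost(tokens):
        return (len(tokens), sum(1 for t in tokens if len(t) == 1))

    # On a tie, fall back to the backward result.
    return fwd if cost(fwd) < cost(bwd) else bwd


if __name__ == "__main__":
    print(bidirectional_match("研究生命起源", VOCAB))
```

With this toy dictionary, the forward scan greedily takes "研究生" and strands the single character "命", while the backward scan yields "研究 / 生命 / 起源"; the heuristic keeps the latter because it leaves no single-character tokens.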
Keywords :
Internet; image matching; natural language processing; search engines; video retrieval; Chinese segmentation algorithm; Chinese word segmentation; Lucene; VKAnalyzer; bidirectional matching method; information retrieval tool library; one vertical video search engine; recall rate; repeat video resources exclusion; search result resorting; subject information acquisition method; variable length matching strategy; video resorting methods; video resource acquisition process; video resource fetching; video resources storage; Educational institutions; Indexing; Sorting
Conference_Title :
Security, Pattern Analysis, and Cybernetics (SPAC), 2014 International Conference on