Title :
CBDIR: Fast and effective content based document Information Retrieval system
Author :
Moon Soo Cha ; So Yeon Kim ; Jae Hee Ha ; Min-June Lee ; Young-June Choi ; Kyung-Ah Sohn
Author_Institution :
Dept. of Inf. & Comput. Eng., Ajou Univ., Suwon, South Korea
fDate :
June 28 2015-July 1 2015
Abstract :
The continuing growth of information has made it hard to obtain valuable information on the web. As a result, the need for effective Information Retrieval (IR) techniques has increased. Although the body of a document contains far more information than its metadata, conventional web services let users search only the title and description. To meet the demand for fast and accurate retrieval of valuable information, we propose a fast and effective content-based document information retrieval system that retrieves information from the actual content of a document. The proposed method is based on the Latent Dirichlet Allocation (LDA) topic model, which is used to extract the major keywords of a given document. The main contributions of our system are increased flexibility, effectiveness, and retrieval speed. Our system can easily communicate with existing web services through the standard JSON format. In addition, we increase the speed of information retrieval by using a NoSQL-based database system with inverted indexing and B-tree based indexing. We validate the performance of our system on real data collected from the SlideShare service. The proposed system shows better retrieval performance than the existing IR system.
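The keyword-based inverted indexing and JSON exchange mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the document ids, the keyword lists, and the function names build_inverted_index and search are hypothetical, the keywords are assumed to have already been extracted by a topic model such as LDA, and an in-memory dictionary stands in for the NoSQL store.

```python
import json
from collections import defaultdict

# Hypothetical input: document id -> keywords already extracted by a topic model.
doc_keywords = {
    "doc1": ["retrieval", "indexing", "nosql"],
    "doc2": ["topic", "lda", "retrieval"],
    "doc3": ["json", "web", "nosql"],
}

def build_inverted_index(doc_keywords):
    """Map each keyword to the sorted list of documents that contain it."""
    index = defaultdict(set)
    for doc_id, keywords in doc_keywords.items():
        for kw in keywords:
            index[kw.lower()].add(doc_id)
    return {kw: sorted(ids) for kw, ids in index.items()}

def search(index, query):
    """Return a JSON string listing documents that match every query term."""
    terms = query.lower().split()
    if not terms:
        return json.dumps({"query": query, "results": []})
    # Look up the posting set for each term; unknown terms give an empty set.
    postings = [set(index.get(t, [])) for t in terms]
    hits = sorted(set.intersection(*postings))
    return json.dumps({"query": query, "results": hits})

index = build_inverted_index(doc_keywords)
print(search(index, "retrieval nosql"))
```

Because each query term is resolved by a direct lookup in the index, documents that do not contain the term are never scanned; the JSON string is the kind of payload an existing web service could consume.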
Keywords :
content-based retrieval; document handling; indexing; information retrieval systems; relational databases; B-tree based indexing; CBDIR system; IR technique; JSON format; NoSQL based database system; SlideShare service; Web services; content based document information retrieval system; information overflow; inverted indexing; keyword extraction; latent Dirichlet allocation; Indexing; Information retrieval; Internet of things; Printing; Software; Three-dimensional displays; B-tree Indexing; CBIR; Information Retrieval; Inverted Indexing; LDA; NoSQL; Topic modeling;
Conference_Titel :
Computer and Information Science (ICIS), 2015 IEEE/ACIS 14th International Conference on
Conference_Location :
Las Vegas, NV
DOI :
10.1109/ICIS.2015.7166594