DocumentCode :
3280839
Title :
CBDIR: Fast and effective content based document Information Retrieval system
Author :
Moon Soo Cha ; So Yeon Kim ; Jae Hee Ha ; Min-June Lee ; Young-June Choi ; Kyung-Ah Sohn
Author_Institution :
Dept. of Inf. & Comput. Eng., Ajou Univ., Suwon, South Korea
fYear :
2015
fDate :
June 28 2015-July 1 2015
Firstpage :
203
Lastpage :
208
Abstract :
The continuing growth of information overflow has made it hard to find valuable information on the web, and the need for effective Information Retrieval (IR) techniques has increased accordingly. Although the documents themselves contain far richer information, conventional web services let users retrieve information only from the title and description. To meet the demand for fast and accurate retrieval of valuable information, we propose a fast and effective content-based document information retrieval system that retrieves information from the actual content of a document. The proposed method is based on the Latent Dirichlet Allocation topic model, which is used to extract the major keywords of a given document. The main contributions of our system are increased flexibility, effectiveness, and retrieval speed. Our system can easily communicate with existing web services through the standard JSON format. In addition, we increase the speed of information retrieval by using a NoSQL-based database system with inverted indexing and B-tree-based indexing. We validate the performance of our system on real data collected from the SlideShare service. The proposed system shows better retrieval performance than the existing IR system.
Keywords :
content-based retrieval; document handling; indexing; information retrieval systems; relational databases; B-tree based indexing; CBDIR system; IR technique; JSON format; NoSQL based database system; SlideShare service; Web services; content based document information retrieval system; information overflow; inverted indexing; keyword extraction; latent Dirichlet allocation; Indexing; Information retrieval; Internet of things; Printing; Software; Three-dimensional displays; B-tree Indexing; CBIR; Information Retrieval; Inverted Indexing; LDA; NoSQL; Topic modeling;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer and Information Science (ICIS), 2015 IEEE/ACIS 14th International Conference on
Conference_Location :
Las Vegas, NV
Type :
conf
DOI :
10.1109/ICIS.2015.7166594
Filename :
7166594