DocumentCode :
3095713
Title :
A document comparison approach using hybrid keyword and structured full text vocabulary searches
Author :
Boonsuk, Kudachamai ; Sophatsathit, Peraphon
Author_Institution :
Technopreneurship & Innovation Manage. Program, Chulalongkorn Univ., Bangkok, Thailand
Volume :
1
fYear :
2011
fDate :
11-13 March 2011
Firstpage :
252
Lastpage :
257
Abstract :
This paper proposes a systematic full text search over documents that combines keyword similarity with the structural similarity of the documents under consideration. The approach operates in two steps. The first step uses a set of designated keywords to acquire candidate documents by means of an open source tool. The second step builds a suffix tree of frequently used vocabulary to retrieve the most similar documents from those acquired. In so doing, variations in contextual matching during full text search can be mitigated, and the resulting performance turns out to be quite acceptable. The ultimate goal is to arrive at a practical, platform-independent full text search technique. The benefits of this scheme are twofold. On the one hand, relevant documents can be retrieved that are as close to the desired document as possible. On the other hand, suspected plagiarism can be identified to some extent, depending on the effectiveness of the proposed approach, which leaves plenty of room for future improvement. The proposed work will eventually be put to real use for database retrieval in a small business enterprise.
Keywords :
query formulation; relevance feedback; text analysis; word processing; contextual matching; document comparison approach; keyword similarity; open source tool; relevant document retrieval; structured full text vocabulary search; suffix tree; systematic full text search; Keyword search; Libraries; Plagiarism; Search engines; Vegetation; Vocabulary; Weight measurement; contextual matching; full text search; plagiarism; structural similarity; suffix tree
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Research and Development (ICCRD), 2011 3rd International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-61284-839-6
Type :
conf
DOI :
10.1109/ICCRD.2011.5764014
Filename :
5764014