Title :
Large-scale documents reduction based on domain ontology and E2LSH
Author :
Hongmei Li ; Wenning Hao ; Gang Chen ; Xianglin Liao
Author_Institution :
Inst. of Command Inf. Syst., PLA Univ. of Sci.&Technol., Nanjing, China
Abstract :
Large-scale documents reduction plays a critical role in document management and organization, document mining, and related tasks. Research on it concentrates on two aspects: the construction of a document representation model, and index optimization of the feature space for similarity search. However, the semantic gap and the curse of dimensionality remain two open and difficult issues. Motivated by this, we propose a novel method based on domain ontology and E2LSH (Exact Euclidean Locality-Sensitive Hashing). First, we build an improved model based on domain ontology, called the Semantic Vector Space Model (SVSM), which reveals the latent semantic relationships among document feature terms in addition to syntactic information. SVSM narrows the semantic gap of the traditional VSM and reduces the feature dimension. Then, because the search space for computing similarity over document pairs is large and complex, we introduce E2LSH to index the feature space, reducing the search space and overcoming the curse of dimensionality. Experiments on real-world documents indicate the soundness and effectiveness of our method.
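The abstract names E2LSH as the index over the feature space but gives no parameters or procedure. The following is a minimal sketch, assuming each document is already mapped to a dense numeric vector (for example, SVSM term weights) and that "reduction" means keeping one representative per group of documents within a Euclidean radius. It uses the Gaussian (2-stable) hash family h(v) = floor((a·v + b)/w) on which E2LSH is built; the names `E2LSHIndex` and `reduce_documents` and the parameters `k`, `num_tables`, `w`, and `radius` are hypothetical illustrations, not the authors' implementation.

```python
import math
import random
from collections import defaultdict


class E2LSHIndex:
    """Hypothetical minimal p-stable (Gaussian) LSH index for Euclidean distance.

    Each of the num_tables tables keys a vector by k concatenated hashes
    h(v) = floor((a . v + b) / w), with a ~ N(0, I) and b ~ U[0, w).
    """

    def __init__(self, dim, k=4, num_tables=8, w=4.0, seed=0):
        rng = random.Random(seed)
        self.w = w
        # For each table, k (a, b) pairs that together define its compound hash.
        self.funcs = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
             for _ in range(k)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]

    def _key(self, funcs, v):
        # Concatenate the k quantized random projections into one bucket key.
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
            for a, b in funcs
        )

    def add(self, doc_id, v):
        for funcs, table in zip(self.funcs, self.tables):
            table[self._key(funcs, v)].append(doc_id)

    def candidates(self, v):
        # Union of the buckets v falls into across all tables.
        found = set()
        for funcs, table in zip(self.funcs, self.tables):
            found.update(table.get(self._key(funcs, v), ()))
        return found


def reduce_documents(vectors, radius, **lsh_params):
    """Assumed reduction rule: greedily keep one document per radius-group.

    A document is dropped if an already-kept document that shares a bucket
    with it lies within Euclidean distance `radius`; otherwise it is kept
    and indexed. Only kept documents are ever added to the index.
    """
    if not vectors:
        return []
    dim = len(next(iter(vectors.values())))
    index = E2LSHIndex(dim, **lsh_params)
    kept = []
    for doc_id, v in vectors.items():
        near = any(
            math.dist(v, vectors[other]) <= radius
            for other in index.candidates(v)
        )
        if not near:
            kept.append(doc_id)
            index.add(doc_id, v)
    return kept
```

For example, `reduce_documents({"a": [1.0, 0.0, 0.0], "b": [1.0, 0.0, 0.0], "c": [0.0, 0.0, 10.0]}, radius=0.5)` keeps `a` and `c` and drops the exact duplicate `b`. The hash only narrows the search to bucket-mates; the exact distance check removes false positives, so the choice of `k`, `num_tables`, and `w` affects only how many true near-duplicates are missed and how much work each lookup costs.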
Keywords :
data structures; document handling; ontologies (artificial intelligence); E2LSH; SVSM; curse of dimensionality; document management organization; document mining; document representation model; domain ontology; exact Euclidean locality-sensitive hashing; feature dimension reduction; index optimization; large-scale documents reduction; latent semantic relationships; semantic gap; semantic vector space model; similarity search; syntax information; Indexes; VSM; documents reduction; semantic relationship; similarity computation
Conference_Title :
Networking, Sensing and Control (ICNSC), 2014 IEEE 11th International Conference on
Conference_Location :
Miami, FL
DOI :
10.1109/ICNSC.2014.6819594