Title :
PH-SSBM: Phrase Semantic Similarity Based Model for Document Clustering
Author :
Gad, Walaa K. ; Kamel, Mohamed S.
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Waterloo, Waterloo, ON, Canada
fDate :
Nov. 30 2009-Dec. 1 2009
Abstract :
In this paper, a novel document representation model, the phrase semantic similarity based model (PH-SSBM), is proposed. The model combines phrase analysis with word analysis and uses WordNet as background knowledge to explore better document representations for clustering. PH-SSBM assigns semantic weights to both the words and the phrases of a document. These weights reflect the semantic relatedness between document terms and capture the semantic information in the documents. PH-SSBM computes the similarity between documents from their matching terms (phrases and words) and the semantic weights of those terms. Experimental results show that PH-SSBM, in conjunction with WordNet, yields a promising performance improvement for text clustering.
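The abstract gives no formulas, so the following Python sketch is only one possible reading of the idea and should not be taken as the authors' method. It assumes that a term's semantic weight is its frequency plus a relatedness-weighted sum of the frequencies of the other terms in the document, and that document similarity is the cosine over matching terms. The names `TOY_RELATEDNESS`, `relatedness`, `extract_terms`, `semantic_weights`, and `document_similarity` are hypothetical, and the small relatedness table stands in for a WordNet-based measure.

```python
import math
from collections import Counter

# Assumption: hand-written relatedness scores standing in for a
# WordNet-based measure. The values are illustrative only.
TOY_RELATEDNESS = {
    frozenset(["car", "automobile"]): 0.9,
    frozenset(["car", "engine"]): 0.5,
    frozenset(["automobile", "engine"]): 0.5,
}


def relatedness(a, b):
    """Return a score in [0, 1]; unknown pairs count as unrelated."""
    if a == b:
        return 1.0
    return TOY_RELATEDNESS.get(frozenset([a, b]), 0.0)


def extract_terms(text, max_phrase_len=2):
    """Return the words of the text plus its word n-grams (phrases)."""
    tokens = text.lower().split()
    terms = list(tokens)
    for n in range(2, max_phrase_len + 1):
        for i in range(len(tokens) - n + 1):
            terms.append(" ".join(tokens[i:i + n]))
    return terms


def semantic_weights(terms):
    """Assumed weighting: frequency plus relatedness-weighted
    frequencies of the other terms in the same document."""
    freq = Counter(terms)
    weights = {}
    for term, count in freq.items():
        boost = sum(relatedness(term, other) * freq[other]
                    for other in freq if other != term)
        weights[term] = count + boost
    return weights


def document_similarity(text_a, text_b):
    """Cosine similarity over matching terms and their semantic weights."""
    w_a = semantic_weights(extract_terms(text_a))
    w_b = semantic_weights(extract_terms(text_b))
    dot = sum(w_a[t] * w_b[t] for t in set(w_a) & set(w_b))
    norm_a = math.sqrt(sum(v * v for v in w_a.values()))
    norm_b = math.sqrt(sum(v * v for v in w_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Under these assumptions, calling `document_similarity("car engine", "automobile engine")` scores the two texts on their shared term `engine`, whose weight is raised in each text by a related neighbouring word. A real implementation would replace the toy table with a WordNet similarity measure and would likely extract phrases more selectively than plain n-grams.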
Keywords :
text analysis; word processing; PH-SSBM; WordNet; document clustering; document representation; matching terms; phrases semantic similarity based model; text clustering; Clustering algorithms; Entropy; Fellows; Frequency; Knowledge acquisition; Ontologies; Performance evaluation; Speech; Testing; Text mining; Clustering; Phrases-based analysis; semantic similarity;
Conference_Titel :
Knowledge Acquisition and Modeling, 2009. KAM '09. Second International Symposium on
Conference_Location :
Wuhan
Print_ISBN :
978-0-7695-3888-4
DOI :
10.1109/KAM.2009.191