DocumentCode :
108193
Title :
Extended Subtree: A New Similarity Function for Tree Structured Data
Author :
Shahbazi, Ali ; Miller, Jason
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Alberta, Edmonton, AB, Canada
Volume :
26
Issue :
4
fYear :
2014
fDate :
Apr-14
Firstpage :
864
Lastpage :
877
Abstract :
Although several distance or similarity functions for trees have been introduced, their performance is not always satisfactory in many applications, ranging from document clustering to natural language processing. This research proposes a new similarity function for trees, namely Extended Subtree (EST), where a new subtree mapping is proposed. EST generalizes the edit base distances by providing new rules for subtree mapping. Further, the new approach seeks to resolve the problems and limitations of previous approaches. Extensive evaluation frameworks are developed to evaluate the performance of the new approach against previous proposals. Clustering and classification case studies utilizing three real-world and one synthetic labeled data sets are performed to provide an unbiased evaluation where different distance functions are investigated. The experimental results demonstrate the superior performance of the proposed distance function. In addition, an empirical runtime analysis demonstrates that the new approach is one of the best tree distance functions in terms of runtime efficiency.
Keywords :
data structures; document handling; natural language processing; pattern clustering; EST; distance functions; document clustering; extended subtree; natural language processing; similarity function; subtree mapping; synthetic labeled data sets; tree structured data; Complexity theory; Distance measurement; Entropy; Indexes; Runtime; Support vector machines; Tree classification; tree clustering; tree comparison; tree distance function; tree similarity; tree structured data;
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
ieee
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2013.53
Filename :
6487504
Link To Document :
بازگشت