Title :
A Calculation Mechanism for Similarity Measure with Clustering an Unbalanced Hierarchical Terminology Structure
Author :
Wang, MinTzu ; Hsu, PingYu ; Lin, K.C. ; Hung, Jason
Author_Institution :
NCU, Beijing
Abstract :
Effective retrieval of relevant information is often quite useful to users, for example when querying the respective knowledge or information, especially for on-line e-learners. The most common method is to use synonyms and antonyms from a dictionary with the most frequent terms. However, sometimes the focus is on a pair or a set of associated keywords supplied by the user, rather than on terms with the same meaning. Generally, association rules would be adopted to solve this problem. Nonetheless, the keyword or term sets extracted from large volumes of queries often contain sparse information: they span a wide range of keywords, yet each term set contains only a few terms. Such data leave basket analysis with extremely low item support; lifting a term to a higher level of the concept hierarchy may yield enough support, but loses the detailed information. Although a similarity measure that counts the depth of the least common ancestor, normalized by the depth of the concept tree, lifts the limitation of binary equality, it produces counterintuitive results when the concept hierarchy is unbalanced, since two terms in deeper subtrees are very likely to have a higher similarity than two terms in shallower subtrees. This research proposes to solve these issues by calculating the distance between two terms as the number of edge traversals needed to link them, from the user's viewpoint. The method is straightforward yet achieves better outcomes for information queries when the concept hierarchy is unbalanced.
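The contrast drawn in the abstract can be illustrated with a minimal sketch. The toy hierarchy, the term names, and the function names below are all hypothetical. The edge count shown is a plain path length through the least common ancestor, which is only an assumption about the general idea, not the paper's user-viewpoint method.

```python
# Hypothetical toy concept hierarchy, stored as child -> parent.
# The right-hand branch (under "b") is deliberately deeper than the left.
PARENT = {
    "a": "root", "b": "root",
    "a1": "a", "a2": "a",
    "b1": "b", "b2": "b1",
    "x": "b2", "y": "b2",
}

def path_to_root(term):
    """Return the list [term, parent, ..., root]."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def depth(term):
    """Number of edges between the term and the root."""
    return len(path_to_root(term)) - 1

def lca(t1, t2):
    """Least (deepest) common ancestor of two terms."""
    ancestors = set(path_to_root(t1))
    for node in path_to_root(t2):
        if node in ancestors:
            return node
    raise ValueError("terms are not in the same hierarchy")

# Depth of the whole concept tree (4 in this toy example).
TREE_DEPTH = max(depth(t) for t in PARENT)

def lca_similarity(t1, t2):
    """Depth of the least common ancestor, normalized by tree depth."""
    return depth(lca(t1, t2)) / TREE_DEPTH

def edge_distance(t1, t2):
    """Edges traversed to walk from t1 up to the LCA and down to t2."""
    common = lca(t1, t2)
    return (depth(t1) - depth(common)) + (depth(t2) - depth(common))

# Both pairs are siblings, yet the depth-based measure rates them differently.
print(lca_similarity("a1", "a2"), lca_similarity("x", "y"))
print(edge_distance("a1", "a2"), edge_distance("x", "y"))
```

In this sketch the sibling pair in the shallow subtree scores 0.25 and the sibling pair in the deep subtree scores 0.75 under the depth-based measure, while both pairs are two edges apart under the edge count.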
Keywords :
data mining; query processing; data render basket analysis; information query; information retrieval; keywords extraction; similarity measure; sparse information; subtrees; terms sets extraction; unbalanced hierarchical terminology structure; Association rules; Clustering algorithms; Data analysis; Data mining; Dictionaries; Information analysis; Information filtering; Information filters; Information retrieval; Terminology; Clustering; Data Mining; Hierarchy; Similarity Measure;
Conference_Titel :
Parallel Processing Workshops, 2007. ICPPW 2007. International Conference on
Conference_Location :
Xian
Print_ISBN :
0-7695-2934-8
Electronic_ISBN :
1530-2016
DOI :
10.1109/ICPPW.2007.6