DocumentCode :
1689677
Title :
TreeRank: a similarity measure for nearest neighbor searching in phylogenetic databases
Author :
Wang, Jason T L ; Shan, Huiyuan ; Shasha, Dennis ; Piel, William H.
Author_Institution :
Coll. of Comput. Sci., New Jersey Inst. of Technol., Newark, NJ, USA
fYear :
2003
Firstpage :
171
Lastpage :
180
Abstract :
Phylogenetic trees are unordered labeled trees in which each leaf node has a label and the order among siblings is unimportant. In this paper we propose a new similarity measure, called TreeRank, for phylogenetic trees and present an algorithm for computing TreeRank scores. Given a query or pattern tree P and a data tree D, the TreeRank score from P to D is a measure of the topological relationships in P that are found to be the same or similar in D. The proposed algorithm calculates the TreeRank score in O(M2 + N) time where M is the number of nodes appearing in both P and D, and N is the number of nodes in D. We then develop a search engine that, given a query or pattern tree P and a database of trees D, finds and ranks the nearest neighbors of P in D where the "nearness" is measured by the proposed similarity function. This structure-based search engine is fully operational and is available on the World Wide Web.
Keywords :
Internet; query formulation; search engines; string matching; tree data structures; TreeRank; World Wide Web; data tree; nearest neighbor searching; pattern tree; phylogenetic database; phylogenetic tree; query processing; search engine; similarity function; similarity measure; Biology; Computer science; Data analysis; Databases; Educational institutions; Information retrieval; Nearest neighbor searches; Phylogeny; Search engines; Web sites;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Scientific and Statistical Database Management, 2003. 15th International Conference on
ISSN :
1099-3371
Print_ISBN :
0-7695-1964-4
Type :
conf
DOI :
10.1109/SSDM.2003.1214978
Filename :
1214978
Link To Document :
بازگشت