DocumentCode
1689677
Title
TreeRank: a similarity measure for nearest neighbor searching in phylogenetic databases
Author
Wang, Jason T L ; Shan, Huiyuan ; Shasha, Dennis ; Piel, William H.
Author_Institution
Coll. of Comput. Sci., New Jersey Inst. of Technol., Newark, NJ, USA
fYear
2003
Firstpage
171
Lastpage
180
Abstract
Phylogenetic trees are unordered labeled trees in which each leaf node has a label and the order among siblings is unimportant. In this paper we propose a new similarity measure, called TreeRank, for phylogenetic trees and present an algorithm for computing TreeRank scores. Given a query or pattern tree P and a data tree D, the TreeRank score from P to D is a measure of the topological relationships in P that are found to be the same or similar in D. The proposed algorithm calculates the TreeRank score in O(M2 + N) time where M is the number of nodes appearing in both P and D, and N is the number of nodes in D. We then develop a search engine that, given a query or pattern tree P and a database of trees D, finds and ranks the nearest neighbors of P in D where the "nearness" is measured by the proposed similarity function. This structure-based search engine is fully operational and is available on the World Wide Web.
Keywords
Internet; query formulation; search engines; string matching; tree data structures; TreeRank; World Wide Web; data tree; nearest neighbor searching; pattern tree; phylogenetic database; phylogenetic tree; query processing; search engine; similarity function; similarity measure; Biology; Computer science; Data analysis; Databases; Educational institutions; Information retrieval; Nearest neighbor searches; Phylogeny; Search engines; Web sites;
fLanguage
English
Publisher
ieee
Conference_Titel
Scientific and Statistical Database Management, 2003. 15th International Conference on
ISSN
1099-3371
Print_ISBN
0-7695-1964-4
Type
conf
DOI
10.1109/SSDM.2003.1214978
Filename
1214978
Link To Document