مرکز منطقه ای اطلاع رساني علوم و فناوري - Fitting tree metrics: Hierarchical clustering and phylogeny

DocumentCode :

2385878

Title :

Fitting tree metrics: Hierarchical clustering and phylogeny

Author :

Ailon, Nir ; Charikar, Moses

Author_Institution :

Princeton Univ., NJ, USA

fYear :

2005

fDate :

23-25 Oct. 2005

Firstpage :

Lastpage :

Abstract :

Given dissimilarity data on pairs of objects in a set, we study the problem of fitting a tree metric to this data so as to minimize additive error (i.e. some measure of the difference between the tree metric and the given data). This problem arises in constructing an M-level hierarchical clustering of objects (or an ultrametric on objects) so as to match the given dissimilarity data - a basic problem in statistics. Viewed in this way, the problem is a generalization of the correlation clustering problem (which corresponds to M = 1). We give a very simple randomized combinatorial algorithm for the M-level hierarchical clustering problem that achieves an approximation ratio of M+2. This is a generalization of a previous factor 3 algorithm for correlation clustering on complete graphs. The problem of fitting tree metrics also arises in phylogeny where the objective is to learn the evolution tree by fitting a tree to dissimilarity data on taxa. The quality of the fit is measured by taking the l_p norm of the difference between the tree metric constructed and the given data. Previous results obtained a factor 3 approximation for finding the closest tree tree metric under the l∞ norm. No nontrivial approximation for general l_p norms was known before. We present a novel LP formulation for this problem and obtain an O((log n log log n)¹p/) approximation using this. Enroute, we obtain an O((log n log log n)¹p/) approximation for the closest ultrametric under the l_p norm. Our techniques are based on representing and viewing an ultrametric as a hierarchy of clusterings, and may be useful in other contexts.

Keywords :

approximation theory; computational complexity; randomised algorithms; trees (mathematics); complete graph; correlation clustering problem; evolution tree; fitting tree metric; hierarchical clustering; nontrivial approximation; randomized combinatorial algorithm; taxa dissimilarity data; tree fitting; ultrametric clustering hierarchy; ultrametric object; Additives; Approximation algorithms; Clustering algorithms; Engineering profession; Joining processes; Phylogeny; Statistics; Taxonomy; Tree graphs; US Department of Energy;

fLanguage :

English

Publisher :

ieee

Conference_Titel :

Foundations of Computer Science, 2005. FOCS 2005. 46th Annual IEEE Symposium on

Print_ISBN :

0-7695-2468-0

Type :

conf

DOI :

10.1109/SFCS.2005.36

Filename :

1530703

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2385878