Regional Information Center for Science and Technology (RICeST)

DocumentCode :

2849852

Title :

Non-redundant data clustering

Author :

Gondek, David ; Hofmann, Thomas

Author_Institution :

Dept. of Comput. Sci., Brown Univ., Providence, RI, USA

fYear :

2004

fDate :

1-4 Nov. 2004

Firstpage :

Lastpage :

Abstract :

Data clustering is a popular approach for automatically finding classes, concepts, or groups of patterns. In practice, this discovery process should avoid redundancies with existing knowledge about class structures or groupings, and reveal novel, previously unknown aspects of the data. In order to deal with this problem, we present an extension of the information bottleneck framework, called coordinated conditional information bottleneck, which takes negative relevance information into account by maximizing a conditional mutual information score subject to constraints. Algorithmically, one can apply an alternating optimization scheme that can be used in conjunction with different types of numeric and non-numeric attributes. We present experimental results for applications in text mining and computer vision.
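The conditional mutual information score named in the abstract can be illustrated with a minimal sketch. It assumes a cluster assignment C, a feature variable Y, and an already-known grouping Z. These variable names, the function name, and the toy distributions are illustrative assumptions and are not taken from this record. The sketch only evaluates I(C; Y | Z) for a given joint distribution; it does not implement the authors' alternating optimization or their constraints.

```python
import math
from collections import defaultdict


def conditional_mutual_information(joint):
    """Return I(C; Y | Z) in bits.

    `joint` maps (c, y, z) tuples to probabilities that sum to 1.
    The names c, y, z are hypothetical: cluster, feature, known grouping.
    """
    p_z = defaultdict(float)
    p_cz = defaultdict(float)
    p_yz = defaultdict(float)
    # Accumulate the marginals needed by the definition.
    for (c, y, z), p in joint.items():
        p_z[z] += p
        p_cz[(c, z)] += p
        p_yz[(y, z)] += p
    total = 0.0
    for (c, y, z), p in joint.items():
        if p > 0.0:
            # p(c,y,z) * log[ p(c,y,z) p(z) / (p(c,z) p(y,z)) ]
            ratio = p * p_z[z] / (p_cz[(c, z)] * p_yz[(y, z)])
            total += p * math.log2(ratio)
    return total


# Toy case 1: the clustering tracks the features and is independent of Z.
novel = {(0, 0, 0): 0.25, (0, 0, 1): 0.25, (1, 1, 0): 0.25, (1, 1, 1): 0.25}

# Toy case 2: the clustering merely reproduces the known grouping Z.
redundant = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}

print(conditional_mutual_information(novel))
print(conditional_mutual_information(redundant))
```

In these toy cases the first clustering scores 1 bit and the second scores 0 bits, which matches the intuition in the abstract: a clustering that only repeats known structure adds no information once that structure is conditioned on.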

Keywords :

computer vision; data mining; pattern clustering; text analysis; class groupings; class structures; conditional mutual information; coordinated conditional information bottleneck; knowledge discovery; nonredundant data clustering; optimization scheme; text mining; Application software; Cities and towns; Computer science; Demography; Face detection; Geography; Mutual information

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Data Mining, 2004. ICDM '04. Fourth IEEE International Conference on

Print_ISBN :

0-7695-2142-8

Type :

conf

DOI :

10.1109/ICDM.2004.10104

Filename :

1410269

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2849852