Text clustering based on good aggregations

Author

Hotho, Andreas ; Maedche, Alexander ; Staab, Steffen

Author_Institution

Inst. für Angewandte Inf. und Formale Beschreibungsverfahren, Karlsruhe Univ., Germany

fYear

2001

fDate

2001

Firstpage

607

Lastpage

608

Abstract

Text clustering typically involves clustering in a high-dimensional space, which appears difficult with regard to virtually all practical settings. In addition, given a particular clustering result, it is typically very hard to come up with a good explanation of why the text clusters have been constructed the way they are. We propose a new approach for applying background knowledge (in terms of an ontology) during preprocessing in order to improve clustering results and allow for selection between results. The results may be distinguished and explained by the corresponding selection of concepts in the ontology. Our results compare favourably with a sophisticated baseline preprocessing strategy.

Keywords

data mining; data warehouses; pattern clustering; text analysis; background knowledge; good aggregations; high dimensional space; ontology; preprocessing; text clustering; Clustering algorithms; Clustering methods; Heuristic algorithms; Humans; Knowledge management; Measurement standards; Navigation; Ontologies; Web pages

fLanguage

English

Publisher

IEEE

Conference_Titel

Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on

Conference_Location

San Jose, CA

Print_ISBN

0-7695-1119-8

Type

conf

DOI

10.1109/ICDM.2001.989577

Filename

989577