The TaxGen framework: automating the generation of a taxonomy for a large document collection

Author

Muller, A.; Dorre, J.; Gerstl, P.; Seiffert, R.

Author_Institution

Dept. of Software Solutions Dev., IBM Germany, Germany

Volume

Track2

fYear

1999

fDate

5-8 Jan. 1999

Abstract

Text mining is an active area of research and development that combines and extends techniques from related areas such as information retrieval, computational linguistics, and data mining to analyze large corpora of digital documents. This paper describes the TaxGen text mining project carried out at the IBM Software Development Lab in Boeblingen, Germany. The goal of TaxGen was to automatically generate a taxonomy for a collection of previously unstructured documents, namely a set of 73,000 news wire documents spanning one year.

Keywords

classification; computational linguistics; data mining; information retrieval; text analysis; very large databases; IBM Software Development Lab., Boeblingen, Germany; TaxGen text mining project; automatic taxonomy generation; digital documents; large document collection; news wire documents; text corpus analysis; unstructured documents; Information analysis; Performance analysis; Programming; Research and development; Taxonomy; Text mining; Wire

fLanguage

English

Publisher

IEEE

Conference_Titel

Systems Sciences, 1999. HICSS-32. Proceedings of the 32nd Annual Hawaii International Conference on

Conference_Location

Maui, HI, USA

Print_ISBN

0-7695-0001-3

Type

conf

DOI

10.1109/HICSS.1999.772687

Filename

772687