Title :
The TaxGen framework: automating the generation of a taxonomy for a large document collection
Author :
Muller, A.; Dorre, J.; Gerstl, P.; Seiffert, R.
Author_Institution :
Dept. of Software Solutions Dev., IBM Germany, Germany
Abstract :
Text mining is an active area of research and development that combines and extends techniques from related areas such as information retrieval, computational linguistics and data mining to analyze large corpora of digital documents. This paper describes the TaxGen text mining project carried out at the IBM Software Development Lab in Boeblingen, Germany. The goal of TaxGen was to automatically generate a taxonomy for a collection of previously unstructured documents, namely a set of 73,000 news wire documents spanning one year.
Keywords :
classification; computational linguistics; data mining; information retrieval; text analysis; very large databases; IBM Software Development Lab., Boeblingen, Germany; TaxGen text mining project; automatic taxonomy generation; digital documents; large document collection; news wire documents; text corpus analysis; unstructured documents; Information analysis; Performance analysis; Programming; Research and development; Taxonomy; Text mining; Wire
Conference_Title :
Proceedings of the 32nd Annual Hawaii International Conference on Systems Sciences, 1999 (HICSS-32)
Conference_Location :
Maui, HI, USA
Print_ISBN :
0-7695-0001-3
DOI :
10.1109/HICSS.1999.772687