Automatic text categorization using a system of high-precision and high-recall models

Author

Dai Li ; Murphey, Yi L.

Author_Institution

Dept. of Electr. & Comput. Eng., Univ. of Michigan, Dearborn, MI, USA

fYear

2014

fDate

9-12 Dec. 2014

Firstpage

373

Lastpage

380

Abstract

This paper presents HPHR, an automatic text document categorization system. HPHR contains high-precision, high-recall, and noise-filtered text categorization models. The models are generated through a suite of machine learning algorithms, a fast clustering algorithm that efficiently and effectively groups documents into subcategories, and a text category generation algorithm that automatically generates text subcategories representing high-precision, high-recall, and noise-filtered text categorization models from a given set of training documents. The HPHR system was evaluated on documents drawn from two different applications: vehicle fault diagnostic documents, which take the form of unstructured, verbatim text descriptions, and the Reuters corpus. On both document collections, HPHR outperformed the systems commonly used in text document categorization.

Keywords

data mining; learning (artificial intelligence); pattern clustering; text analysis; HPHR; Reuters corpus; automatic text document categorization system; clustering algorithm; high-precision and high-recall models; machine learning algorithms; text mining; vehicle fault diagnostic documents; Algorithm design and analysis; Clustering algorithms; Machine learning algorithms; Text categorization; Training; Training data; Vectors

fLanguage

English

Publisher

IEEE

Conference_Titel

Computational Intelligence and Data Mining (CIDM), 2014 IEEE Symposium on

Conference_Location

Orlando, FL

Type

conf

DOI

10.1109/CIDM.2014.7008692

Filename

7008692