• DocumentCode
    1796727
  • Title

    Automatic text categorization using a system of high-precision and high-recall models

  • Author

    Dai Li ; Murphey, Yi L.

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Michigan, Dearborn, MI, USA
  • fYear
    2014
  • fDate
    9-12 Dec. 2014
  • Firstpage
    373
  • Lastpage
    380
  • Abstract
    This paper presents HPHR, an automatic text document categorization system. HPHR contains high-precision, high-recall, and noise-filtered text categorization models. The text categorization models are generated through a suite of machine learning algorithms, a fast clustering algorithm that efficiently and effectively groups documents into subcategories, and a text category generation algorithm that automatically generates text subcategories representing high-precision, high-recall, and noise-filtered text categorization models from a given set of training documents. The HPHR system was evaluated on documents drawn from two different applications: vehicle fault diagnostic documents, which take the form of unstructured, verbatim text descriptions, and the Reuters corpus. On both document collections, HPHR outperformed the systems commonly used in text document categorization.
  • Keywords
    data mining; learning (artificial intelligence); pattern clustering; text analysis; HPHR; Reuters corpus; automatic text document categorization system; clustering algorithm; high-precision and high-recall models; machine learning algorithms; text mining; vehicle fault diagnostic documents; Algorithm design and analysis; Clustering algorithms; Machine learning algorithms; Text categorization; Training; Training data; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence and Data Mining (CIDM), 2014 IEEE Symposium on
  • Conference_Location
    Orlando, FL
  • Type

    conf

  • DOI
    10.1109/CIDM.2014.7008692
  • Filename
    7008692