  • DocumentCode
    562681
  • Title
    Automatic Text categorization and summarization using rule reduction
  • Author
    Devasena, C. Lakshmi; Hemalatha, M.
  • Author_Institution
    Dept. of Comput. Sci., Karpagam Univ., Coimbatore, India
  • fYear
    2012
  • fDate
    30-31 March 2012
  • Firstpage
    594
  • Lastpage
    598
  • Abstract
    Text mining is an emerging field that seeks to extract meaningful information from natural language text. Automatic text categorization is the process of assigning predefined class labels to incoming, unclassified documents; the class labels are defined from a set of pre-classified example documents used as a training corpus. This work presents an automatic text categorization and summarization approach that analyzes the structure of the input text. A text analyzer is developed that derives this structure using a rule reduction technique in three stages: Token Creation, Feature Identification, and Categorization and Summarization. The analyzer was tested with sample input texts and gave noteworthy results, and extensive experimentation validates the choice of parameters and the efficacy of the approach for text classification. The work can be extended to many practical applications, including indexing for document retrieval, organizing and maintaining large catalogues of Web resources, automatic metadata extraction, and word sense disambiguation.
  • Keywords
    Web sites; data mining; feature extraction; indexing; information retrieval; meta data; pattern classification; text analysis; Web resources; automatic metadata extraction; automatic text categorization approach; automatic text summarization approach; catalogue maintaining; document indexing; document retrieval; feature categorization; feature identification; feature summarization; natural language text; preclassified documents; predefined class labels; rule reduction technique; text analyzer; text mining; token creation; training corpus; unclassified documents; word sense disambiguation; Cancer; Educational institutions; Indexing; Organizing; Presses; Text categorization; Feature Identification; Rule Reduction; Text Categorization and Summarization; Text Mining; Token Creation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Advances in Engineering, Science and Management (ICAESM), 2012 International Conference on
  • Conference_Location
    Nagapattinam, Tamil Nadu
  • Print_ISBN
    978-1-4673-0213-5
  • Type
    conf
  • Filename
    6215910
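The abstract describes a three-stage pipeline (Token Creation, Feature Identification, and Categorization and Summarization) but this record does not say how the rule reduction technique works. The following is a minimal sketch under stated assumptions, not the authors' method: the stop-word list, the keyword rules in RULES, and every function name are hypothetical; features are plain term frequencies; "rule reduction" is read, as an assumption, as discarding category rules that share no term with the document's features; and summarization is a swapped-in frequency-scored extractive technique.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the record does not say which one the paper uses.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
              "for", "on", "with", "by", "this", "that", "it", "as", "be"}

# Hypothetical category rules: each label maps to a set of indicative terms.
RULES = {
    "medicine": {"cancer", "tumor", "patient", "therapy", "clinical"},
    "computing": {"software", "algorithm", "data", "network", "computer"},
}


def create_tokens(text):
    """Stage 1 (token creation): lowercase alphabetic tokens minus stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def identify_features(tokens, top_k=10):
    """Stage 2 (feature identification): the top_k most frequent terms and their counts."""
    return dict(Counter(tokens).most_common(top_k))


def reduce_rules(rules, features):
    """Assumed reading of 'rule reduction': keep only rules sharing a term with the features."""
    feature_terms = set(features)
    return {label: terms for label, terms in rules.items() if terms & feature_terms}


def categorize(features, rules=RULES):
    """Stage 3a: return the label whose rule terms carry the most feature weight, or None."""
    active = reduce_rules(rules, features)
    if not active:
        return None
    return max(active, key=lambda label: sum(features.get(t, 0) for t in active[label]))


def summarize(text, features, n_sentences=1):
    """Stage 3b: extractive summary keeping the highest-scoring sentences in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scores = [sum(features.get(t, 0) for t in create_tokens(s)) for s in sentences]
    # sorted() is stable, so sentences with equal scores keep their original order.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:n_sentences]))


if __name__ == "__main__":
    doc = ("The patient received cancer therapy. The clinical trial used new software. "
           "Cancer therapy improved patient outcomes.")
    feats = identify_features(create_tokens(doc))
    print(categorize(feats))      # medicine
    print(summarize(doc, feats))  # The patient received cancer therapy.
```

In this sketch the pruning step runs before scoring so that only rules with evidence in the document compete; a system trained on a real pre-classified corpus, as the abstract describes, would learn its rules and features from that corpus rather than from a hand-written table.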