Automatic Text categorization and summarization using rule reduction

Author

Devasena, C. Lakshmi; Hemalatha, M.

Author_Institution

Dept. of Comput. Sci., Karpagam Univ., Coimbatore, India

fYear

2012

fDate

30-31 March 2012

Firstpage

594

Lastpage

598

Abstract

Text mining is a new field that attempts to extract meaningful information from natural language text. Automatic text categorization is the process of assigning predefined class labels to incoming, unclassified documents; the class labels are defined from a set of pre-classified example documents that serve as a training corpus. This work presents an automatic text categorization and summarization approach that analyzes the structure of the input text. A text analyzer is developed that derives this structure using a rule reduction technique in three stages: Token Creation, Feature Identification, and Categorization and Summarization. The analyzer was tested on sample input texts and gave noteworthy results, and extensive experimentation validates the choice of parameters and the efficacy of the approach for text classification. The work can be extended to many practical applications, including indexing for document retrieval, organizing and maintaining large catalogues of Web resources, automatic metadata extraction, and word sense disambiguation.
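The three-stage structure named in the abstract (Token Creation, Feature Identification, Categorization and Summarization) can be illustrated with a minimal Python sketch. It is not the authors' rule reduction technique, whose details are not given in this record. The stop-word list, the keyword rules in the hypothetical `CATEGORY_RULES` table, and the frequency-based sentence scoring used for summarization are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list and keyword rules, for illustration only;
# neither is taken from the paper.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "for", "on", "with"}
CATEGORY_RULES = {
    "sports": {"match", "team", "goal", "player"},
    "finance": {"market", "stock", "bank", "price"},
}


def create_tokens(text):
    """Stage 1 (token creation): split text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def identify_features(tokens):
    """Stage 2 (feature identification): count non-stop-word tokens."""
    return Counter(t for t in tokens if t not in STOP_WORDS)


def categorize(features):
    """Stage 3a: score each category by how often its keywords occur."""
    scores = {cat: sum(features[w] for w in words) for cat, words in CATEGORY_RULES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unclassified"


def summarize(text, features, k=1):
    """Stage 3b: keep the k sentences with the highest feature-frequency score,
    returned in their original order (a generic extractive heuristic)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: sum(features[t] for t in create_tokens(sentences[i])),
        reverse=True,
    )
    return " ".join(sentences[i] for i in sorted(ranked[:k]))


doc = "The team won the match. The player scored a late goal. Fans celebrated in the city."
feats = identify_features(create_tokens(doc))
print(categorize(feats))       # sports
print(summarize(doc, feats))   # The player scored a late goal.
```

Keyword-count rules keep the categorization stage transparent and easy to inspect. A real system would learn or prune its rules from the pre-classified training corpus rather than hard-coding them.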

Keywords

Web sites; data mining; feature extraction; indexing; information retrieval; meta data; pattern classification; text analysis; Web resources; automatic metadata extraction; automatic text categorization approach; automatic text summarization approach; catalogue maintaining; document indexing; document retrieval; feature categorization; feature identification; feature summarization; natural language text; preclassified documents; predefined class labels; rule reduction technique; text analyzer; text mining; token creation; training corpus; unclassified documents; word sense disambiguation; Cancer; Educational institutions; Indexing; Organizing; Presses; Text categorization; Feature Identification; Rule Reduction; Text Categorization and Summarization; Text Mining; Token Creation

fLanguage

English

Publisher

IEEE

Conference_Titel

Advances in Engineering, Science and Management (ICAESM), 2012 International Conference on

Conference_Location

Nagapattinam, Tamil Nadu

Print_ISBN

978-1-4673-0213-5

Type

conf

Filename

6215910