Title :
Automatic Text categorization and summarization using rule reduction
Author :
Devasena, C. Lakshmi ; Hemalatha, M.
Author_Institution :
Dept. of Comput. Sci., Karpagam Univ., Coimbatore, India
Abstract :
Text mining is an emerging field that attempts to extract meaningful information from natural language text. Automatic text categorization and summarization involves assigning predefined class labels to incoming, unclassified documents. The class labels are defined from a set of pre-classified example documents used as a training corpus. This work presents an automatic text categorization and summarization approach that analyzes the structure of input text. A text analyzer is developed to derive the structure of the input text using a rule reduction technique in three stages: Token Creation, Feature Identification, and Categorization and Summarization. The analyzer was tested on sample input texts and gave noteworthy results. Extensive experimentation validates the selection of parameters and the efficacy of the approach for text classification. This work can be extended to many practical applications, including indexing for document retrieval, organizing and maintaining large catalogues of Web resources, automatic metadata extraction, and word sense disambiguation.
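The three stages named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the rule reduction technique itself, so the code below substitutes a simple term-overlap scorer. The stop-word list, the function names, and the scoring rule are all assumptions made for illustration and are not the authors' implementation.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the abstract does not specify one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for", "on", "with"}


def create_tokens(text):
    """Stage 1 (Token Creation): lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def identify_features(tokens):
    """Stage 2 (Feature Identification): count the tokens that are not stop words."""
    return Counter(t for t in tokens if t not in STOP_WORDS)


def train(labelled_docs):
    """Build one term-count profile per class label from pre-classified documents."""
    profiles = {}
    for text, label in labelled_docs:
        features = identify_features(create_tokens(text))
        profiles.setdefault(label, Counter()).update(features)
    return profiles


def categorize(text, profiles):
    """Stage 3a: return the label whose profile overlaps most with the text (assumed scoring rule)."""
    features = identify_features(create_tokens(text))

    def score(label):
        return sum(count * profiles[label][term] for term, count in features.items())

    return max(profiles, key=score)


def summarize(text, max_sentences=1):
    """Stage 3b: keep the sentences whose terms are most frequent in the whole text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    features = identify_features(create_tokens(text))
    ranked = sorted(
        sentences,
        key=lambda s: sum(features[t] for t in create_tokens(s)),
        reverse=True,
    )
    top = set(ranked[:max_sentences])
    # Emit the selected sentences in their original order.
    return " ".join(s for s in sentences if s in top)


if __name__ == "__main__":
    corpus = [
        ("The team won the football match with a late goal", "sports"),
        ("The bank raised interest rates and the market fell", "finance"),
    ]
    profiles = train(corpus)
    print(categorize("A late goal decided the match", profiles))
    print(summarize("Stocks fell today. The market fell as the bank raised rates and stocks fell again. Weather was mild."))
```

The frequency-based sentence score favours longer sentences; a real system would likely normalise by sentence length or apply the paper's own rules instead.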
Keywords :
Web sites; data mining; feature extraction; indexing; information retrieval; meta data; pattern classification; text analysis; Web resources; automatic metadata extraction; automatic text categorization approach; automatic text summarization approach; catalogue maintaining; document indexing; document retrieval; feature categorization; feature identification; feature summarization; natural language text; preclassified documents; predefined class labels; rule reduction technique; text analyzer; text mining; token creation; training corpus; unclassified documents; word sense disambiguation; Cancer; Educational institutions; Indexing; Organizing; Presses; Text categorization; Feature Identification; Rule Reduction; Text Categorization and Summarization; Text Mining; Token Creation;
Conference_Title :
Advances in Engineering, Science and Management (ICAESM), 2012 International Conference on
Conference_Location :
Nagapattinam, Tamil Nadu
Print_ISBN :
978-1-4673-0213-5