Title :
Extracting knowledge using probabilistic classifier for text mining
Author_Institution :
Dept. of Comput. Applic., K.S.R. Coll. of Technol., Tiruchengode, India
Abstract :
Text mining is the process of extracting knowledge from large text documents. This paper proposes a new probabilistic classifier for text mining. It uses the Open Directory Project (ODP) taxonomy, a domain ontology, and datasets to cluster text documents and identify the category of a given document. The proposed work has three steps: preprocessing, rule generation, and probability calculation. In preprocessing, the input document is split into paragraphs and statements. In rule generation, the documents from the training set are read. In probability calculation, positive and negative weight factors are calculated. The proposed algorithm calculates a positive probability value and a negative probability value for each term set or pattern identified in the document. Based on the calculated probability values, the probabilistic classifier indexes the document to the relevant group of the cluster.
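The three steps named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption made for illustration rather than the authors' method: single words stand in for term sets, the ODP taxonomy and domain ontology are left out, the positive and negative probabilities are taken to be smoothed document frequencies inside and outside a category, and the function names (preprocess, generate_rules, term_probabilities, classify) are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def preprocess(document):
    """Split a document into paragraphs, then statements, then lowercase terms."""
    statements = []
    for paragraph in re.split(r"\n\s*\n", document):
        for statement in re.split(r"[.!?]+", paragraph):
            terms = re.findall(r"[a-z]+", statement.lower())
            if terms:
                statements.append(terms)
    return statements


def generate_rules(training_set):
    """Read the training set and count, per category, the documents containing each term."""
    doc_counts = Counter()
    term_counts = defaultdict(Counter)
    for text, category in training_set:
        doc_counts[category] += 1
        terms = {t for statement in preprocess(text) for t in statement}
        term_counts[category].update(terms)
    return doc_counts, term_counts


def term_probabilities(term, category, doc_counts, term_counts):
    """Return assumed (positive, negative) probabilities of a term for a category.

    Positive: smoothed share of the category's documents that contain the term.
    Negative: smoothed share of all other documents that contain the term.
    """
    in_docs = doc_counts[category]
    out_docs = sum(doc_counts.values()) - in_docs
    in_hits = term_counts[category][term]
    out_hits = sum(c[term] for cat, c in term_counts.items() if cat != category)
    positive = (in_hits + 1) / (in_docs + 2)
    negative = (out_hits + 1) / (out_docs + 2)
    return positive, negative


def classify(document, doc_counts, term_counts):
    """Assign the document to the category with the highest summed log ratio."""
    terms = {t for statement in preprocess(document) for t in statement}
    scores = {}
    for category in doc_counts:
        score = 0.0
        for term in terms:
            pos, neg = term_probabilities(term, category, doc_counts, term_counts)
            score += math.log(pos / neg)
        scores[category] = score
    return max(scores, key=scores.get)
```

A caller would build the counts once with generate_rules(training_set), where training_set is a list of (text, category) pairs, and then pass them to classify for each new document. Summing log ratios is one simple way to combine the two probability values; the paper may combine them differently.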
Keywords :
data mining; ontologies (artificial intelligence); pattern classification; probability; text analysis; ODP taxonomy; datasets; domain ontology; input document; knowledge extraction; positive weight factor; preprocessing; probabilistic classifier; probability calculation; rule generation; text documents; text mining; training set; Association rules; Databases; Probabilistic logic; Training; Classification; Clustering; categorization
Conference_Title :
Pattern Recognition, Informatics and Mobile Engineering (PRIME), 2013 International Conference on
Conference_Location :
Salem
Print_ISBN :
978-1-4673-5843-9
DOI :
10.1109/ICPRIME.2013.6496517