DocumentCode :
2919153
Title :
FACT: Fast Algorithm for Categorizing Text
Author :
Mengle, Saket S R ; Goharian, Nazli ; Platt, Alana
Author_Institution :
Dept. of Comput. Sci., Illinois Inst. of Technol., Chicago, IL, USA
fYear :
2007
fDate :
23-24 May 2007
Firstpage :
308
Lastpage :
315
Abstract :
With the ever-increasing number of digital documents, the ability to classify those documents automatically, both quickly and accurately, is becoming more critical and more difficult. We present Fast Algorithm for Categorizing Text (FACT), a statistical-based multi-way classifier that uses our proposed feature selection method, Ambiguity Measure (AM), which relies on only the most unambiguous keywords to predict the category of a document. Our empirical results show that FACT outperforms the best results obtained with Odds Ratio, the best-performing feature selection method for the Naive Bayes classifier. We demonstrate this improvement over Odds Ratio on four benchmark datasets, with statistical significance at the 99% confidence level. Furthermore, the performance of FACT is comparable to or better than that of current non-statistical-based classifiers.
Keywords :
Bayes methods; feature extraction; pattern classification; statistical analysis; text analysis; ambiguity measure; digital document classification; fast algorithm; feature selection; statistical based multiway Bayes classifier; text categorization; Computer science; Data mining; Electronic commerce; Electronic mail; Filtering; Hospitals; Machine learning algorithms; Routing; Statistics; Text categorization; Text classification; Text processing; knowledge discovery; text mining;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Intelligence and Security Informatics, 2007 IEEE
Conference_Location :
New Brunswick, NJ
Electronic_ISBN :
1-4244-1329-X
Type :
conf
DOI :
10.1109/ISI.2007.379490
Filename :
4258716