DocumentCode
3066499
Title
Classification using pattern probability estimators
Author
Acharya, Jayadev ; Das, Hirakendu ; Orlitsky, Alon ; Pan, Shengjun ; Santhanam, Narayana P.
Author_Institution
ECE, UCSD, La Jolla, CA, USA
fYear
2010
fDate
13-18 June 2010
Firstpage
1493
Lastpage
1497
Abstract
We consider the problem of classification, where the data of each class are generated i.i.d. according to an unknown probability distribution. The goal is to classify test data with minimum error probability, based on the training data available for the classes. The Likelihood Ratio Test (LRT) is the optimal decision rule when the distributions are known. Hence, a popular approach to classification is to estimate the likelihoods using well-known probability estimators, e.g., the Laplace and Good-Turing estimators, and use them in an LRT. We are primarily interested in situations where the alphabet of the underlying distributions is large compared to the amount of training data available, which is indeed the case in most practical applications. We motivate and propose LRTs based on pattern probability estimators, which are known to achieve low redundancy for universal compression of large-alphabet sources. While a complete proof of the optimality of these decision rules is still warranted, we demonstrate their performance and compare it with that of other well-known classifiers through experiments on synthetic data and on real data for text classification.
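The conventional baseline described in the abstract, plugging estimated likelihoods into an LRT, can be sketched as follows. This is a minimal illustration only, assuming two classes with equal priors, i.i.d. symbols, a known alphabet size `k`, and the add-one (Laplace) estimator fitted once to each class's training data. The function names are hypothetical, and this is the standard plug-in baseline, not the pattern-probability classifier the paper proposes.

```python
import math
from collections import Counter


def laplace_log_likelihood(test, train, alphabet_size):
    """Log-probability of the i.i.d. test sequence under the add-one (Laplace)
    estimate fitted to the training sequence: p(x) = (N_x + 1) / (n + k).
    Hypothetical helper for illustration; not taken from the paper."""
    counts = Counter(train)  # N_x: occurrences of each symbol in training data
    n = len(train)
    return sum(math.log((counts[x] + 1) / (n + alphabet_size)) for x in test)


def lrt_classify(test, train1, train2, alphabet_size):
    """Plug-in likelihood ratio test with equal priors (assumption):
    return 1 if the estimated likelihood under class 1 is at least the
    estimated likelihood under class 2, otherwise return 2."""
    ll1 = laplace_log_likelihood(test, train1, alphabet_size)
    ll2 = laplace_log_likelihood(test, train2, alphabet_size)
    return 1 if ll1 >= ll2 else 2


if __name__ == "__main__":
    # Toy binary-alphabet example: class 1 favors 'a', class 2 favors 'b'.
    print(lrt_classify("aab", "aaaab", "bbbba", alphabet_size=2))
    print(lrt_classify("bba", "aaaab", "bbbba", alphabet_size=2))
```

When the alphabet is large relative to the training data, most symbols never appear in training, and this estimator assigns every unseen symbol the same probability 1/(n + k). That is the regime in which the abstract argues for pattern probability estimators instead.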
Keywords
data compression; error statistics; pattern classification; statistical distributions; text analysis; Laplace estimators; Good-Turing estimators; large alphabet source universal compression; likelihood ratio test; minimum error probability; optimal decision rule; pattern probability estimators; probability distributions; text classification; Error probability; Information theory; Light rail systems; Machine learning; Optical character recognition software; Probability distribution; Redundancy; Testing; Text categorization; Training data;
fLanguage
English
Publisher
ieee
Conference_Titel
Information Theory Proceedings (ISIT), 2010 IEEE International Symposium on
Conference_Location
Austin, TX
Print_ISBN
978-1-4244-7890-3
Electronic_ISBN
978-1-4244-7891-0
Type
conf
DOI
10.1109/ISIT.2010.5513570
Filename
5513570