Abstract:
In the maximum entropy model (MEM), features are typically represented by either a 0-1 binary-valued function or a real-valued function. However, both representations examine only the impact of the specific value of an attribute, not the type of that attribute. This omission not only lowers classification precision but also slows the convergence of the generalized iterative scaling (GIS) algorithm, and the effect is more pronounced on incomplete data. In this paper, an improved feature representation method is presented in which each feature is composed of two parts: the first captures the specific value of an attribute, and the second captures the type of the corresponding attribute. Experimental results on the Mushroom dataset from the UCI data repository show that the average classification precision on the incomplete and complete datasets improved by 1.5% and 3.0%, respectively, and the average convergence speed improved by 42.9% and 90.7%, respectively.
Keywords:
knowledge representation; maximum entropy methods; pattern classification; Mushroom dataset; UCI data repository; corresponding attribute; feature representation; incomplete data; maximum entropy model; specific value attribute; Bayesian methods; Convergence; Data analysis; Data mining; Entropy; Geographic Information Systems; Internet; Iterative algorithms; Maximum likelihood estimation; Statistical analysis;
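The abstract does not define the two-part features precisely, so the following is only a minimal Python sketch under stated assumptions. It trains a conditional MEM with standard GIS: every (instance, label) pair is given a constant total feature count C by adding a slack feature, and each weight is updated by (1/C) log(E_emp[f] / E_model[f]). Missing attribute values are encoded as None. The `with_attribute_part` flag adds a hypothetical attribute-level feature that fires whenever an attribute is observed, regardless of its value. This is one possible reading of the "type" part, not necessarily the authors' definition. The names `extract`, `MaxEntGIS`, and `SLACK`, as well as the toy mushroom-like records, are illustrative assumptions.

```python
import math
from collections import defaultdict

SLACK = ("slack",)  # GIS correction feature: tops every feature count up to C


def extract(x, y, with_attribute_part=False):
    """Return the binary features that fire for instance x paired with label y.

    x maps attribute name -> value; None marks a missing value.
    """
    feats = []
    for attr, val in x.items():
        if val is None:
            continue  # a missing value fires no feature for this attribute
        feats.append(("value", attr, val, y))  # conventional 0-1 value feature
        if with_attribute_part:
            # Hypothetical attribute-level part (an assumption): fires whenever
            # the attribute is observed, independent of its specific value.
            feats.append(("attr", attr, y))
    return feats


class MaxEntGIS:
    """Conditional maximum entropy classifier trained with GIS."""

    def __init__(self, labels, with_attribute_part=False):
        self.labels = list(labels)
        self.with_attribute_part = with_attribute_part
        self.weights = {SLACK: 0.0}
        self.C = 1

    def _active(self, x, y):
        # Only features observed with a true label in training carry a weight.
        return [f for f in extract(x, y, self.with_attribute_part)
                if f in self.weights]

    def predict_proba(self, x):
        scores = {}
        for y in self.labels:
            fs = self._active(x, y)
            scores[y] = (sum(self.weights[f] for f in fs)
                         + self.weights[SLACK] * (self.C - len(fs)))
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    def log_likelihood(self, data):
        return sum(math.log(self.predict_proba(x)[y]) for x, y in data)

    def fit(self, data, iterations=100):
        n = len(data)
        emp = defaultdict(float)  # empirical expectation of each feature
        for x, y in data:
            for f in extract(x, y, self.with_attribute_part):
                emp[f] += 1.0 / n
        self.weights = {f: 0.0 for f in emp}
        self.weights[SLACK] = 0.0
        # C exceeds every feature count by at least 1, so the slack feature
        # is always positive and its expectations never vanish.
        self.C = 1 + max(len(self._active(x, y))
                         for x, _ in data for y in self.labels)
        for x, y in data:
            emp[SLACK] += (self.C - len(self._active(x, y))) / n
        for _ in range(iterations):
            model = defaultdict(float)  # expectation under current weights
            for x, _ in data:
                for y, p in self.predict_proba(x).items():
                    fs = self._active(x, y)
                    for f in fs:
                        model[f] += p / n
                    model[SLACK] += p * (self.C - len(fs)) / n
            # GIS update: lambda_i += (1/C) * log(E_emp[f_i] / E_model[f_i])
            for f in self.weights:
                self.weights[f] += math.log(emp[f] / model[f]) / self.C
        return self
```

Because GIS never decreases the training log-likelihood, comparing `log_likelihood` after a fixed number of iterations with and without `with_attribute_part` gives a rough way to compare convergence speed between two feature representations. The speed-up figures reported above come from the paper's own experiments, not from this sketch.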