Title :
Discretization Based on Positive Domain and Information Entropy
Author :
Juan, Zhao ; Wang Mingchun ; Kun, Liu ; Can, Wang
Author_Institution :
Sch. of Sci., Tianjin Univ. of Technol. & Educ., Tianjin, China
Abstract :
Proper discretization of numerical attributes is of paramount importance in applications of data mining and machine learning. In the classical discretization algorithm based on information entropy, the importance of a breakpoint is measured by the decrement of the uncertainty level in a decision table. In this paper, a novel discretization algorithm based on positive domain is proposed. It measures the importance of a breakpoint by the increment of the certainty degree of a decision table. A second, aggregated algorithm then combines the two criteria, simultaneously increasing the certainty degree and decreasing the uncertainty level of a decision table, so that the importance of breakpoints is measured in a more comprehensive and reasonable way. Experimental results on five data sets show that the aggregated algorithm outperforms the other two algorithms in terms of both the number of breakpoints and classification accuracy.
Keywords :
data mining; decision tables; entropy; learning (artificial intelligence); pattern classification; breakpoints; certainty degree; classification accuracy; data sets; decision table; discretization algorithm; information entropy; machine learning; numerical attributes; positive domain; uncertainty level; Accuracy; Algorithm design and analysis; Clustering algorithms; Educational institutions; Machine learning algorithms; Uncertainty; continuous attributes; discretization
Conference_Title :
Computational Intelligence and Security (CIS), 2011 Seventh International Conference on
Conference_Location :
Hainan
Print_ISBN :
978-1-4577-2008-6
DOI :
10.1109/CIS.2011.65
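
Illustrative sketch :
The abstract describes two ways to measure the importance of a breakpoint: the decrement of the uncertainty level (information entropy) and the increment of the certainty degree (positive domain). The Python sketch below illustrates one possible reading of these two measures for a single numerical attribute. It is not the authors' algorithm: the function names, the use of conditional entropy as the uncertainty level, the use of the fraction of objects in decision-pure intervals as the certainty degree, and the unweighted sum used to aggregate the two are all assumptions made for illustration.

```python
from bisect import bisect_right
from collections import Counter, defaultdict
from math import log2


def blocks(values, labels, cuts):
    """Group decision labels by the interval (defined by cuts) each value falls into."""
    ordered = sorted(cuts)
    grouped = defaultdict(list)
    for v, y in zip(values, labels):
        grouped[bisect_right(ordered, v)].append(y)
    return list(grouped.values())


def conditional_entropy(values, labels, cuts):
    """Assumed uncertainty level: entropy of the decision given the intervals."""
    n = len(values)
    total = 0.0
    for block in blocks(values, labels, cuts):
        size = len(block)
        h = -sum((c / size) * log2(c / size) for c in Counter(block).values())
        total += (size / n) * h
    return total


def positive_region_size(values, labels, cuts):
    """Assumed certainty measure: objects lying in intervals with a single decision."""
    return sum(len(b) for b in blocks(values, labels, cuts) if len(set(b)) == 1)


def candidate_cuts(values):
    """Midpoints between consecutive distinct attribute values."""
    u = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(u, u[1:])]


def score_cut(values, labels, chosen, cut):
    """Return (entropy decrement, certainty increment) for adding cut to chosen."""
    new = list(chosen) + [cut]
    entropy_drop = (conditional_entropy(values, labels, chosen)
                    - conditional_entropy(values, labels, new))
    certainty_gain = (positive_region_size(values, labels, new)
                      - positive_region_size(values, labels, chosen)) / len(values)
    return entropy_drop, certainty_gain


# Hypothetical toy decision table: one numerical attribute, one decision.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = ["a", "a", "a", "b", "b", "b"]

# Aggregated choice (assumption: unweighted sum of the two measures).
best_cut = max(candidate_cuts(values),
               key=lambda c: sum(score_cut(values, labels, [], c)))
```

On this toy table the breakpoint between 3.0 and 4.0 separates the two decisions completely, so it removes all uncertainty and places every object in the positive domain. A full discretizer would repeat this selection greedily across attributes until a stopping criterion is met, which the abstract does not specify.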