Title :
Making good choices of non-redundant n-gram words
Author :
Moura, Maria Fernanda ; Nogueira, Bruno Magalhães ; da Silva Conrado, M. ; Santos, Fabiano Fernandes dos ; Rezende, Solange Oliveira
Author_Institution :
Embrapa Inf. Agropecuaria, Campinas
Abstract :
A new, complete method is proposed for automatically selecting good, non-redundant n-gram words as attributes for textual data. N-gram words with n ≥ 2 are generally needed to improve the subjective interpretability of a text mining task. In these cases, the n-gram words are statistically generated and selected, which always introduces redundancy. The proposed method eliminates only the redundancies: classifiers applied to the original and the non-redundant data sets show no decrease in categorization effectiveness. Additionally, the method is useful for any kind of machine learning process applied to a text mining task.
Keywords :
data mining; statistical analysis; text analysis; machine learning process; nonredundant n-gram words; subjective interpretability; text mining task; Artificial intelligence; Data mining; Decision making; Frequency estimation; Machine learning; Manuals; Mathematics; Proposals; Supervised learning; Text mining;
Conference_Titel :
Computer and Information Technology, 2008. ICCIT 2008. 11th International Conference on
Conference_Location :
Khulna
Print_ISBN :
978-1-4244-2135-0
Electronic_ISBN :
978-1-4244-2136-7
DOI :
10.1109/ICCITECHN.2008.4803111