DocumentCode :
2173943
Title :
Decision tree learning on very large data sets
Author :
Hall, L.O. ; Chawla, N. ; Bowyer, K.W.
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of South Florida, Tampa, FL, USA
Volume :
3
fYear :
1998
fDate :
14-14 Oct. 1998
Firstpage :
2579
Abstract :
Consider a labeled data set of 1 terabyte in size. A salient subset might depend upon the user's interests. Clearly, browsing such a large data set to find interesting areas would be very time consuming. An intelligent agent which, for a given class of user, could provide hints on areas of the data that might interest the user would be very useful. Given large data sets whose examples carry categories of salience for different user classes, these labeled data sets can be used to train a decision tree to label unseen data examples with a category of salience. The training set will be much larger than usual. This paper describes an approach to generating the rules for an agent from a large training set. A set of decision trees is built in parallel on tractable-size training data sets, each of which is a subset of the original data. Each learned decision tree is reduced to a set of rules, conflicting rules are resolved, and the resultant rules are merged into one set. Results from cross-validation experiments on a data set suggest this approach may be effectively applied to large sets of data.
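The pipeline outlined in the abstract (train one tree per subset, reduce each tree to rules, resolve conflicts, merge) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the ID3-style learner over categorical attributes, the function names, the toy data, and in particular the conflict-resolution policy (rules with identical conditions but different labels are settled by summed support). The trees are built sequentially; since each subset is independent, the loop could be distributed across workers.

```python
from collections import Counter
from math import log2

def entropy(rows):
    """Shannon entropy of the labels in rows; each row is (features, label)."""
    counts = Counter(label for _, label in rows)
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in counts.values())

def build_tree(rows, attrs):
    """Tiny ID3-style tree over categorical attributes (illustrative only)."""
    labels = Counter(label for _, label in rows)
    majority = labels.most_common(1)[0][0]
    if len(labels) == 1 or not attrs:
        # Leaf: predicted label plus the number of rows supporting it.
        return ("leaf", majority, labels[majority])

    def partition(attr):
        groups = {}
        for row in rows:
            groups.setdefault(row[0][attr], []).append(row)
        return groups

    def split_cost(attr):
        # Weighted entropy of the children after splitting on attr.
        return sum(len(g) / len(rows) * entropy(g)
                   for g in partition(attr).values())

    best = min(attrs, key=split_cost)
    rest = [a for a in attrs if a != best]
    children = {v: build_tree(g, rest) for v, g in partition(best).items()}
    return ("node", best, children)

def tree_to_rules(tree, conds=()):
    """Turn every root-to-leaf path into a (conditions, label, support) rule."""
    if tree[0] == "leaf":
        return [(frozenset(conds), tree[1], tree[2])]
    rules = []
    for value, child in tree[2].items():
        rules += tree_to_rules(child, conds + ((tree[1], value),))
    return rules

def merge_rules(rule_sets):
    """Merge rule sets; ASSUMED policy: for identical conditions, the label
    with the largest summed support wins."""
    votes = {}
    for rules in rule_sets:
        for conds, label, support in rules:
            votes.setdefault(conds, Counter())[label] += support
    return {conds: v.most_common(1)[0][0] for conds, v in votes.items()}

def classify(rules, example, default):
    """Apply the most specific matching rule, or fall back to default."""
    matches = [c for c in rules if all(example.get(a) == v for a, v in c)]
    return rules[max(matches, key=len)] if matches else default

# Toy data (hypothetical): an example is "salient" exactly when it is red.
combos = [(s, c) for s in ("circle", "square") for c in ("red", "blue")]
rows = [({"shape": s, "color": c}, "salient" if c == "red" else "not")
        for s, c in combos] * 3

# Three tractable-size subsets, one tree each.
subsets = [rows[i::3] for i in range(3)]
rule_sets = [tree_to_rules(build_tree(s, ["shape", "color"])) for s in subsets]
merged = merge_rules(rule_sets)
print(classify(merged, {"shape": "square", "color": "red"}, default="not"))
```

The sketch only detects conflicts between rules whose conditions are identical; overlapping but non-identical rules are handled at classification time by preferring the most specific match, which is a simplification rather than the paper's procedure.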
Keywords :
data mining; decision trees; learning (artificial intelligence); software agents; very large databases; 1 TB; conflicting rule resolution; cross validation experiments; decision tree learning; intelligent agent; labeled data set; rule merging; tractable size training data sets; very large data sets; Classification tree analysis; Computer science; Data mining; Data visualization; Decision trees; Intelligent agent; Machine learning; Parallel processing; Training data; Visual databases;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Systems, Man, and Cybernetics, 1998. 1998 IEEE International Conference on
Conference_Location :
San Diego, CA, USA
ISSN :
1062-922X
Print_ISBN :
0-7803-4778-1
Type :
conf
DOI :
10.1109/ICSMC.1998.725047
Filename :
725047