DocumentCode :
2603032
Title :
Scalable Representative Instance Selection and Ranking
Author :
Zhu, Xingquan ; Wu, Xindong
Author_Institution :
Dept. of Comput. Sci., Vermont Univ.
Volume :
3
fYear :
0
fDate :
0-0 0
Firstpage :
352
Lastpage :
355
Abstract :
Finding a small set of representative instances for large datasets can bring various benefits to data mining practitioners so they can (1) build a learner superior to the one constructed from the whole massive data; and (2) avoid working on the whole original dataset all the time. We propose in this paper a scalable representative instance selection and ranking (SRISTAR pronounced 3STAR) mechanism, which carries two unique features: (1) it provides a representative instance ranking list, so that users can always select instances from the top to the bottom, based on the number of examples they prefer; and (2) it investigates the behaviors of the underlying examples for instance selection, and the selection procedure tries to optimize the expected future error. Given a dataset, we first cluster instances into small data cells, each of which consists of instances with similar behaviors. Then we progressively evaluate data cells and their combinations, and order them into a list such that the learners built from the top cells are more accurate
Keywords :
data mining; scalable representative instance ranking; scalable representative instance selection; Bagging; Boosting; Computational complexity; Computer science; Data mining; Data privacy; Machine learning; Machine learning algorithms; Sampling methods; Scalability
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Pattern Recognition, 2006. ICPR 2006. 18th International Conference on
Conference_Location :
Hong Kong
ISSN :
1051-4651
Print_ISBN :
0-7695-2521-0
Type :
conf
DOI :
10.1109/ICPR.2006.1023
Filename :
1699538