Regional Information Center for Science and Technology - Performance and memory-access characterization of data mining applications

DocumentCode :

3369526

Title :

Performance and memory-access characterization of data mining applications

Author :

Bradford, Jeffrey P. ; Fortes, José

Author_Institution :

Purdue Univ., West Lafayette, IN, USA

fYear :

1999

fDate :

1999

Firstpage :

Lastpage :

Abstract :

This paper characterizes the performance and memory-access behavior of a decision tree induction program, a previously unstudied application used in data mining and knowledge discovery in databases. Performance is studied with RSIM, an execution-driven simulator, for three uniprocessor models that exploit instruction-level parallelism to varying degrees. Several properties of the program are noted. Out-of-order dispatch and multiple issue provide a significant performance advantage: a 50%-250% improvement in instructions per cycle (IPC) for out-of-order dispatch versus in-order dispatch, and a 5%-120% improvement in IPC for four-way issue versus single issue. Multiple issue provides a greater performance improvement for larger L2 cache sizes, when the program is limited by CPU performance; out-of-order dispatch provides a greater improvement for smaller L2 cache sizes. The program has a very small instruction footprint: with an 8-kB L1 instruction cache, the instruction miss rate is below 0.1%. A small (8-kB) L1 data cache is sufficient to capture most of the locality of the data references, resulting in L1 miss rates between 10% and 20%. Increasing the size of the L2 data cache does not significantly improve performance until a substantial fraction (over 1/4) of the data set fits into the L2 cache. Lastly, a procedure is developed for scaling the cache sizes when using scaled-down data sets, allowing the results for smaller data sets to be used to predict the performance of full-sized data sets.
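The two kinds of metric the abstract reports, relative IPC improvement and cache miss rate, can be illustrated with a minimal Python sketch. It is a toy, not the paper's methodology: RSIM models detailed out-of-order processors and memory hierarchies, whereas this sketch only counts hits and misses in a hypothetical direct-mapped cache. The helper names (`ipc`, `relative_improvement`, `DirectMappedCache`), the 32-byte line size, the cycle counts, and the address trace are all illustrative assumptions.

```python
from typing import List, Optional


def ipc(instructions: int, cycles: int) -> float:
    """Instructions per cycle for one simulated run."""
    return instructions / cycles


def relative_improvement(new: float, old: float) -> float:
    """Fractional improvement of `new` over `old` (1.5 means 150%)."""
    return new / old - 1.0


class DirectMappedCache:
    """Hypothetical toy cache: tracks tags only, no data, no timing."""

    def __init__(self, size_bytes: int, line_bytes: int = 32) -> None:
        self.line_bytes = line_bytes              # assumed line size
        self.num_lines = size_bytes // line_bytes
        self.tags: List[Optional[int]] = [None] * self.num_lines
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Look up one byte address; return True on a hit."""
        block = address // self.line_bytes        # memory block number
        index = block % self.num_lines            # the single slot it may occupy
        tag = block // self.num_lines             # distinguishes blocks sharing a slot
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.misses += 1
        self.tags[index] = tag                    # fill the line, evicting any old block
        return False

    @property
    def miss_rate(self) -> float:
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical cycle counts for the same instruction stream.
    in_order = ipc(1_000_000, 2_000_000)        # 0.5
    out_of_order = ipc(1_000_000, 800_000)      # 1.25
    print(relative_improvement(out_of_order, in_order))  # 1.5, i.e. 150%

    # Sequential 4-byte reads over 16 kB through an 8-kB cache:
    # one miss per 32-byte line, then 7 hits.
    cache = DirectMappedCache(8 * 1024)
    for addr in range(0, 16 * 1024, 4):
        cache.access(addr)
    print(cache.miss_rate)  # 0.125
```

A direct-mapped organization is chosen only because it is the simplest to state; the caches modeled in the paper may be set-associative, and a real characterization would replay the program's actual address trace rather than a synthetic sequential sweep.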

Keywords :

cache storage; data mining; decision trees; parallel programming; software performance evaluation; virtual machines; CPU performance; L1 data cache; L1 instruction cache; L2 data cache; RSIM; cache size scaling; data mining applications; data reference locality; databases; decision tree induction program; execution-driven simulator; instruction miss rate; instruction-level parallelism; instructions per cycle; knowledge discovery; memory access characterization; multiple issue; out-of-order dispatch; performance characterization; performance improvement; scaled-down data sets; uniprocessor models; Application software; Data engineering; Data mining; Databases; Decision trees; Humidity; Knowledge engineering; Out of order; Rain; Weather forecasting;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Workload Characterization: Methodology and Case Studies, 1999

Conference_Location :

Dallas, TX

Print_ISBN :

0-7695-0450-7

Type :

conf

DOI :

10.1109/WWC.1998.809358

Filename :

809358

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3369526