Title :
Performance and memory-access characterization of data mining applications
Author :
Bradford, Jeffrey P. ; Fortes, José
Author_Institution :
Purdue Univ., West Lafayette, IN, USA
Abstract :
Characterizes the performance and memory-access behavior of a decision tree induction program, a previously unstudied application used in data mining and knowledge discovery in databases. Performance is studied via RSIM, an execution-driven simulator, for three uniprocessor models that exploit instruction-level parallelism to varying degrees. Several properties of the program are noted. Out-of-order dispatch and multiple issue provide a significant performance advantage: 50%-250% improvement in instructions per cycle (IPC) for out-of-order dispatch vs. in-order dispatch, and 5%-120% improvement in IPC for four-way issue vs. single issue. Multiple issue provides a greater performance improvement for larger L2 cache sizes, when the program is limited by CPU performance; out-of-order dispatch provides a greater performance improvement for smaller L2 cache sizes. The program has a very small instruction footprint: for an 8-kB L1 instruction cache, the instruction miss rate is below 0.1%. A small (8-kB) L1 data cache is sufficient to capture most of the locality of the data references, resulting in L1 miss rates between 10% and 20%. Increasing the size of the L2 data cache does not significantly improve performance until a significant fraction (over 1/4) of the data set fits into the L2 cache. Lastly, a procedure is developed for scaling the cache sizes when using scaled-down data sets, allowing the results for smaller data sets to be used to predict the performance of full-sized data sets.
Keywords :
cache storage; data mining; decision trees; parallel programming; software performance evaluation; virtual machines; CPU performance; L1 data cache; L1 instruction cache; L2 data cache; RSIM; cache size scaling; data mining applications; data reference locality; databases; decision tree induction program; execution-driven simulator; instruction miss rate; instruction-level parallelism; inter-processor communication; knowledge discovery; memory access characterization; multiple issue; out-of-order dispatch; performance characterization; performance improvement; scaled-down data sets; uniprocessor models; Application software; Data engineering; Data mining; Databases; Decision trees; Humidity; Knowledge engineering; Out of order; Rain; Weather forecasting;
Conference_Title :
Workload Characterization: Methodology and Case Studies, 1999
Conference_Location :
Dallas, TX
Print_ISBN :
0-7695-0450-7
DOI :
10.1109/WWC.1998.809358