DocumentCode :
3369543
Title :
Memory characterization of a parallel data mining workload
Author :
Kim, Jin-Soo ; Qin, Xiaohan ; Hsu, Yarsun
Author_Institution :
IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
fYear :
1999
fDate :
1999
Firstpage :
60
Lastpage :
68
Abstract :
Studies a representative of an important class of emerging applications: a parallel data mining workload. The application, extracted from the IBM Intelligent Miner, identifies groups of records that are mathematically similar, based on a neural network model called a self-organizing map. We examine and compare, in detail, two implementations of the application with respect to (1) temporal locality or working set size; (2) spatial locality and memory block utilization; (3) communication characteristics and scalability; and (4) translation lookaside buffer (TLB) performance. First, we find that the working set hierarchy of the application is governed by two parameters, namely the size of an input record and the size of the prototype array; it is independent of the number of input records. Second, the application shows good spatial locality, with the implementation optimized for sparse data sets having slightly worse spatial locality. Third, due to the batch update scheme, the application incurs very little communication. Finally, a two-way set-associative TLB may result in severely skewed TLB performance in a multiprocessor environment, caused by the large discrepancy in the number of conflict misses across processors. Increasing the set associativity is more effective in mitigating the problem than increasing the TLB size.
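Why a batch update scheme keeps communication low can be sketched with the textbook batch self-organizing map (Kohonen's "batch map"). This is an illustrative sketch under stated assumptions, not the Intelligent Miner implementation: the Gaussian neighborhood, the function names (`neighborhood`, `local_partial_sums`, `batch_som_epoch`), and the simulated per-processor shards are all hypothetical. Each simulated processor scans only its own records and accumulates per-prototype partial sums; a single reduction per epoch then combines them. The size of each contribution depends on the prototype array (k units of dimension d), not on the number of input records.

```python
import numpy as np

def neighborhood(grid, sigma):
    # Assumed Gaussian neighborhood kernel between every pair of map units,
    # based on their (row, col) positions in the map grid; shape (k, k).
    d2 = ((grid[:, None, :] - grid[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def local_partial_sums(shard, prototypes, h):
    # Local phase on one hypothetical processor: no communication needed.
    d2 = ((shard[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    bmu = d2.argmin(axis=1)       # best-matching unit for each record
    w = h[bmu]                    # (n, k) weight of each record for each unit
    return w.T @ shard, w.sum(axis=0)   # numerator (k, d), denominator (k,)

def batch_som_epoch(shards, prototypes, grid, sigma):
    h = neighborhood(grid, sigma)
    partials = [local_partial_sums(s, prototypes, h) for s in shards]
    # Reduction phase: the only point where processors would exchange data.
    num = sum(p[0] for p in partials)
    den = sum(p[1] for p in partials)
    # Values each processor contributes: k*d + k, independent of record count.
    words_per_worker = partials[0][0].size + partials[0][1].size
    return num / den[:, None], words_per_worker

rng = np.random.default_rng(0)
data = rng.random((1000, 8))                    # 1000 synthetic 8-field records
grid = np.array([(i, j) for i in range(3) for j in range(4)], dtype=float)
prototypes = rng.random((12, 8))                # 3x4 map, 12 prototypes

single, _ = batch_som_epoch([data], prototypes, grid, sigma=1.0)
parallel, words = batch_som_epoch(np.array_split(data, 4), prototypes, grid, sigma=1.0)
print(np.allclose(single, parallel), words)     # True 108
```

Because the update is deferred to the end of each pass, splitting the records over four workers yields the same prototypes as a single worker, while each worker sends only 12 x 8 + 12 = 108 values per epoch however many records it holds.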
Keywords :
IBM computers; batch processing (computers); buffer storage; data mining; parallel machines; parallel processing; performance evaluation; self-organising feature maps; special purpose computers; IBM Intelligent Miner; batch update scheme; communication characteristics; conflict misses; input record size; memory block utilization; memory characterization; multiprocessor environment; neural network model; parallel data mining workload; prototype array size; scalability; self-organizing map; similar record groups; skewed TLB performance; sparse data sets; spatial locality; temporal locality; translation lookaside buffer performance; two-way set-associative TLB; working set hierarchy; working set size; Application software; Data mining; Ear; Insurance; Microprocessors; Organizing; Performance evaluation; Prototypes; Read only memory; Warehousing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Workload Characterization: Methodology and Case Studies, 1999
Conference_Location :
Dallas, TX
Print_ISBN :
0-7695-0450-7
Type :
conf
DOI :
10.1109/WWC.1998.809359
Filename :
809359