Regional Information Center for Science and Technology - Memory characterization of a parallel data mining workload

DocumentCode :

3369543

Title :

Memory characterization of a parallel data mining workload

Author :

Kim, Jin-Soo ; Qin, Xiaohan ; Hsu, Yarsun

Author_Institution :

IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA

fYear :

1999

fDate :

1999

Firstpage :

Lastpage :

Abstract :

This paper studies a representative of an important class of emerging applications: a parallel data mining workload. The application, extracted from the IBM Intelligent Miner, identifies groups of records that are mathematically similar, based on a neural network model called a self-organizing map. We examine and compare, in detail, two implementations of the application with respect to four aspects: (1) temporal locality or working set size; (2) spatial locality and memory block utilization; (3) communication characteristics and scalability; and (4) translation lookaside buffer (TLB) performance. First, we find that the working set hierarchy of the application is governed by two parameters, namely the size of an input record and the size of the prototype array; it is independent of the number of input records. Second, the application shows good spatial locality, although the implementation optimized for sparse data sets has slightly worse spatial locality. Third, owing to the batch update scheme, the application incurs very little communication. Finally, a two-way set-associative TLB may result in severely skewed TLB performance in a multiprocessor environment, caused by a large discrepancy in the number of conflict misses. Increasing the set associativity is more effective in mitigating the problem than increasing the TLB size.

Keywords :

IBM computers; batch processing (computers); buffer storage; data mining; parallel machines; parallel processing; performance evaluation; self-organising feature maps; special purpose computers; IBM Intelligent Miner; batch update scheme; communication characteristics; conflict misses; input record size; memory block utilization; memory characterization; multiprocessor environment; neural network model; parallel data mining workload; prototype array size; scalability; self-organizing map; similar record groups; skewed TLB performance; sparse data sets; spatial locality; temporal locality; translation lookaside buffer performance; two-way set-associative TLB; working set hierarchy; working set size; Application software; Data mining; Ear; Insurance; Microprocessors; Organizing; Performance evaluation; Prototypes; Read only memory; Warehousing;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Workload Characterization: Methodology and Case Studies, 1999

Conference_Location :

Dallas, TX

Print_ISBN :

0-7695-0450-7

Type :

conf

DOI :

10.1109/WWC.1998.809359

Filename :

809359

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3369543