Title :
Probability Convergence in a Multithreaded Counting Application
Author :
Scherrer, Chad ; Beagley, Nathaniel ; Nieplocha, Jarek ; Márquez, Andrés ; Feo, John ; Chavarría-Miranda, Daniel
Author_Institution :
Pacific Northwest Nat. Lab., Richland, WA
Abstract :
The problem of counting specified combinations of a given set of variables arises in many statistical and data mining applications. To solve this problem, we introduce the PDtree data structure, which avoids exponential time and space complexity associated with prior work by allowing user specification of the tree structure. A straightforward parallelization approach using a Cray MTA-2 provides a speedup that is linear in the number of processors, but introduces nondeterminism into probability estimates. We prove a general convergence result that bounds the non-deterministic deviation of probability estimates relative to a sequential implementation. Beyond PDtrees, this convergence result applies to any counting application that takes a multithreaded streaming approach.
Keywords :
convergence; data mining; multi-threading; probability; tree data structures; Cray MTA-2; PDtree data structure; data mining; multithreaded counting; nondeterministic deviation; parallelization approach; probability convergence; Association rules; Bayesian methods; Contracts; Convergence; Data mining; Data structures; Databases; Graphical models; Laboratories; Tree data structures;
Conference_Titel :
Parallel and Distributed Processing Symposium, 2007. IPDPS 2007. IEEE International
Conference_Location :
Long Beach, CA
Print_ISBN :
1-4244-0910-1
Electronic_ISBN :
1-4244-0910-1
DOI :
10.1109/IPDPS.2007.370688