Author_Institution :
Vermont Inf. Process., Colchester, VT, USA
Abstract :
In multi-dimensional data, Promotional Subspace Mining (PSM) aims to find out the most promotive subspaces for a given object. One major research issue is to produce top subspaces efficiently given a predefined subspace ranking measure. In this paper, we propose EProbe, an Efficient Subspace Probing framework. As opposed to commonly adopted strategies that produce an exact solution, this novel framework aims at providing a scalable, cost-effective, and flexible solution where its accuracy can be traded with the efficiency using adjustable parameters. As a first effort to implement this framework, we propose two novel algorithms SRatio and Sliding Cluster. The former applies score ratio (SR) based subspace sorting to obtain a sorted subspace set. The latter further includes the design of subspace sampling from sliding subspace clusters. The ultimate objective of both algorithms is to achieve an â"early stopâ" of the subspace search, when a certain number of top subspaces have been probed and evaluated. We propose two evaluation metrics: AVG Trace Index and Coverage, which favor the situation where important subspaces are captured within the subspace evaluation limit. By comparing SRatio and Sliding Cluster with a baseline algorithm DFP (Depth First Parent Subspace Pruning), we show a remarkable superiority of algorithm SRatio (Sliding Cluster with w = 1) over DFP, and a consistent and significant improvement of algorithm Sliding Cluster over SRatio, when the computation resources are insufficient and only a limited number of candidate subspaces can be evaluated.
Keywords :
data mining; pattern clustering; sorting; AVG Tracelndex; Coverage; EProbe; SRatio; SlidingCluster; depth first parent subspace pruning; multidimensional data; promotional subspace mining; score ratio based subspace sorting; subspace probing framework; Accuracy; Algorithm design and analysis; Clustering algorithms; Measurement; Silicon; Sorting; Strontium; EProbe; PatRanking; Promotional Subspace Mining (PSM);