DocumentCode :
1493584
Title :
A Bounded and Adaptive Memory-Based Approach to Mine Frequent Patterns From Very Large Databases
Author :
Adnan, Muhaimenul ; Alhajj, Reda
Author_Institution :
Dept. of Comput. Sci., Univ. of Calgary, Calgary, AB, Canada
Volume :
41
Issue :
1
fYear :
2011
Firstpage :
154
Lastpage :
172
Abstract :
Most of the existing methods to solve the problem of association rules mining (ARM) rely on special data structures to project the database (either totally or partially) in the primary memory. Traditionally, these data structures reside in the main memory and rely on the existing paging mechanism of the virtual memory manager (VMM) to handle the storage problem when they go out of the primary memory. Typically, VMM stores the overloaded data into the secondary memory based on some preassumed memory usage criteria. However, this direct and unplanned use of virtual memory results in an unpredictable behavior or thrashing, as depicted by some of the works described in the literature. This problem is tackled in this paper by presenting an ARM model capable of mining a transactional database, regardless of its size and without relying on the underlying VMM; the proposed approach could use only a bounded portion of the primary memory and this gives the opportunity to assign other parts of the main memory to other tasks with different priority. In other words, we propose a specialized memory management system which caters to the needs of the ARM model in such a way that the proposed data structure is constructed in the available allocated primary memory first. If at any point the structure grows out of the allocated memory quota, it is forced to be partially saved on secondary memory. The secondary memory version of the structure is accessed in a block-by-block basis so that both the spatial and temporal localities of the I/O access are optimized. Thus, the proposed framework takes control of the virtual memory access and hence manages the required virtual memory in an optimal way to the best benefit of the mining process to be served. Several clever data structures are used to facilitate these optimizations. Our method has the additional advantage that other tasks of different priorities may run concurrently with the main mining task with as little interference as possibl- - e because we do not rely on the default paging mechanism of the VMM. The reported test results demonstrate the applicability and effectiveness of the proposed approach.
Keywords :
data mining; data structures; paged storage; pattern clustering; very large databases; adaptive memory; association rule mining; block by block basis; bounded memory based approach; data structure; default paging mechanism; frequent pattern mining; memory management; secondary storage; transactional database; very large database; virtual memory manager; Association rules mining (ARM); FP-growth; FP-tree; frequent pattern mining; frequent patterns; index structures; secondary storage; virtual memory management (VMM);
fLanguage :
English
Journal_Title :
Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on
Publisher :
ieee
ISSN :
1083-4419
Type :
jour
DOI :
10.1109/TSMCB.2010.2048900
Filename :
5466159
Link To Document :
بازگشت