Title :
The role of least frequent item sets in association discovery
Author :
Swargam, Rani J. ; Palakal, Mathew J.
Author_Institution :
Sch. of Informatics, Indiana Univ., Indianapolis, IN
Abstract :
Advances in commercial and scientific data collection have produced a flood of data, creating a need to turn that data into useful information and knowledge by identifying novel, potentially useful patterns in databases. This work presents the development, implementation, and application of an adaptive apriori algorithm for mining large datasets, with a focus on extracting interesting association rules for less frequent item sets. The relevance of the adaptive apriori algorithm was studied on a dataset obtained by applying the transitive closure property among objects extracted from the biomedical scientific literature, where both frequent and infrequent events need to be detected. Three adaptive apriori association rule mining methods are presented to effectively prune and extract meaningful association rules: weight induced apriori association rule mining (WIAARM), weight and significance induced apriori association rule mining (WSIAARM), and significance induced apriori association rule mining (SIAARM). These methods were applied to a large set of biological entity-entity association records, and the results indicated that both WIAARM and WSIAARM were able to discover item sets with low as well as high frequency.
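For context, the sketch below shows the classic (non-adaptive) apriori procedure that the abstract builds on, to illustrate why less frequent item sets are normally lost: any item set whose support falls below a single global threshold is pruned. It is not the WIAARM, WSIAARM, or SIAARM method, whose weighting and significance formulas are not given in this record. The function name `apriori`, the `min_support` parameter, and the toy entity records are all assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Classic apriori: return {itemset: support} for every itemset whose
    support (fraction of transactions containing it) is >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        # Keep only candidates that meet the global support threshold.
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates,
        # keeping a candidate only if all of its k-item subsets are frequent.
        k += 1
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


# Hypothetical entity-entity association records (illustrative only).
transactions = [
    {"geneA", "diseaseX"},
    {"geneA", "diseaseX"},
    {"geneA", "diseaseX", "proteinP"},
    {"geneA", "proteinP"},
    {"geneB", "diseaseY"},
]

freq = apriori(transactions, min_support=0.4)
for itemset, support in sorted(freq.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), round(support, 2))
```

With the threshold at 0.4, the pair geneB/diseaseY (present in one of five records) is discarded. Lowering the threshold to 0.2 recovers it, but also admits every other rare combination, which is the trade-off that motivates adaptive pruning criteria such as those named in the abstract.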
Keywords :
data mining; adaptive apriori association rule mining; association discovery; least frequent item sets; significance induced apriori association rule mining; weight induced apriori association rule mining; Association rules; Biomedical measurements; Data mining; Diseases; Frequency; Informatics; Probability; Proteins; Text mining; Transaction databases
Conference_Title :
Digital Information Management, 2007. ICDIM '07. 2nd International Conference on
Conference_Location :
Lyon
Print_ISBN :
978-1-4244-1475-8
Electronic_ISBN :
978-1-4244-1476-5
DOI :
10.1109/ICDIM.2007.4444226