DocumentCode :
3248180
Title :
Mining frequent patterns from XML data
Author :
Win, Chit Nilar ; Hla, Khin Haymar Saw
Author_Institution :
Computer Studies Univ., Yangon
fYear :
2005
fDate :
10 Nov. 2005
Firstpage :
208
Lastpage :
212
Abstract :
The Web is rich with information. However, the data it contains is not well organized, which makes obtaining useful information from the Web a difficult task. The successful development of the Extensible Markup Language (XML) as a standard for representing semistructured data makes the data on the Web more readable and makes mining useful information from it feasible. XML has become very popular for representing semistructured data and is a standard for data exchange over the Web, so mining XML data from the Web is becoming increasingly important. Previous studies adopt an Apriori-like candidate set generation approach, but candidate set generation remains costly. We propose that association rules can be extracted from XML documents using the XML query language XQuery, without any preprocessing or postprocessing, and we analyze an XQuery implementation of FP-growth, an efficient FP-tree-based mining method that finds the complete set of frequent patterns by pattern fragment growth. FP-tree-based mining uses pattern fragment growth to avoid the costly generation of a large number of candidate sets, together with a partition-based, divide-and-conquer strategy. A divide-and-conquer method divides the problem into a number of subproblems and conquers them by solving them recursively; a subproblem that is small enough is solved directly. The solutions to the subproblems are then combined into the solution for the original problem. In addition, we suggest features that need to be added to XQuery to make the implementation of FP-growth more efficient.
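To illustrate the pattern fragment growth and divide-and-conquer steps described in the abstract, here is a minimal sketch of FP-growth. It is written in Python rather than XQuery, so it is not the authors' XQuery implementation. It assumes a hypothetical XML layout in which each `<transaction>` element lists its `<item>` children, and the function names (`build_fp_tree`, `fp_growth`, `transactions_from_xml`) and the sample data are illustrative only.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical XML layout: one <transaction> per basket, one <item> per product.
SAMPLE_XML = """
<transactions>
  <transaction><item>bread</item><item>milk</item></transaction>
  <transaction><item>bread</item><item>diaper</item><item>beer</item><item>eggs</item></transaction>
  <transaction><item>milk</item><item>diaper</item><item>beer</item><item>cola</item></transaction>
  <transaction><item>bread</item><item>milk</item><item>diaper</item><item>beer</item></transaction>
  <transaction><item>bread</item><item>milk</item><item>diaper</item><item>cola</item></transaction>
</transactions>
"""


class FPNode:
    """One FP-tree node: an item, its count, and links to parent and children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs.

    Returns the header table (item -> list of tree nodes) and the supports
    of the frequent items only.
    """
    support = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}

    root = FPNode(None, None)
    header = defaultdict(list)
    for items, count in transactions:
        # Keep frequent items only, ordered by descending support (ties by name),
        # so transactions with common prefixes share a path in the tree.
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent


def fp_growth(transactions, min_support, suffix=()):
    """Mine all frequent itemsets by pattern fragment growth (no candidate sets)."""
    header, frequent = build_fp_tree(transactions, min_support)
    patterns = {}
    # Divide and conquer: one subproblem per frequent item, least frequent first.
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):
        patterns[tuple(sorted(suffix + (item,)))] = frequent[item]
        # Conditional pattern base: the prefix path above every node of `item`.
        conditional = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        # Recurse on the smaller conditional database; an empty base ends the recursion.
        patterns.update(fp_growth(conditional, min_support, suffix + (item,)))
    return patterns


def transactions_from_xml(xml_text):
    """Turn each <transaction> element into an (items, 1) pair."""
    root = ET.fromstring(xml_text.strip())
    return [([item.text.strip() for item in t.findall("item")], 1)
            for t in root.findall("transaction")]


if __name__ == "__main__":
    result = fp_growth(transactions_from_xml(SAMPLE_XML), min_support=3)
    for pattern, count in sorted(result.items()):
        print(pattern, count)
```

Each recursive call to `fp_growth` works only on the conditional pattern base of one item, which is the divide-and-conquer step. Unlike Apriori, no candidate itemsets are generated and then counted. With a minimum support of 3, the sample data yields eight frequent itemsets, for example `('beer', 'diaper')` with support 3.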
Keywords :
Internet; XML; data mining; divide and conquer methods; meta data; query languages; Apriori-like candidate set generation; FP-growth; FP-tree based mining method; Web; XML data; XML query language; XQuery; divide-and-conquer method; extensible Markup Language; semistructured data; Association rules; Database languages; Pattern analysis; Standards development; FP-growth method
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information and Telecommunication Technologies, 2005. APSITT 2005 Proceedings. 6th Asia-Pacific Symposium on
Conference_Location :
Yangon
Print_ISBN :
4-88552-216-1
Type :
conf
DOI :
10.1109/APSITT.2005.203658
Filename :
1593465