DocumentCode
2927009
Title
An XML subtree segmentation method based on syntactic segmentation rate
Author
Liang, Wenxin ; Ouyang, Xiangyong ; Yokota, Haruo
Author_Institution
Japan Science and Technology Agency, Tokyo Institute of Technology, Japan
Volume
2
fYear
2007
fDate
28-31 Oct. 2007
Firstpage
551
Lastpage
558
Abstract
In this paper, we propose an effective method for segmenting large XML documents into independent, meaningful subtrees based on two syntactic segmentation rates: the vertical segmentation rate and the horizontal segmentation rate. The method uses DO-VLEI code to calculate the parameters required for subtree segmentation. We evaluate the method experimentally on real bibliography XML documents stored in RDBs. We apply our previously proposed subtree matching algorithm, SLAX, to the segmented subtrees and evaluate how the matching threshold affects the precision and recall of subtree matching. We also integrate the matched subtrees identified by SLAX using our previously proposed subtree integration algorithm. The experimental results indicate that the proposed method is effective for segmenting XML documents into independent, meaningful subtrees, and that SLAX achieves reasonable matching precision and recall on the segmented subtrees.
Keywords
Bibliographies; Data preprocessing; Internet; Labeling; Large-scale systems; XML;
fLanguage
English
Publisher
IEEE
Conference_Titel
Digital Information Management, 2007. ICDIM '07. 2nd International Conference on
Conference_Location
Lyon, France
Print_ISBN
978-1-4244-1475-8
Electronic_ISBN
978-1-4244-1476-5
Type
conf
DOI
10.1109/ICDIM.2007.4444281
Filename
4444281