• DocumentCode
    2927009
  • Title

    An XML subtree segmentation method based on syntactic segmentation rate

  • Author

    Liang, Wenxin ; Ouyang, Xiangyong ; Yokota, Haruo

  • Author_Institution
    Japan Science and Technology Agency, Tokyo Institute of Technology, Japan
  • Volume
    2
  • fYear
    2007
  • fDate
    28-31 Oct. 2007
  • Firstpage
    551
  • Lastpage
    558
  • Abstract
    In this paper, we propose an effective method for segmenting large XML documents into independent, meaningful subtrees based on two syntactic segmentation rates: the vertical segmentation rate and the horizontal segmentation rate. The proposed method uses the DO-VLEI code to calculate the parameters required for subtree segmentation. We conduct experiments on real bibliography XML documents stored in RDBs to observe the effectiveness of the proposed method. We apply our previously proposed subtree matching algorithm, SLAX, to match the segmented subtrees and evaluate how the matching threshold affects the precision and recall of subtree matching. We also integrate the matched subtrees determined by SLAX using our previously proposed subtree integration algorithm. The experimental results indicate that the proposed method is effective for segmenting XML documents into independent, meaningful subtrees, and that SLAX achieves reasonable matching precision and recall on the segmented subtrees.
  • Keywords
    Bibliographies; Data preprocessing; Internet; Labeling; Large-scale systems; XML
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Digital Information Management, 2007. ICDIM '07. 2nd International Conference on
  • Conference_Location
    Lyon, France
  • Print_ISBN
    978-1-4244-1475-8
  • Electronic_ISBN
    978-1-4244-1476-5
  • Type

    conf

  • DOI
    10.1109/ICDIM.2007.4444281
  • Filename
    4444281