• DocumentCode
    584571
  • Title

    Simultaneous Product Attribute Name and Value Extraction with Adaptively Learnt Templates

  • Author

    Wei Tang ; Yu Hong ; Yan-Hui Feng ; Jian-Min Yao ; Qiao-Ming Zhu

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Soochow Univ., Suzhou, China
  • fYear
    2012
  • fDate
    11-13 Aug. 2012
  • Firstpage
    2021
  • Lastpage
    2025
  • Abstract
    Representing products as attribute name and value pairs can improve the effectiveness of many applications. In this paper, we propose an adaptive template-based method that simultaneously extracts product attribute names and values from Web pages. The titles of Web pages are used to assist unsupervised template construction, and a template ranking strategy ensures that the correct templates are selected for each Web page. Our approach consists of four key steps: 1) construct a domain attribute word bag from the titles of Web pages; 2) segment text nodes using a set of default delimiters; 3) collect candidate attribute and value pairs; 4) learn high-quality templates with a template ranking algorithm. The experimental corpus is collected from two domains: digital cameras and mobile phones. Experiments show that our method achieves a precision of 94.68% and a recall of 90.57%.
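
    The four steps named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the delimiter list, the (tag, delimiter) template representation, the title-frequency word bag, the overlap-count ranking score, and all function names are assumptions, since the abstract does not specify them.

```python
import re
from collections import Counter

# Assumed default delimiters; the abstract does not list the actual ones.
DELIMITERS = (":", "=", "|")


def build_word_bag(titles, min_pages=2):
    """Step 1 (assumed form): keep words that appear in at least min_pages titles."""
    counts = Counter()
    for title in titles:
        # Count each word once per title.
        counts.update(set(re.findall(r"[a-z]+", title.lower())))
    return {word for word, n in counts.items() if n >= min_pages}


def segment(text_node):
    """Step 2: split a text node at its earliest delimiter.

    Returns (name, delimiter, value), or None if no usable split exists.
    """
    hits = [(text_node.find(d), d) for d in DELIMITERS if d in text_node]
    if not hits:
        return None
    pos, delim = min(hits)
    name = text_node[:pos].strip()
    value = text_node[pos + len(delim):].strip()
    return (name, delim, value) if name and value else None


def rank_templates(nodes, word_bag):
    """Steps 3 and 4: collect candidate pairs per template, then rank templates.

    nodes is a list of (tag, text) tuples. A template is assumed to be the
    (tag, delimiter) combination; its score is the number of candidate names
    that share a word with the word bag.
    """
    scores = Counter()
    pairs = {}
    for tag, text in nodes:
        seg = segment(text)
        if seg is None:
            continue
        name, delim, value = seg
        template = (tag, delim)
        pairs.setdefault(template, []).append((name, value))
        if set(name.lower().split()) & word_bag:
            scores[template] += 1
    ranked = sorted(pairs, key=lambda t: scores[t], reverse=True)
    return ranked, pairs
```

    Under these assumptions, the candidate pairs of the top-ranked template would be taken as the extracted attribute names and values for a page, while low-scoring templates (for example, navigation text that happens to contain a delimiter) would be discarded.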
  • Keywords
    Internet; data mining; information retrieval; retail data processing; text analysis; unsupervised learning; Web data mining; Web pages; World Wide Web; adaptive template based method; adaptively learnt templates; candidate attribute collection; digital camera; domain attribute word bag construction; high-quality template learning; mobile phone; online shops; simultaneous product attribute name extraction; template ranking strategy; text node segmentation; unsupervised template construction; value pair collection; value pair extraction; value pairs; Data mining; Digital cameras; HTML; Mobile handsets; Ontologies; Web pages; Web data mining; product attribute name and value pair; template construction;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science & Service System (CSSS), 2012 International Conference on
  • Conference_Location
    Nanjing
  • Print_ISBN
    978-1-4673-0721-5
  • Type

    conf

  • DOI
    10.1109/CSSS.2012.503
  • Filename
    6394821