• DocumentCode
    498918
  • Title
    A new web information extracting method based on multi-coordinate
  • Author
    Huang, Min ; Xi, Jian-qing ; Sun, Bo
  • Author_Institution
    Sch. of Software Eng., South China Univ. of Technol., Guangzhou, China
  • Volume
    3
  • fYear
    2009
  • fDate
    12-15 July 2009
  • Firstpage
    1488
  • Lastpage
    1492
  • Abstract
    Traditional Web information extraction locates data through a single path, which leads to low accuracy and a heavy rebuild workload. To solve these problems, this paper presents a new method called multi-coordinate locating for information extraction. It constructs three coordinate systems (a global coordinate, a local coordinate, and a random coordinate) to locate information in an HTML page. When the HTML document changes, the wrapper can restore itself by re-locating the data path through the multi-coordinate systems. Each of the three coordinate locating methods is described in detail. A prototype system was developed and used in experiments. The results show that the multi-coordinate method improves the wrapper's tolerance to Web page changes without adding extra cost or degrading its performance.
  • Keywords
    Internet; information retrieval; HTML page; Web information extracting method; Web page; multicoordinate locating method; single path-based data locating method; Conference management; Cybernetics; Data mining; Engineering management; HTML; Machine learning; Paper technology; Software engineering; Technology management; Web pages; Coordinate; Information extracting; Self-healing; Web; Web wrapper;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics, 2009 International Conference on
  • Conference_Location
    Baoding
  • Print_ISBN
    978-1-4244-3702-3
  • Electronic_ISBN
    978-1-4244-3703-0
  • Type
    conf
  • DOI
    10.1109/ICMLC.2009.5212311
  • Filename
    5212311