• DocumentCode
    2260437
  • Title

    Web mining based on VIPS in intention-based information retrieval

  • Author

    Zhang, Qiang ; Jiang, Xiaoxiao ; Sun, Jiashen

  • Author_Institution
    Beijing Univ. of Posts & Telecommun., Beijing, China
  • fYear
    2009
  • fDate
    24-27 Sept. 2009
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    This paper introduces a Web mining method based on VIPS (Vision-based Page Segmentation) that targets retrieval based on user intent. It first gathers information from the Web through large search engines such as Baidu, and then clusters the Web pages based on intention-related features of the Web text. The main algorithm is described in detail, and experiments are designed to retrieve results for Chinese-language queries from the Baidu and Ask search engines. The results show that the VIPS-based method achieves a significant improvement over some previous work.
  • Keywords
    Internet; data mining; information retrieval; pattern clustering; search engines; text analysis; visual perception; Baidu-Ask search engine; Web page clustering; Web text mining; intention-based information retrieval; vision-based page segmentation; Clustering algorithms; HTML; Sun; Tree data structures; Uniform resource locators; Web mining; Web pages; HTML structure; VIPS
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2009. NLP-KE 2009. International Conference on
  • Conference_Location
    Dalian
  • Print_ISBN
    978-1-4244-4538-7
  • Electronic_ISBN
    978-1-4244-4540-0
  • Type

    conf

  • DOI
    10.1109/NLPKE.2009.5313791
  • Filename
    5313791