DocumentCode
2260437
Title
Web mining based on VIPS in intention-based information retrieval
Author
Zhang, Qiang ; Jiang, Xiaoxiao ; Sun, Jiashen
Author_Institution
Beijing Univ. of Posts & Telecommun., Beijing, China
fYear
2009
fDate
24-27 Sept. 2009
Firstpage
1
Lastpage
5
Abstract
This paper introduces a VIPS (Vision-based Page Segmentation) based Web mining method aimed at user-intent-based retrieval. It first gathers information from the Web using large search engines such as Baidu, and then clusters the Web pages based on intention-related features of the Web text. The main algorithm is described in detail, and experiments are designed that fetch results for Chinese queries from the Baidu and Ask search engines. The results show that the VIPS-based method achieves a significant improvement compared with some previous work.
Keywords
Internet; data mining; information retrieval; pattern clustering; search engines; text analysis; visual perception; Baidu-Ask search engine; Web page clustering; Web text mining; intention-based information retrieval; vision-based page segmentation; Clustering algorithms; HTML; Sun; Tree data structures; Uniform resource locators; Web mining; Web pages; HTML structure; VIPS
fLanguage
English
Publisher
IEEE
Conference_Titel
Natural Language Processing and Knowledge Engineering, 2009. NLP-KE 2009. International Conference on
Conference_Location
Dalian
Print_ISBN
978-1-4244-4538-7
Electronic_ISBN
978-1-4244-4540-0
Type
conf
DOI
10.1109/NLPKE.2009.5313791
Filename
5313791