Title :
Extracting Structure of Web Site Based on Hyperlink Analysis
Author_Institution :
Sch. of Bus. Adm., South China Univ. of Technol., Guangzhou
Abstract :
The structure of a Web site usually reflects the implicit logical relationships among its Web pages, and it is widely used in Web mining and Web information retrieval. However, it is difficult for a machine to extract the structure of a Web site automatically because of the many kinds of noise hyperlinks. This paper proposes an algorithm that extracts the structure of a Web site automatically based on hyperlink analysis. The algorithm identifies and filters noise hyperlinks by the patterns of the Web pages that these hyperlinks connect, rather than by the patterns of the hyperlinks themselves. It promises better performance than previous approaches. Preliminary results show that the proposed algorithm substantially improves both precision and recall.
Keywords :
Web sites; data mining; information analysis; information filters; information retrieval; Web information retrieval; Web mining; Web pages; Web site; hyperlink analysis; noise hyperlink filters; Algorithm design and analysis; Humans; Information filtering; Partitioning algorithms; Pattern analysis
Conference_Title :
Wireless Communications, Networking and Mobile Computing, 2008. WiCOM '08. 4th International Conference on
Conference_Location :
Dalian
Print_ISBN :
978-1-4244-2107-7
Electronic_ISBN :
978-1-4244-2108-4
DOI :
10.1109/WiCom.2008.2538