Regional Information Center for Science and Technology - Web Page Content Extraction Method Based on Link Density and Statistic

DocumentCode :

479068

Title :

Web Page Content Extraction Method Based on Link Density and Statistic

Author :

Pan, Donghua ; Qiu, Shaogang ; Yin, Dawei

Author_Institution :

Institute of Systems Engineering, Dalian University of Technology, Dalian

fYear :

2008

fDate :

12-14 Oct. 2008

Firstpage :

Lastpage :

Abstract :

Web page content extraction is a key step in acquiring knowledge from the Internet. The physical layout of a Web page combines useful information with advertising links and images, so extracting the relevant content while filtering out irrelevant information is an important task. Representing a Web page as a tree and exploiting the differing properties of content nodes and non-content nodes, the paper presents an algorithm based on link density and statistics. The method improves the accuracy of content extraction, which in turn improves the efficiency of information acquisition for corporations and organizations. This work is therefore important for knowledge acquisition.
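The abstract names the two ingredients (link density and statistics over a page tree) but does not spell out the algorithm. The following is a minimal sketch under stated assumptions that illustrates only the link-density idea. It parses HTML into a tree with Python's standard-library `html.parser`, defines a node's link density as the fraction of its text characters that sit inside `<a>` elements (a common definition, assumed here rather than taken from the paper), and prunes any block whose density exceeds a hypothetical threshold `max_density=0.5`. The names `TreeBuilder`, `text_stats`, `link_density`, and `extract`, the threshold, and the tag sets are all illustrative assumptions. The paper's statistical step is not reproduced.

```python
from html.parser import HTMLParser

# Assumed tag sets; the paper does not specify these.
SKIP_TAGS = {"script", "style", "noscript"}          # text never treated as content
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}
NEVER_PRUNE = {"html", "body", "a", "b", "i", "em", "strong", "span", "code"}


class Node:
    """Minimal DOM node; children holds child Nodes and text strings in order."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML, tolerating unclosed and stray tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements have no content, so never become a parent
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace runs
        if text and self.current.tag not in SKIP_TAGS:
            self.current.children.append(text)


def text_stats(node, in_link=False):
    """Return (total_chars, anchor_chars) for the subtree rooted at node."""
    in_link = in_link or node.tag == "a"
    total = links = 0
    for child in node.children:
        if isinstance(child, str):
            total += len(child)
            if in_link:
                links += len(child)
        else:
            t, l = text_stats(child, in_link)
            total += t
            links += l
    return total, links


def link_density(node):
    """Share of a subtree's text that is anchor text (0.0 for empty nodes)."""
    total, links = text_stats(node)
    return links / total if total else 0.0


def extract(html, max_density=0.5):
    """Keep text from blocks whose link density is at most max_density."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    kept = []

    def walk(node):
        for child in node.children:
            if isinstance(child, str):
                kept.append(child)
            elif child.tag in NEVER_PRUNE or link_density(child) <= max_density:
                walk(child)
            # otherwise: a link-heavy block (menu, ad list) is dropped whole

    walk(builder.root)
    return " ".join(kept)


if __name__ == "__main__":
    page = """<html><body>
<ul><li><a href="/">Home</a></li><li><a href="/news">News</a></li></ul>
<div><p>Web page content extraction separates the main text from noise.</p>
<p>See the <a href="/paper">original paper</a> for details on the method.</p></div>
<div><a href="/ad1">Buy now</a> <a href="/ad2">Cheap flights</a></div>
</body></html>"""
    print(extract(page))
```

Inline tags such as `<a>` are never pruned on their own, so a link inside a kept paragraph keeps its text; the trade-off is that bare links sitting directly inside a kept block also survive. Recomputing `text_stats` for every node is quadratic in the worst case, and caching per-node counts in one bottom-up pass would avoid that.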

Keywords :

Internet; content management; information filtering; Web page; content extraction; information acquirement; information filter; knowledge acquisition; link density; statistic; Advertising; Content based retrieval; Data mining; Knowledge engineering; Statistics; Systems engineering and theory; Web pages

fLanguage :

English

Publisher :

IEEE

Conference_Title :

4th International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM '08), 2008

Conference_Location :

Dalian

Print_ISBN :

978-1-4244-2107-7

Electronic_ISBN :

978-1-4244-2108-4

Type :

conf

DOI :

10.1109/WiCom.2008.2664

Filename :

4680853

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=479068