Regional Information Center for Science and Technology - Designing and Implementing of the Webpage Information Extracting Model Based on Tags

DocumentCode :

2949729

Title :

Designing and Implementing of the Webpage Information Extracting Model Based on Tags

Author :

Xu, Zhang ; Yan, Dong

Author_Institution :

Dept. of Inf., Peking Union Univ., Beijing, China

fYear :

2011

fDate :

20-21 Aug. 2011

Firstpage :

273

Lastpage :

275

Abstract :

In this article, a novel tag-based model for Webpage information extraction is presented. Using the proposed algorithm, the model performed better than Html Parser and Jsoup in most cases. It can serve as a URL filter for a web crawler to improve crawling efficiency.
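The record does not describe the authors' algorithm. The following is only a minimal sketch of the general idea of a tag-based URL filter for a crawler. It assumes that links come from the href attribute of <a> tags and that the crawler keeps only unique http(s) links on a single allowed host. It uses Python's standard-library html.parser module, which is not the Html Parser tool the abstract compares against. The names LinkTagExtractor, filter_urls, allowed_host, and example.com are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkTagExtractor(HTMLParser):
    """Collects href values from <a> tags (hypothetical illustration,
    not the model described in the paper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags are treated as link sources in this sketch.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def filter_urls(html, base_url, allowed_host):
    """Return unique http(s) links whose host equals allowed_host,
    in the order they first appear on the page (assumed filter rule)."""
    parser = LinkTagExtractor(base_url)
    parser.feed(html)
    parser.close()
    seen = set()
    kept = []
    for url in parser.links:
        parts = urlparse(url)
        if parts.scheme in ("http", "https") and parts.hostname == allowed_host:
            if url not in seen:
                seen.add(url)
                kept.append(url)
    return kept


page = """
<html><body>
<a href="/news/1.html">News 1</a>
<a href="https://example.com/news/2.html">News 2</a>
<a href="https://other.org/ad">Ad</a>
<a href="mailto:x@example.com">Mail</a>
<a href="/news/1.html">News 1 again</a>
</body></html>
"""

print(filter_urls(page, "https://example.com/index.html", "example.com"))
# prints ['https://example.com/news/1.html', 'https://example.com/news/2.html']
```

Because filtering happens before pages are queued, a crawler built this way never downloads the off-site advertisement link, the mailto link, or the duplicate link. That step is where a tag-based filter could save crawling effort. The actual filtering criteria in the paper may differ.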

Keywords :

Web sites; information retrieval; search engines; URL filter; Web page information extracting model; net crawler; tags; Context; Data mining; HTML; Law; Search engines; Web pages; Html Parser; Html Tag; Jsoup; Webpage information extraction;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Intelligence Science and Information Engineering (ISIE), 2011 International Conference on

Conference_Location :

Wuhan

Print_ISBN :

978-1-4577-0960-9

Electronic_ISBN :

978-0-7695-4480-9

Type :

conf

DOI :

10.1109/ISIE.2011.71

Filename :

5997433

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2949729