Regional Information Center for Science and Technology - Preparing Text Reports from Web Pages Employing Similarity Tests

DocumentCode :

651537

Title :

Preparing Text Reports from Web Pages Employing Similarity Tests

Author :

Guadalupe Ramos, J. ; Solorio, Juan C. ; Campoy, Lourdes ; Ruiz, Santiago ; Jasso, Nicolas

Author_Institution :

Inst. Tecnol. de La Piedad, La Piedad, Mexico

fYear :

2013

fDate :

Oct. 30 2013-Nov. 1 2013

Firstpage :

Lastpage :

Abstract :

The World Wide Web is the main source of information for many organizations and common users. However, analyzing and selecting web content is still an arduous manual task in many cases. When a query is sent to a web search engine, a list of URLs is returned, frequently ordered by popularity (for example, by Google's PageRank algorithm). The user must then read and analyze each URL to find the relevant information. This work presents a method that automatically constructs a text report, induced by a web query, from a set of URLs. The method extracts text slices (excerpts) from web pages, using the text most similar to the web query as the slicing criterion. A slice is composed of document object model (DOM) nodes, and similarity is calculated using standard techniques from natural language processing.
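
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only says that similarity uses "standard techniques", so the cosine similarity over term-frequency vectors, the choice of block tags, the `threshold` parameter, and the names `BlockTextCollector`, `cosine`, and `build_report` are all assumptions made for illustration. The sketch also assumes well-formed markup in which every block tag is closed.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Assumption: these tags delimit candidate slices; the abstract does not list them.
BLOCK_TAGS = {"p", "li", "h1", "h2", "h3", "td", "blockquote"}


class BlockTextCollector(HTMLParser):
    """Collects the text of each outermost block-level element as one candidate slice."""

    def __init__(self):
        super().__init__()
        self.blocks = []    # finished text blocks
        self._buffer = []   # text fragments of the block being read
        self._depth = 0     # nesting depth inside block tags

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Join fragments and normalize whitespace.
                text = " ".join(" ".join(self._buffer).split())
                if text:
                    self.blocks.append(text)
                self._buffer = []

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def term_vector(text):
    """Lowercased bag-of-words term frequencies."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_report(query, pages, threshold=0.2):
    """Return (score, text) pairs for blocks similar to the query, best first.

    `pages` is a list of HTML strings; `threshold` is a hypothetical cut-off.
    """
    query_vec = term_vector(query)
    scored = []
    for html in pages:
        collector = BlockTextCollector()
        collector.feed(html)
        collector.close()
        for block in collector.blocks:
            score = cosine(query_vec, term_vector(block))
            if score >= threshold:
                scored.append((score, block))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    page = (
        "<html><body><h1>Program slicing</h1>"
        "<p>Slicing extracts the statements relevant to a criterion.</p>"
        "<p>Contact us for pricing.</p></body></html>"
    )
    for score, text in build_report("program slicing criterion", [page]):
        print(f"{score:.2f}  {text}")
```

With the sample page, the heading and the first paragraph share terms with the query and are kept, while the unrelated contact paragraph scores zero and is dropped. A fuller version would fetch the pages from the URLs returned by a search engine and might weight terms (for example with TF-IDF) rather than use raw counts.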

Keywords :

Internet; natural language processing; query processing; search engines; text analysis; DOM nodes; URL; Web content analysis; Web content selection; Web pages; Web query; Web search engine; World Wide Web; document object model nodes; excerpts extraction; similarity tests; slicing criterion; text reports preparation; text slices extraction; Google; Software tools; Standards; Vectors; information retrieval; similarity; slicing; summarization

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Computer Science (ENC), 2013 Mexican International Conference on

Conference_Location :

Morelia

ISSN :

1550-4069

Type :

conf

DOI :

10.1109/ENC.2013.8

Filename :

6679814

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=651537