DocumentCode
2000070
Title
Anatomy of a News Archive and Search Engine (Optimized for Persian Web)
Author
Khalifehsoltani, Sayed Nasir ; Vahdani, Ali ; Moallemi, Reza
Author_Institution
Dept. of Comput. Eng., SheikhBahaee Univ., Isfahan
fYear
2009
fDate
27-29 April 2009
Firstpage
1361
Lastpage
1366
Abstract
News search engines are a class of search engines that specialize in monitoring web news. These engines usually obtain their content by extracting news feeds. However, not all news sources fully support news feeds, especially Persian ones. An alternative is to index the content of news pages directly, but the results are less accurate because the structure of the news is misrecognized. In this article we present the architecture of a news search engine, optimized for the Persian language, that extracts and archives structured news content and then performs complementary processes such as indexing and classifying the news. Using the structured text of the news, we achieved higher precision in these complementary processes.
Keywords
indexing; information retrieval; search engines; Persian Web; Persian language; Web news; news archive anatomy; news feeds extraction; news pages content indexing; news search engines; news structure misrecognition; Anatomy; Computerized monitoring; Cost accounting; Data mining; Feeds; Indexing; Information technology; Resource description framework; Search engines; XML; Automatic News Classifying; Information Extraction; News Archiving; News Search Engine; Text Indexing; Web Page Processing
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Technology: New Generations, 2009. ITNG '09. Sixth International Conference on
Conference_Location
Las Vegas, NV
Print_ISBN
978-1-4244-3770-2
Electronic_ISBN
978-0-7695-3596-8
Type
conf
DOI
10.1109/ITNG.2009.264
Filename
5070816