• DocumentCode
    2000070
  • Title
    Anatomy of a News Archive and Search Engine (Optimized for Persian Web)
  • Author
    Khalifehsoltani, Sayed Nasir ; Vahdani, Ali ; Moallemi, Reza
  • Author_Institution
    Dept. of Comput. Eng., SheikhBahaee Univ., Isfahan
  • fYear
    2009
  • fDate
    27-29 April 2009
  • Firstpage
    1361
  • Lastpage
    1366
  • Abstract
    News search engines are a class of search engines that professionally monitor news on the web. These engines usually obtain their content by extracting news feeds. However, not all news sources fully support news feeds, especially the Persian ones. An alternative is to index the content of news pages directly, but the results are less accurate because the structure of the news is misrecognized. In this article we present the architecture of a news search engine that extracts and archives structured news content and then performs complementary processes, such as indexing and classifying the news, which have been optimized for the Persian language. Using the structured text of the news, we achieved higher precision in these complementary processes.
  • Keywords
    indexing; information retrieval; search engines; Persian Web; Persian language; Web news; news archive anatomy; news feeds extraction; news pages content indexing; news search engines; news structure misrecognition; Anatomy; Computerized monitoring; Cost accounting; Data mining; Feeds; Indexing; Information technology; Resource description framework; Search engines; XML; Automatic News Classifying; Information Extraction; News Archiving; News Search Engine; Text Indexing; Web Page Processing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology: New Generations, 2009. ITNG '09. Sixth International Conference on
  • Conference_Location
    Las Vegas, NV
  • Print_ISBN
    978-1-4244-3770-2
  • Electronic_ISBN
    978-0-7695-3596-8
  • Type
    conf
  • DOI
    10.1109/ITNG.2009.264
  • Filename
    5070816