• DocumentCode
    2506911
  • Title
    A heuristic approach for recognizing a document's language used for the Internet search engine GETESS
  • Author
    Dusterhoft, A.; Gröticke, S.
  • Author_Institution
    Dept. of Comput. Sci., Rostock Univ., Germany
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    133
  • Lastpage
    137
  • Abstract
    The authors illustrate how Internet documents can be automatically analyzed to identify each document's language. This language knowledge is then used by the Internet search engine GETESS. The aim of the language classification heuristics is to ensure that documents with the same content in different languages (e.g. German and English) are not presented to the user together as search results. Instead, GETESS provides only the results in the language relevant to the user. Consequently, the search-result set is narrower and better fits the user's needs.
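    The record does not describe the authors' actual heuristics. As a minimal illustration of the general idea only, the sketch below assumes a common stopword-counting approach: each document is assigned the language whose frequent function words it contains most often, and results are then filtered to the user's language. The names `STOPWORDS`, `guess_language` and `filter_results` and the tiny word lists are hypothetical and are not taken from GETESS.

```python
import re
from typing import Dict, List, Optional, Set

# Hypothetical, deliberately tiny stopword lists (ISO 639-1 codes as keys).
# A real classifier would need far larger lists and more languages.
STOPWORDS: Dict[str, Set[str]] = {
    "en": {"the", "and", "of", "to", "is", "in", "that", "for", "with", "are"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "mit", "ein", "eine", "für"},
}


def guess_language(text: str) -> Optional[str]:
    """Return the language whose stopwords occur most often, or None if none occur."""
    tokens = re.findall(r"\w+", text.lower())
    scores = {
        lang: sum(1 for token in tokens if token in words)
        for lang, words in STOPWORDS.items()
    }
    # Ties go to the language listed first in STOPWORDS.
    best = max(scores, key=scores.get)
    # No stopword hits at all: refuse to guess rather than default to a language.
    return best if scores[best] > 0 else None


def filter_results(docs: List[str], user_lang: str) -> List[str]:
    """Keep only documents classified as the user's language (hypothetical helper)."""
    return [doc for doc in docs if guess_language(doc) == user_lang]
```

    For example, filter_results(["The cat is in the house", "Die Katze ist nicht im Haus"], "de") keeps only the German document. Function words are frequent and largely language-specific, so even short texts usually give some signal, although overlaps (such as English "die") mean that real heuristics need more evidence than this sketch uses.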
  • Keywords
    Internet; document handling; information retrieval; linguistics; search engines; English; GETESS; German; Internet documents; Internet search engine; document language recognition; heuristic approach; language classification heuristics; language knowledge; search results; search-result set; user needs; Computer architecture; Computer graphics; Computer science; Databases; Information analysis; Knowledge representation; Natural languages; Ontologies; Search engines; Web and internet services;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Database and Expert Systems Applications, 2000. Proceedings. 11th International Workshop on
  • Conference_Location
    London
  • ISSN
    1529-4188
  • Print_ISBN
    0-7695-0680-1
  • Type
    conf
  • DOI
    10.1109/DEXA.2000.875016
  • Filename
    875016