DocumentCode
2506911
Title
A heuristic approach for recognizing a document's language used for the Internet search engine GETESS
Author
Düsterhöft, A. ; Gröticke, S.
Author_Institution
Dept. of Comput. Sci., Rostock Univ., Germany
fYear
2000
fDate
2000
Firstpage
133
Lastpage
137
Abstract
The authors illustrate how Internet documents can be automatically analyzed in order to identify each document's language. This language knowledge is then used by the Internet search engine GETESS. The aim of the language classification heuristics is to ensure that documents with the same content but in different languages (e.g. German and English) are not presented to the user simultaneously as search results. The GETESS search engine provides results only in the language relevant to the user. Consequently, the search-result set is narrower and fits the user's needs more closely.
Keywords
Internet; document handling; information retrieval; linguistics; search engines; English; GETESS; German; Internet documents; Internet search engine; document language recognition; heuristic approach; language classification heuristics; language knowledge; search results; search-result set; user needs; Computer architecture; Computer graphics; Computer science; Databases; Information analysis; Knowledge representation; Natural languages; Ontologies; Search engines; Web and internet services
fLanguage
English
Publisher
IEEE
Conference_Titel
Database and Expert Systems Applications, 2000. Proceedings. 11th International Workshop on
Conference_Location
London
ISSN
1529-4188
Print_ISBN
0-7695-0680-1
Type
conf
DOI
10.1109/DEXA.2000.875016
Filename
875016