Regional Information Center for Science and Technology (RICEST) - Tokenization and proper noun recognition for information retrieval

DocumentCode :

2430140

Title :

Tokenization and proper noun recognition for information retrieval

Author :

Barcala, Fco Mario ; Vilares, Jesús ; Alonso, Miguel A. ; Grana, J. ; Vilares, Manuel

Author_Institution :

Departamento de Computacion, Univ. da Coruna, La Coruna, Spain

fYear :

2002

fDate :

2-6 Sept. 2002

Firstpage :

246

Lastpage :

250

Abstract :

In this paper we consider a set of natural language processing techniques that can be used to analyze large amounts of text, focusing on an advanced tokenizer that accounts for a number of complex linguistic phenomena as well as pre-tagging tasks such as proper noun recognition. We also report the results of several experiments performed to study the impact of the strategy chosen for recognizing proper nouns.

Keywords :

data mining; natural languages; text analysis; advanced tokenizer; complex linguistic phenomena; natural language processing techniques; proper noun recognition; tokenization; Employment; Filters; Indexing; Information analysis; Information retrieval; Natural language processing; Natural languages; Performance analysis; Proposals; Text recognition

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Database and Expert Systems Applications, 2002. Proceedings. 13th International Workshop on

ISSN :

1529-4188

Print_ISBN :

0-7695-1668-8

Type :

conf

DOI :

10.1109/DEXA.2002.1045906

Filename :

1045906

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2430140