Annotating text segments in documents for search

Author

Cheng, Pu-Jen ; Chiao, Hsin-Chen ; Pan, Yi-Cheng ; Chien, Lee-Feng

Author_Institution

Institute of Information Science, Academia Sinica, Taiwan

fYear

2005

fDate

19-22 Sept. 2005

Firstpage

317

Lastpage

320

Abstract

It has been shown that annotating prominent text patterns in documents with appropriate types can benefit many applications. Most conventional tools for automatic text annotation extract named entities from text and annotate them with types such as person, location, and date. However, such entity-type information is typically brief and mostly limited to a small set of broad categories. In this paper, we address this problem by presenting an approach that extracts global evidence from documents to improve named entity recognition. We also propose an unsupervised, generalized classification approach that automatically collects training data from the Web and classifies text patterns into finer-grained categories. Experiments on the data of the NTCIR-2 information retrieval task show that the proposed approaches are feasible for search.

Keywords

Internet; text analysis; NTCIR-2 information retrieval task; World Wide Web; automatic text annotation; named entity recognition; text pattern classification; text segment annotation; training data; unsupervised classification; Books; Data mining; Information management; Information retrieval; Information science; Infrared detectors; Noise robustness; Text categorization; Text recognition; Training data

fLanguage

English

Publisher

IEEE

Conference_Titel

Web Intelligence, 2005. Proceedings. The 2005 IEEE/WIC/ACM International Conference on

Print_ISBN

0-7695-2415-X

Type

conf

DOI

10.1109/WI.2005.32

Filename

1517864