DocumentCode
2184096
Title
Annotating text segments in documents for search
Author
Cheng, Pu-Jen ; Chiao, Hsin-Chen ; Pan, Yi-Cheng ; Chien, Lee-Feng
Author_Institution
Inst. of Inf. Sci., Acad. Sinica, Taiwan
fYear
2005
fDate
19-22 Sept. 2005
Firstpage
317
Lastpage
320
Abstract
It has been shown that annotating prominent text patterns contained in documents with appropriate types may benefit many applications. Most conventional tools for automatic text annotation extract named entities from texts and annotate them with information about persons, locations, dates and so on. However, this kind of entity type information is often short in length and is mostly limited to a small set of broad categories. In this paper, we try to remedy this problem by presenting an approach to extract global evidence from documents for improved named entity recognition. We also propose an unsupervised, generalized classification approach that collects training data from the Web automatically and classifies text patterns into more refined categories. Experimental results show the feasibility of the proposed approaches for search on the data of the NTCIR-2 information retrieval task.
Keywords
Internet; text analysis; NTCIR-2 information retrieval task; World Wide Web; automatic text annotation; named entity recognition; text pattern classification; text segment annotation; training data; unsupervised classification; Books; Data mining; Information management; Information retrieval; Information science; Infrared detectors; Noise robustness; Text categorization; Text recognition; Training data
fLanguage
English
Publisher
IEEE
Conference_Titel
Web Intelligence, 2005. Proceedings. The 2005 IEEE/WIC/ACM International Conference on
Print_ISBN
0-7695-2415-X
Type
conf
DOI
10.1109/WI.2005.32
Filename
1517864