• DocumentCode
    2184096
  • Title

    Annotating text segments in documents for search

  • Author

    Cheng, Pu-Jen ; Chiao, Hsin-Chen ; Pan, Yi-Cheng ; Chien, Lee-Feng

  • Author_Institution
    Inst. of Inf. Sci., Acad. Sinica, Taiwan
  • fYear
    2005
  • fDate
    19-22 Sept. 2005
  • Firstpage
    317
  • Lastpage
    320
  • Abstract
    It has been shown that annotating prominent text patterns contained in documents with appropriate types may benefit many applications. Most conventional tools for automatic text annotation extract named entities from texts and annotate them with information about persons, locations, dates and so on. However, this kind of entity type information is often short in length and is mostly limited to a small set of broad categories. In this paper, we try to remedy this problem by presenting an approach to extract global evidence from documents for improved named entity recognition. We also propose an unsupervised, generalized classification approach that collects training data from the Web automatically and classifies text patterns into more refined categories. Experimental results show the feasibility of the proposed approaches for search on the data of the NTCIR-2 information retrieval task.
  • Keywords
    Internet; text analysis; NTCIR-2 information retrieval task; World Wide Web; automatic text annotation; named entity recognition; text pattern classification; text segment annotation; training data; unsupervised classification; Books; Data mining; Information management; Information retrieval; Information science; Infrared detectors; Noise robustness; Text categorization; Text recognition; Training data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence, 2005. Proceedings. The 2005 IEEE/WIC/ACM International Conference on
  • Print_ISBN
    0-7695-2415-X
  • Type

    conf

  • DOI
    10.1109/WI.2005.32
  • Filename
    1517864