  • DocumentCode
    1637530
  • Title
    Suffix Tree Based Approach for Chinese Information Retrieval
  • Author
    Huang, Jin Hu; Powers, David
  • Author_Institution
    Sch. of Comput. Sci., Flinders Univ. of South Australia, SA
  • Volume
    3
  • fYear
    2008
  • Firstpage
    393
  • Lastpage
    397
  • Abstract
    With the widespread use of the Internet, Chinese language information retrieval has attracted great research interest in recent years. The absence of word boundaries in Chinese makes Chinese information retrieval (IR) different from IR in European languages. To apply traditional IR approaches to Chinese, sentences must first be segmented into words, so word segmentation plays a key role in Chinese IR. Because word segmentation is not straightforward and its results are sometimes ambiguous, n-grams are used as an alternative. Several experimental studies have compared words with n-grams and examined word segmentation and its effect on information retrieval. These studies show that using either words or n-grams leads to comparable performance, and that higher word segmentation accuracy does not necessarily result in better retrieval performance. In this paper we propose a suffix tree based approach for Chinese information retrieval without word segmentation.
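    The abstract does not describe how the suffix tree is built or used. As a minimal, hypothetical sketch of the general idea of segmentation-free lookup (not the authors' method), the Python below inserts every suffix of every document into a naive, uncompressed generalized suffix trie, so any Chinese query string can be matched as a substring without first splitting text into words. The class and method names (GeneralizedSuffixTrie, add_document, search) are illustrative assumptions; a true suffix tree compresses single-child paths and can be built in linear time (e.g. with Ukkonen's algorithm), whereas this trie costs O(n^2) per document.

```python
class SuffixTrieNode:
    """One node of an uncompressed suffix trie (hypothetical helper)."""

    def __init__(self) -> None:
        self.children: dict[str, "SuffixTrieNode"] = {}
        # IDs of documents whose text contains the path from the root to this node.
        self.doc_ids: set[int] = set()


class GeneralizedSuffixTrie:
    """Naive generalized suffix trie over several documents (illustrative only).

    Every suffix of every document is inserted character by character, so no
    word segmentation is needed: each path from the root spells a substring.
    """

    def __init__(self) -> None:
        self.root = SuffixTrieNode()

    def add_document(self, doc_id: int, text: str) -> None:
        # Insert each suffix text[start:], tagging every node on its path.
        for start in range(len(text)):
            node = self.root
            for ch in text[start:]:
                node = node.children.setdefault(ch, SuffixTrieNode())
                node.doc_ids.add(doc_id)

    def search(self, query: str) -> set[int]:
        """Return the IDs of documents that contain `query` as a substring."""
        node = self.root
        for ch in query:
            child = node.children.get(ch)
            if child is None:
                return set()
            node = child
        return set(node.doc_ids)


if __name__ == "__main__":
    docs = {0: "后缀树用于中文信息检索", 1: "中文分词并不简单", 2: "信息检索系统"}
    trie = GeneralizedSuffixTrie()
    for doc_id, text in docs.items():
        trie.add_document(doc_id, text)
    print(sorted(trie.search("中文")))      # [0, 1]
    print(sorted(trie.search("信息检索")))  # [0, 2]
```

    This sketch only answers which documents contain a query string; it does no term weighting or ranking, which a full retrieval system would still need.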
  • Keywords
    Internet; information retrieval; natural language processing; Chinese language information retrieval; n-grams; suffix tree; Application software; Computer science; Design engineering; Frequency; Indexing; Information retrieval; Intelligent systems; Natural languages; Power engineering and energy; Suffix Tree;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Systems Design and Applications, 2008. ISDA '08. Eighth International Conference on
  • Conference_Location
    Kaohsiung
  • Print_ISBN
    978-0-7695-3382-7
  • Type
    conf
  • DOI
    10.1109/ISDA.2008.365
  • Filename
    4696497