• DocumentCode
    3105073
  • Title

    A Heuristic Approach for Segmentation Granularity Problem in Chinese Information Retrieval

  • Author

    Fan, Ding ; Bin, Wang ; Sili, Wang

  • fYear
    2007
  • fDate
    22-24 Aug. 2007
  • Firstpage
    87
  • Lastpage
    91
  • Abstract
    In Chinese information retrieval, documents are usually segmented into words and then indexed by those words. However, the segmentation granularity problem (SGP) must be considered: fine granularity may lead to low precision and efficiency, while coarse granularity may cause low recall. To address the problem, this paper proposes an intuitive heuristic approach. A two-level index is built for the segmentation dictionary, through which the original query word can be expanded with its weighted overlaid words. This method not only preserves the precision advantage of coarse granularity but also overcomes its disadvantage in recall. The experimental results show that our approach slightly but consistently outperforms the baseline.
  • Keywords
    Computers; Dictionaries; Frequency; Indexing; Information retrieval; Information technology; Large-scale systems; Natural languages; Particle separators
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Language Processing and Web Information Technology, 2007. ALPIT 2007. Sixth International Conference on
  • Conference_Location
    Luoyang, Henan, China
  • Print_ISBN
    978-0-7695-2930-1
  • Type

    conf

  • DOI
    10.1109/ALPIT.2007.46
  • Filename
    4460620