• DocumentCode
    1749719
  • Title
    Bootstrap method for Chinese new words extraction
  • Author
    He, Shan; Zhu, Jie
  • Author_Institution
    Dept. of Electr. Eng., Shanghai Jiaotong Univ., China
  • Volume
    1
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    581
  • Abstract
    A bootstrap approach for extracting unknown words from a Chinese text corpus is proposed. Instead of using a non-iterative segmentation-detection approach, the proposed method iteratively extracts new words and adds them to the lexicon. The augmented dictionary, which includes potential unknown words in addition to known words, is then used in the next iteration to re-segment the input corpus, and this repeats until the stop conditions are reached. Experiments show that both the precision and recall rates of segmentation are improved.
  • Keywords
    entropy; iterative methods; probability; text analysis; Chinese new words extraction; Chinese text corpus; augmented dictionary; bootstrap method; heuristic rules; input corpus; potential unknown words; precision; recall rates; stop conditions; Dictionaries; Feathers; Frequency; Helium; Impedance matching; Information retrieval; Natural language processing; Natural languages; Text processing; Vocabulary
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 2001. Proceedings. (ICASSP '01). 2001 IEEE International Conference on
  • Conference_Location
    Salt Lake City, UT
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-7041-4
  • Type
    conf
  • DOI
    10.1109/ICASSP.2001.940898
  • Filename
    940898
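
The iterative loop described in the abstract can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the forward maximum matching segmenter, the adjacent-token frequency threshold (min_count), the iteration cap (max_iter), and the function names segment and bootstrap are all assumptions made for this sketch. The keywords in this record suggest that the paper itself relies on entropy, probability, and heuristic rules to pick candidates, which are not reproduced here.

```python
from collections import Counter


def segment(text, lexicon, max_len=4):
    """Forward maximum matching (an assumed segmenter, not taken from the paper).

    Characters not covered by any lexicon entry become single-character tokens.
    """
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, falling back to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def bootstrap(corpus, lexicon, min_count=2, max_iter=5, max_len=4):
    """Re-segment the corpus repeatedly, growing the lexicon each round.

    Candidate new words are adjacent token pairs seen at least min_count
    times; this frequency test is a stand-in for the paper's own criteria.
    """
    lexicon = set(lexicon)
    for _ in range(max_iter):
        pairs = Counter()
        for sentence in corpus:
            tokens = segment(sentence, lexicon, max_len)
            pairs.update(a + b for a, b in zip(tokens, tokens[1:]))
        new_words = {
            word for word, count in pairs.items()
            if count >= min_count and len(word) <= max_len and word not in lexicon
        }
        if not new_words:  # stop condition: nothing new was extracted
            break
        lexicon |= new_words
    return lexicon
```

Calling bootstrap on a list of sentences with an initial lexicon returns the augmented lexicon, which can then be passed to segment for the final segmentation. A plain frequency threshold tends to over-merge common function words, so a real system would need a stricter candidate filter.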