• DocumentCode
    479777
  • Title

    Automatic Segmentation of Hierarchy Feature without Lexicon for Chinese Text Based on Iterative Learning

  • Author

    Jiang, Shaohua ; Dang, Yanzhong

  • Author_Institution
    Sch. of Civil & Hydraulic Eng., Dalian Univ. of Technol., Dalian
  • Volume
    1
  • fYear
    2008
  • fDate
    12-14 Dec. 2008
  • Firstpage
    657
  • Lastpage
    661
  • Abstract
    Feature extraction is indispensable in Chinese natural language processing because it benefits knowledge discovery and information retrieval over Chinese text, and Chinese segmentation is the precondition of feature extraction. To overcome the disadvantages of current Chinese segmentation methods (lexicon-based schemes, syntax- and rule-based schemes, statistics-based schemes, and combinations of these), the maximum matching and frequency statistics (MMFS) segmentation method, based on length descending and string frequency statistics, was put forward. To extract the shorter words and phrases contained in longer ones, a novel Chinese hierarchy feature extraction method based on MMFS and iterative learning was proposed. This method obtains hierarchy features according to morphology without a lexicon, without acquiring the probabilities between words in advance, and without a Chinese character index. Experimental results confirmed the efficiency of this statistical method in extracting Chinese hierarchy features.
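    The abstract gives only the outline of the approach, so the following is a minimal illustrative sketch rather than the authors' MMFS algorithm. It assumes that "length descending" means counting candidate strings from the longest length down to two characters, that a string becomes a feature once it reaches a frequency threshold, and that "iterative learning" means re-running the same pass over the features already found in order to expose shorter strings shared by longer ones. The names `split_clauses`, `mmfs_pass`, `hierarchy_features`, `max_len` and `min_freq` are all hypothetical.

```python
import re
from collections import Counter


def split_clauses(text):
    # Assumption: any run of non-CJK characters (punctuation, spaces, Latin)
    # acts as a clause boundary.
    return [c for c in re.split(r"[^\u4e00-\u9fff]+", text) if c]


def mmfs_pass(clauses, max_len=4, min_freq=2):
    """One length-descending pass: the longest frequent strings are taken first."""
    found = {}
    for n in range(max_len, 1, -1):
        # Count every substring of length n across all clauses.
        counts = Counter(
            clause[i:i + n]
            for clause in clauses
            for i in range(len(clause) - n + 1)
        )
        frequent = [s for s, k in counts.items() if k >= min_freq]
        for s in frequent:
            found[s] = counts[s]
        # Cut accepted strings out so shorter lengths only see the remainder.
        remainder = []
        for clause in clauses:
            pieces = [clause]
            for s in frequent:
                pieces = [p for piece in pieces for p in piece.split(s)]
            remainder.extend(p for p in pieces if p)
        clauses = remainder
    return found


def hierarchy_features(text, max_len=4, min_freq=2):
    """Repeat the pass over the features themselves to build levels."""
    levels = []
    clauses = split_clauses(text)
    while max_len >= 2:
        found = mmfs_pass(clauses, max_len, min_freq)
        if not found:
            break
        levels.append(found)
        # Assumption: the next level looks for strictly shorter strings that
        # are shared by at least min_freq distinct features of this level.
        clauses = list(found)
        max_len = max(len(s) for s in found) - 1
    return levels


sample = "信息检索很重要。信息检索与信息抽取。信息抽取很重要。"
levels = hierarchy_features(sample)
for depth, features in enumerate(levels, start=1):
    print(depth, features)
```

    On this toy input the first level holds the longest repeated strings and the second level holds the shorter string they share. No lexicon, word-probability table, or character index is consulted, which matches the constraints stated in the abstract; the thresholds and the stopping rule are choices made for the sketch only.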
  • Keywords
    feature extraction; information retrieval; natural language processing; text analysis; Chinese character index; Chinese hierarchy feature extraction; Chinese natural language; Chinese segmentation method; Chinese text knowledge discovery; automatic segmentation; frequency statistics segmentation; iterative learning; lexicon-based scheme; maximum matching; rules-based scheme; statistics-based scheme; string frequency statistics; syntax; Computer science; Frequency; Natural languages; Probability; Software engineering; Statistics; Testing; Text mining; Chinese text; hierarchy feature
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Software Engineering, 2008 International Conference on
  • Conference_Location
    Wuhan, Hubei
  • Print_ISBN
    978-0-7695-3336-0
  • Type

    conf

  • DOI
    10.1109/CSSE.2008.1434
  • Filename
    4721835