• DocumentCode
    2896736
  • Title
    A Divide-Conquer Strategy for English Text Chunking
  • Author
    Liang, Ying-Hong ; Wang, Ni-Hong ; Su, Jian-min ; Ren, Hong-e
  • Author_Institution
    Sch. of Inf. & Comput. Eng., North East Forestry Univ., Harbin
  • fYear
    2006
  • fDate
    13-16 Aug. 2006
  • Firstpage
    3370
  • Lastpage
    3375
  • Abstract
    The traditional English text chunking approach identifies phrases by using only one model and phrases with the same types of features. It has been shown that the limitations of using only one model are that: the use of the same types of features is not suitable for all phrases, and data sparseness may also result. In this paper, the divide-conquer approach is proposed and applied in the identification of English phrases. This strategy divides the task of chunking into several sub-tasks according to sensitive features of each phrase and identifies different phrases in parallel. Then, a two-stage decreasing conflict strategy is used to synthesize each sub-task's answer. By applying and testing the approach on the public training and test corpus, the F score for arbitrary phrases identification using divide-conquer strategy achieves 94.14% compared to the previous best F score of 94.17%.
  • Keywords
    divide and conquer methods; feature extraction; grammars; learning (artificial intelligence); natural languages; text analysis; English phrase identification; English text chunking approach; data sparseness; divide-conquer strategy; phrase feature identification; shallow parsing method; two-stage decreasing conflict strategy; Cybernetics; Data mining; Electronic mail; Forestry; Information analysis; Laboratories; Learning systems; Machine learning; Natural language processing; Speech processing; Testing; Text processing; Text chunking; divide-conquer strategy; sensitive features;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics, 2006 International Conference on
  • Conference_Location
    Dalian, China
  • Print_ISBN
    1-4244-0061-9
  • Type
    conf
  • DOI
    10.1109/ICMLC.2006.258477
  • Filename
    4028650