  • DocumentCode
    3105054
  • Title
    A Divide-Conquer Strategy for Both English and Chinese Text Chunking
  • Author
    Liang, Ying-Hong; Wang, Ni-Hong; Qiu, Zhao-wen; Chen, Yin-; Zhao, Tie-jun
  • fYear
    2007
  • fDate
    22-24 Aug. 2007
  • Firstpage
    81
  • Lastpage
    86
  • Abstract
    The traditional English text chunking approach identifies phrases with only one model and the same types of features for every phrase. The limitations of using only one model have been shown to be that the same types of features are not suitable for all phrases and that data sparseness may result. In this paper, a divide-conquer strategy is proposed and applied to the identification of English phrases, and the strategy is then rapidly transplanted to Chinese text chunking. The strategy divides the chunking task into several sub-tasks according to the sensitive features of each phrase and identifies the different phrases in parallel. A two-stage decreasing conflict strategy is then used to synthesize the answers of the sub-tasks. The main features of the approach are that each phrase uses its own sensitive features and that data sparseness is avoided. Tested on a public corpus (English) and the Chinese Penn Treebank (Chinese), the F-score reaches 95.14% for English chunking and 95.23% for Chinese chunking. These results are state of the art, on a par with the best results that have been reported.
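The abstract describes the strategy only at a high level. As a minimal sketch, assuming each sub-task is a separate chunker for one phrase type that emits scored candidate spans, the code below merges those per-type outputs into one non-overlapping chunk list and scores it with span-level precision, recall and F-score. The greedy merge is a simple stand-in, not the paper's two-stage decreasing conflict strategy; the function names `merge_subtask_outputs` and `chunk_f1`, the confidence scores and the toy spans are all hypothetical.

```python
from typing import Dict, List, Tuple

# A chunk is (start_token, end_token_exclusive, phrase_type).
Span = Tuple[int, int, str]


def merge_subtask_outputs(candidates: Dict[str, List[Tuple[int, int, float]]]) -> List[Span]:
    """Merge per-phrase-type predictions into one non-overlapping chunk list.

    Hypothetical greedy stand-in for the paper's conflict-resolution step:
    spans are accepted in order of decreasing confidence, and any span that
    overlaps an already accepted span is dropped.
    """
    scored = [
        (score, start, end, ptype)
        for ptype, spans in candidates.items()
        for start, end, score in spans
    ]
    # Highest score first; ties broken by earlier start position.
    scored.sort(key=lambda item: (-item[0], item[1]))

    accepted: List[Span] = []
    for _score, start, end, ptype in scored:
        # Two half-open spans are disjoint if one ends before the other starts.
        if all(end <= s or start >= e for s, e, _ in accepted):
            accepted.append((start, end, ptype))
    return sorted(accepted)


def chunk_f1(gold: List[Span], predicted: List[Span]) -> float:
    """Span-level F-score: a chunk is correct only if boundaries and type match exactly."""
    gold_set, pred_set = set(gold), set(predicted)
    correct = len(gold_set & pred_set)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_set)
    recall = correct / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Toy sentence: He(0) reckons(1) the(2) current(3) account(4) deficit(5) will(6) narrow(7)
candidates = {
    "NP": [(0, 1, 0.95), (2, 6, 0.90), (3, 6, 0.40)],  # output of a hypothetical NP sub-task
    "VP": [(1, 2, 0.92), (6, 8, 0.85), (5, 8, 0.30)],  # output of a hypothetical VP sub-task
}
gold = [(0, 1, "NP"), (1, 2, "VP"), (2, 6, "NP"), (6, 8, "VP")]

if __name__ == "__main__":
    merged = merge_subtask_outputs(candidates)
    print(merged)                  # [(0, 1, 'NP'), (1, 2, 'VP'), (2, 6, 'NP'), (6, 8, 'VP')]
    print(chunk_f1(gold, merged))  # 1.0
```

With this toy input, the low-confidence candidates NP (3, 6) and VP (5, 8) overlap higher-scoring spans and are discarded, so the merged output matches the gold chunks. Running each phrase type as its own sub-task is what lets every chunker use its own feature set; the merge step is where conflicting answers get reconciled.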
  • Keywords
    Data mining; Electronic mail; Forestry; Information technology; Laboratories; Learning systems; Natural language processing; Natural languages; Speech processing; Testing; text chunking; divide-conquer strategy; data sparseness;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Advanced Language Processing and Web Information Technology, 2007. ALPIT 2007. Sixth International Conference on
  • Conference_Location
    Luoyang, Henan, China
  • Print_ISBN
    978-0-7695-2930-1
  • Type
    conf
  • DOI
    10.1109/ALPIT.2007.36
  • Filename
    4460619