• DocumentCode
    2836084
  • Title

    Auto-Identifying Terms Based on a Place-Extending Method

  • Author

    Zezhi Zheng

  • Author_Institution
    Dept. of Chinese Language & Literature, Xiamen Univ., Xiamen, China
  • fYear
    2011
  • fDate
    17-18 July 2011
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    The normalized relative frequency ratio is used as the domain differential degree to estimate how domain-specific a string is; the sequence correlation coefficient is used to judge the stability of a string. The identification process takes two steps. 1) Get term seeds. Extract adjacent character pairs from the domain corpus and the general corpus respectively, then obtain term seeds by filtering the adjacent pairs jointly with the domain differential degree, mutual information, and the taboo character list. 2) Gain terms. Using a character-by-character extension strategy, take the term seeds as anchor points and extend each seed on both sides one character at a time. Filter every added character with the sequence correlation coefficient, exception-correction rules, and the taboo word list in turn. Take the terms with the character, as an example. Tests showed that the precision and recall of the algorithm reached 86.73% and 85.91%, respectively.
  • Keywords
    character recognition; correlation methods; feature extraction; sequences; string matching; domain differential degree; place extending method; sequence correlation coefficient; string stability; taboo word list; Correlation; Data mining; Feature extraction; Mutual information; Physics; Time frequency analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Circuits, Communications and System (PACCS), 2011 Third Pacific-Asia Conference on
  • Conference_Location
    Wuhan
  • Print_ISBN
    978-1-4577-0855-8
  • Type

    conf

  • DOI
    10.1109/PACCS.2011.5990133
  • Filename
    5990133
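
The two-step process described in the abstract (seed selection, then character-by-character extension) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the abstract does not give the formulas, so the normalization `r / (1 + r)`, the add-one smoothing, the thresholds, and the neighbour-frequency ratio used in place of the sequence correlation coefficient are all stand-ins, and every function name is hypothetical. The exception-correction rules and the taboo word list are omitted.

```python
import math
from collections import Counter


def pair_counts(text):
    """Count adjacent character pairs in a string."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def domain_degree(pair, dom, gen, n_dom, n_gen):
    """Relative frequency ratio r, squashed to (0, 1) as r / (1 + r).
    Both the smoothing and the normalization are assumptions."""
    p_dom = dom[pair] / n_dom
    p_gen = (gen[pair] + 1) / (n_gen + 1)  # add-one smoothing (assumed)
    r = p_dom / p_gen
    return r / (1 + r)


def pmi(pair, pairs, chars, n_pairs, n_chars):
    """Pointwise mutual information of the two characters in a pair."""
    p_xy = pairs[pair] / n_pairs
    p_x = chars[pair[0]] / n_chars
    p_y = chars[pair[1]] / n_chars
    return math.log2(p_xy / (p_x * p_y))


def get_seeds(domain_text, general_text, taboo_chars,
              min_degree=0.8, min_pmi=1.0):
    """Step 1: keep domain pairs that pass all three filters."""
    dom, gen = pair_counts(domain_text), pair_counts(general_text)
    chars = Counter(domain_text)
    n_dom, n_gen = sum(dom.values()), sum(gen.values())
    seeds = set()
    for pair in dom:
        if any(c in taboo_chars for c in pair):
            continue
        if (domain_degree(pair, dom, gen, n_dom, n_gen) >= min_degree
                and pmi(pair, dom, chars, n_dom, len(domain_text)) >= min_pmi):
            seeds.add(pair)
    return seeds


def occurrences(term, text):
    """Start indexes of every (possibly overlapping) occurrence of term."""
    out, i = [], text.find(term)
    while i != -1:
        out.append(i)
        i = text.find(term, i + 1)
    return out


def extend_seed(seed, text, taboo_chars, min_ratio=0.8):
    """Step 2: grow the seed one character at a time on each side.
    A neighbour is accepted when it follows (or precedes) the current
    string in at least min_ratio of its occurrences; this ratio is a
    stand-in for the sequence correlation coefficient."""
    term, changed = seed, True
    while changed:
        changed = False
        for side in ("right", "left"):
            starts = occurrences(term, text)
            counts = Counter()
            for s in starts:
                idx = s + len(term) if side == "right" else s - 1
                if 0 <= idx < len(text):
                    counts[text[idx]] += 1
            if not counts:
                continue
            ch, n = counts.most_common(1)[0]
            if ch in taboo_chars or n / len(starts) < min_ratio:
                continue
            term = term + ch if side == "right" else ch + term
            changed = True
    return term


# Toy corpora (made up for illustration only).
DOMAIN = "a qubit, b qubit, c qubit."
GENERAL = "the cat sat on the mat and a bit of a rabbit."
TABOO = {" ", ",", "."}

seeds = get_seeds(DOMAIN, GENERAL, TABOO)
terms = {extend_seed(s, DOMAIN, TABOO) for s in seeds}
```

In this toy setup the pairs that also occur in the general text are rejected as seeds, while the pairs specific to the domain text survive and are extended until a taboo character blocks further growth on both sides.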