• DocumentCode
    3680235
  • Title

    A Hierarchical Collocation Extraction Tool

  • Author

    Dan Li;Jiangxiang Cao;Degen Huang

  • Author_Institution
    Sch. of Foreign Languages, Dalian Univ. of Technol., Dalian, China
  • fYear
    2015
  • Firstpage
    51
  • Lastpage
    55
  • Abstract
    We design a hierarchical collocation extraction tool according to the three-layered linguistic properties of collocation. Based on the structured definitions of collocation, the extraction goes through three phases: i) extracting peripheral collocations in the frequency layer from dependency triples, ii) extracting semi-peripheral collocations in the syntactic layer by association measures (AMs), and iii) extracting core collocations in the semantic layer with a similar-word thesaurus. The thesaurus is created by taking all the collocations of a word as its features and computing the similarity between any two words. Experiments on our test corpus of China English, with the Oxford Collocations Dictionary as the gold standard, show that the integrated measure (LMP) we propose outperforms the other three AMs. The syntactic constraints in Phase II filter out much of the noise from surface co-occurrences, the semantic constraints in Phase III are effective in identifying the very "core" collocations, and the keyness of the words in the test set is a significant factor when a published collocation dictionary is taken as the gold standard. The tool can be a convenient aid for linguists, language teachers, and learners.
  • Keywords
    "Syntactics","Pragmatics","Semantics","Standards","Gold","Natural language processing","Computational linguistics"
  • Publisher
    IEEE
  • Conference_Titel
    Big Data and Cloud Computing (BDCloud), 2015 IEEE Fifth International Conference on
  • Type

    conf

  • DOI
    10.1109/BDCloud.2015.67
  • Filename
    7310715