• DocumentCode
    2539446
  • Title
    Automatic Extraction of Multiword Expressions Combining Statistical and Similarity Approaches
  • Author
    Xu, Jian ; Yu, Jingsong ; Wang, Huilin

  • Author_Institution
    Dept. of Language & Inf. Eng., Peking Univ., Beijing, China
  • fYear
    2010
  • fDate
    13-15 Dec. 2010
  • Firstpage
    256
  • Lastpage
    259
  • Abstract
    Multiword expressions (MWEs) are important for practical applications such as machine translation (MT), multilingual information retrieval, data mining, and other natural language processing tasks. A method combining a similarity measure with a statistical tool is proposed for automatically extracting English MWEs from a corpus of Chinese government white papers and work reports from 1991 to 2010. The statistical approach is used to calculate the co-occurrence affinity between two words. In addition, the similarity measure is used to compute semantic relations between words in order to improve MWE coverage, with the aim of obtaining higher precision and recall in extracting candidate multiword expressions. Experimental results show that the proposed technique improves MWE extraction effectively.
  • Keywords
    information retrieval; natural languages; statistical analysis; word processing; Chinese government; English; automatic extraction; multiword expression; semantic relation; similarity approach; statistical approach; white paper; work report; Computational linguistics; Conferences; Data mining; Government; Natural language processing; Pragmatics; Semantics; co-occurrence affinity; multiword expressions; similarity approach; statistical tool;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Genetic and Evolutionary Computing (ICGEC), 2010 Fourth International Conference on
  • Conference_Location
    Shenzhen
  • Print_ISBN
    978-1-4244-8891-9
  • Electronic_ISBN
    978-0-7695-4281-2
  • Type
    conf

  • DOI
    10.1109/ICGEC.2010.70
  • Filename
    5715418