• DocumentCode
    3632476
  • Title
    Automatic Term Recognition Based on Data-Mining Techniques
  • Author
    Dominika Srajerova;Oleg Kovarik;Václav Cvrcek
  • Author_Institution
    Fac. of Arts, Charles Univ., Prague, Czech Republic
  • Volume
    4
  • fYear
    2009
  • Firstpage
    453
  • Lastpage
    457
  • Abstract
    We present a new method for automatic term extraction which is based on training datasets created to build inductive models for term identification. Existing approaches employ simple statistical and linguistic rules that are designed merely ad hoc and are unable to utilize complex relations between linguistic units. In contrast to those approaches, our method does not require such manually ascribed rules of extraction. The data for our research is taken from the Czech National Corpus, which is lemmatised and morphologically tagged. Statistical information (frequency, distribution, etc.) is generated automatically, and thus the only expert contribution needed is to label terms in the training dataset. The data mining software creates models that perform the extraction without any further human input. Additionally, feature ranking can serve as a valuable aid for understanding the extraction process, for its future development, and for terminology research.
  • Keywords
    "Data mining","Terminology","Equations","Transfer functions","Computer science","Data engineering","Art","Frequency","Software performance","Humans"
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Information Engineering, 2009 WRI World Congress on
  • Print_ISBN
    978-0-7695-3507-4
  • Type
    conf
  • DOI
    10.1109/CSIE.2009.935
  • Filename
    5171037