  • DocumentCode
    464195
  • Title
    Extracting Significant Phrases from Text
  • Author
    Lui, Yuan J. ; Brent, Richard ; Calinescu, Ani
  • Author_Institution
    Univ. of Oxford, Oxford
  • Volume
    1
  • fYear
    2007
  • fDate
    21-23 May 2007
  • Firstpage
    361
  • Lastpage
    366
  • Abstract
    Prospective readers can quickly determine whether a document is relevant to their information need if the significant phrases (or keyphrases) in this document are provided. Although keyphrases are useful, not many documents have keyphrases assigned to them, and manually assigning keyphrases to existing documents is costly. Therefore, there is a need for automatic keyphrase extraction. This paper introduces a new domain-independent keyphrase extraction algorithm. The algorithm approaches the problem of keyphrase extraction as a classification task, and uses a combination of statistical and computational linguistics techniques, a new set of attributes, and a new learning method to distinguish keyphrases from non-keyphrases. The experiments indicate that this algorithm performs at least as well as other keyphrase extraction tools and that it significantly outperforms Microsoft Word 2000's AutoSummarize feature.
  • Keywords
    computational linguistics; document handling; feature extraction; statistical analysis; Microsoft Word 2000 AutoSummarize feature; automatic keyphrase extraction; classification task; computational linguistics techniques; learning method; statistical techniques; Computational linguistics; Data mining; Frequency; Genetic algorithms; Internet; Learning systems; Machine learning; Machine learning algorithms; Training data; Vocabulary;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Information Networking and Applications Workshops, 2007, AINAW '07. 21st International Conference on
  • Conference_Location
    Niagara Falls, Ont.
  • Print_ISBN
    978-0-7695-2847-2
  • Type
    conf
  • DOI
    10.1109/AINAW.2007.180
  • Filename
    4221086