• DocumentCode
    245132
  • Title
    Document-Specific Keyphrase Extraction Using Sequential Patterns with Wildcards
  • Author
    Fei Xie ; Xindong Wu ; Xingquan Zhu
  • Author_Institution
    Sch. of Comput. Sci. & Inf. Eng., Hefei Univ. of Technol., Hefei, China
  • fYear
    2014
  • fDate
    14-17 Dec. 2014
  • Firstpage
    1055
  • Lastpage
    1060
  • Abstract
    Finding good keyphrases for a document is beneficial for many applications, such as text summarization, browsing, and indexing. In this paper, we propose a sequential pattern mining based document-specific keyphrase extraction method. Our key innovation is to use wildcards (or gap constraints) to help extract sequential patterns, where the flexible wildcard constraints within a pattern can capture semantic relationships between words. To achieve this goal, we regard each single document as a sequential dataset, and propose an efficient algorithm to mine sequential patterns with wildcard and one-off conditions that allows important keyphrases to be captured during the mining process. For each extracted keyphrase candidate, we use some statistical pattern features to characterize it. A supervised learning classifier is trained to identify keyphrases from a test document. Comparisons on keyphrase benchmark datasets confirm that our document-specific keyphrase extraction method is effective in improving the quality of extracted keyphrases.
  • Keywords
    data mining; learning (artificial intelligence); pattern classification; statistical analysis; text analysis; document-specific keyphrase extraction; gap constraints; keyphrase benchmark datasets; keyphrases identification; mining process; semantic relationships; sequential dataset; sequential pattern mining; sequential patterns extraction; statistical pattern features; supervised learning classifier; wildcard constraints; Data mining; Databases; Educational institutions; Feature extraction; Microprogramming; Semantics; Time complexity; classification; keyphrase extraction; sequential pattern mining; wildcards;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2014 IEEE International Conference on
  • Conference_Location
    Shenzhen
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4799-4303-6
  • Type
    conf
  • DOI
    10.1109/ICDM.2014.105
  • Filename
    7023446
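
The abstract describes mining sequential patterns under wildcard (gap) constraints and a one-off condition. The sketch below only illustrates those two ideas on a tokenized document; it is not the paper's algorithm. The function name `count_one_off`, the parameters `min_gap` and `max_gap`, and the greedy left-to-right matching strategy are all assumptions made for illustration. Here a wildcard gap means the number of arbitrary tokens allowed between two consecutive pattern words, and one-off means each token position may be used by at most one occurrence of the pattern.

```python
def count_one_off(tokens, pattern, min_gap=0, max_gap=2):
    """Count occurrences of `pattern` in `tokens` under two constraints.

    Gap constraint: between consecutive pattern words there must be
    between `min_gap` and `max_gap` wildcard (arbitrary) tokens.
    One-off condition: each token position is used by at most one occurrence.

    Assumption: matching is greedy and left to right, without backtracking,
    so the result is a lower bound on the best achievable count.
    """
    if not pattern:
        return 0
    used = [False] * len(tokens)
    count = 0
    for start, tok in enumerate(tokens):
        if used[start] or tok != pattern[0]:
            continue
        positions = [start]
        for word in pattern[1:]:
            prev = positions[-1]
            lo = prev + 1 + min_gap
            hi = min(prev + 1 + max_gap, len(tokens) - 1)
            # Take the earliest unused position in the allowed window.
            nxt = next(
                (i for i in range(lo, hi + 1)
                 if not used[i] and tokens[i] == word),
                None,
            )
            if nxt is None:
                break
            positions.append(nxt)
        else:
            # Every pattern word was placed: commit the occurrence.
            for i in positions:
                used[i] = True
            count += 1
    return count


doc = "keyphrase extraction is hard ; keyphrase candidate extraction works".split()
support = count_one_off(doc, ["keyphrase", "extraction"], max_gap=1)
```

With `max_gap=1` the pattern matches both "keyphrase extraction" and "keyphrase candidate extraction"; with `max_gap=0` only the contiguous phrase counts. A count of this kind could serve as one statistical feature of a candidate, though the features actually used in the paper are not listed in this record.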