Title : 
Document-Specific Keyphrase Extraction Using Sequential Patterns with Wildcards
         
        
            Author : 
Fei Xie ; Xindong Wu ; Xingquan Zhu
         
        
            Author_Institution : 
Sch. of Comput. Sci. & Inf. Eng., Hefei Univ. of Technol., Hefei, China
         
        
        
        
        
        
            Abstract : 
Finding good keyphrases for a document is beneficial for many applications, such as text summarization, browsing, and indexing. In this paper, we propose a document-specific keyphrase extraction method based on sequential pattern mining. Our key innovation is to use wildcards (or gap constraints) to help extract sequential patterns, where the flexible wildcard constraints within a pattern can capture semantic relationships between words. To achieve this goal, we regard each single document as a sequential dataset, and propose an efficient algorithm to mine sequential patterns with wildcard and one-off conditions, so that important keyphrases can be captured during the mining process. We characterize each extracted keyphrase candidate with statistical pattern features, and train a supervised learning classifier to identify keyphrases in a test document. Comparisons on keyphrase benchmark datasets confirm that our document-specific keyphrase extraction method is effective in improving the quality of extracted keyphrases.
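
The idea of matching a word pattern under wildcard (gap) constraints and a one-off condition can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function names, the `min_gap`/`max_gap` parameters, and the greedy left-to-right matching strategy are assumptions made for illustration only. Here a gap is the number of tokens skipped between two consecutive pattern words, and "one-off" is taken to mean that each token position may be used by at most one occurrence of the pattern.

```python
from typing import List, Optional, Sequence, Set


def find_occurrence(tokens: Sequence[str], pattern: Sequence[str], start: int,
                    min_gap: int, max_gap: int, used: Set[int]) -> Optional[List[int]]:
    """Search for one occurrence of `pattern` whose first word sits at `start`.

    Between consecutive pattern words, between `min_gap` and `max_gap` tokens
    may be skipped (the wildcards). Positions in `used` are not available.
    """
    if not pattern or start in used or tokens[start] != pattern[0]:
        return None

    def extend(positions: List[int]) -> Optional[List[int]]:
        k = len(positions)
        if k == len(pattern):
            return positions
        last = positions[-1]
        lo = last + 1 + min_gap
        hi = min(last + 1 + max_gap, len(tokens) - 1)
        for nxt in range(lo, hi + 1):
            if nxt not in used and tokens[nxt] == pattern[k]:
                found = extend(positions + [nxt])
                if found is not None:
                    return found
        return None  # no valid continuation from this prefix

    return extend([start])


def one_off_occurrences(tokens: Sequence[str], pattern: Sequence[str],
                        min_gap: int = 0, max_gap: int = 2) -> List[List[int]]:
    """Greedily collect occurrences so that no token position is reused.

    The greedy scan is a simplification and is not guaranteed to find the
    maximum number of one-off occurrences.
    """
    used: Set[int] = set()
    occurrences: List[List[int]] = []
    for start in range(len(tokens)):
        occ = find_occurrence(tokens, pattern, start, min_gap, max_gap, used)
        if occ is not None:
            used.update(occ)
            occurrences.append(occ)
    return occurrences


if __name__ == "__main__":
    doc = ("sequential pattern mining with wildcards finds "
           "sequential frequent pattern mining results").split()
    phrase = ["sequential", "pattern", "mining"]
    print(one_off_occurrences(doc, phrase, min_gap=0, max_gap=1))
    print(len(one_off_occurrences(doc, phrase, min_gap=0, max_gap=0)))
```

With `max_gap=1` the sketch also matches "sequential frequent pattern mining", where one wildcard absorbs the intervening word; with `max_gap=0` only the contiguous phrase is counted. The number of occurrences found this way could serve as one statistical feature of a keyphrase candidate, although the features actually used in the paper are not listed in this record.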
         
        
            Keywords : 
data mining; learning (artificial intelligence); pattern classification; statistical analysis; text analysis; document-specific keyphrase extraction; gap constraints; keyphrase benchmark datasets; keyphrases identification; mining process; semantic relationships; sequential dataset; sequential pattern mining; sequential patterns extraction; statistical pattern features; supervised learning classifier; wildcard constraints; Data mining; Databases; Educational institutions; Feature extraction; Microprogramming; Semantics; Time complexity; classification; keyphrase extraction; sequential pattern mining; wildcards;
         
        
        
        
            Conference_Title : 
Data Mining (ICDM), 2014 IEEE International Conference on
         
        
            Conference_Location : 
Shenzhen
         
        
        
            Print_ISBN : 
978-1-4799-4303-6
         
        
        
            DOI : 
10.1109/ICDM.2014.105