Title : 
Automatic Keyword Extraction Using Linguistic Features
         
        
            Author : 
Hu, Xinghua ; Wu, Bin
         
        
            Author_Institution : 
Baskin Sch. of Eng., California Univ., Santa Cruz, CA
         
        
        
        
        
        
            Abstract : 
This paper describes a novel keyword extraction algorithm, position weight (PW), that utilizes linguistic features to represent the importance of a word's position in a document. Topical terms and their previous-term and next-term co-occurrence collections are extracted. To measure the degree of correlation between a topical term and its co-occurrence terms, three methods are employed: term frequency inverse term frequency (TFITF), position weight inverse position weight (PWIPW), and chi-square (χ²). The co-occurrence terms that have the highest degree of correlation and exceed a co-occurrence frequency threshold are combined with the original topical term to form a final keyword. Because the algorithm has linear computational complexity, the vector space of documents in a large corpus or the boundless Web can be quickly represented by sets of keywords, which makes it possible to retrieve large-scale information quickly and effectively.
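
The co-occurrence step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the PW, TFITF, or PWIPW formulas, so only the chi-square variant is shown, using the standard Pearson statistic on a 2x2 table of adjacent-term counts. The function names, the `min_cooccurrence` parameter (standing in for the co-occurrence frequency threshold), and the choice to treat the topical term as already known are all assumptions made for illustration.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def extend_keyword(tokens, topical_term, min_cooccurrence=2):
    """Join topical_term with its best-correlated previous or next term.

    Hypothetical helper: returns topical_term unchanged when no neighbor
    reaches the (assumed) co-occurrence frequency threshold.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    first = Counter(tokens[:-1])   # how often each term starts a bigram
    second = Counter(tokens[1:])   # how often each term ends a bigram
    n = len(tokens) - 1            # total number of bigrams

    best_score, best_phrase = 0.0, topical_term
    for (w1, w2), a in bigrams.items():
        if topical_term not in (w1, w2) or a < min_cooccurrence:
            continue
        b = first[w1] - a          # w1 followed by some other term
        c = second[w2] - a         # some other term followed by w2
        d = n - a - b - c          # bigrams involving neither position
        if a * d <= b * c:         # keep only positive association
            continue
        score = chi_square(a, b, c, d)
        if score > best_score:
            best_score, best_phrase = score, f"{w1} {w2}"
    return best_phrase


sample = ("keyword extraction is fast keyword extraction is simple "
          "data mining is keyword extraction").split()
print(extend_keyword(sample, "extraction"))  # keyword extraction
```

Each pass over the token list is linear in the document length, which is in line with the linear complexity the abstract claims, though the real algorithm may organize the computation differently.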
         
        
            Keywords : 
computational complexity; data mining; document handling; information retrieval; CHI-square; automatic keyword extraction; boundless Web; cooccurrence collections; cooccurrence frequency threshold; cooccurrence terms; large corpus; linear computational complexity; linguistic features; position weight inverse position weight; term frequency inverse term frequency; topical terms; vector space; word position; Content based retrieval; Feature extraction; Frequency measurement; Large-scale systems; Position measurement; Vectors; World Wide Web
         
        
        
        
            Conference_Title : 
Data Mining Workshops, 2006. ICDM Workshops 2006. Sixth IEEE International Conference on
         
        
            Conference_Location : 
Hong Kong
         
        
            Print_ISBN : 
0-7695-2702-7
         
        
        
            DOI : 
10.1109/ICDMW.2006.36