Title : 
Automatic textual document categorization based on generalized instance sets and a metamodel
         
Author :
Lam, Wai ; Han, Yiqiu
         
Author_Institution :
Department of Systems Engineering and Engineering Management, Chinese University of Hong Kong, Shatin, China
         
fDate :
5/1/2003
         
Abstract :
We propose a new approach to text categorization, the generalized instance set (GIS) algorithm, developed under the framework of generalized instance patterns. Our GIS algorithm unifies the strengths of k-NN and linear classifiers and adapts to the characteristics of text categorization problems. It focuses on refining the original instances and constructs a set of generalized instances. We also propose a metamodel framework based on category feature characteristics. It has a metalearning phase that discovers a relationship between category feature characteristics and each component algorithm. Extensive experiments have been conducted on two large-scale document corpora for both GIS and the metamodel. The results demonstrate that both approaches generally achieve promising text categorization performance.
         
Keywords :
classification; document handling; learning by example; automatic textual document categorization; category feature characteristics; experiments; generalized instance patterns; generalized instance sets; instance-based learning; k-NN; k-nearest-neighbor; linear classifiers; metalearning; metamodel; text classification; Filtering; Geographic Information Systems; Humans; Large-scale systems; Machine learning; Management training; Pattern recognition; Routing; Systems engineering and theory; Text categorization;
         
Journal_Title :
IEEE Transactions on Pattern Analysis and Machine Intelligence
         
DOI :
10.1109/TPAMI.2003.1195997