Title : 
Extracting Author Meta-Data from Web Using Visual Features
         
        
            Author : 
Zheng, Shuyi ; Zhou, Ding ; Li, Jia ; Giles, C. Lee
         
        
            Author_Institution : 
Pennsylvania State Univ., State College
         
        
        
        
        
        
            Abstract : 
Enriching a digital library's author meta-data can lead to valuable services and applications. This paper addresses the problem of extracting authors' information from their homepages. This is essentially a multiclass classification problem: a homepage can be treated as a group of information pieces that need to be classified into different fields, e.g., Name, Title, Affiliation, and Email. In this problem, not only can each information piece be viewed as a point in a feature space, but certain patterns can also be observed among different fields on a page. To improve the extraction accuracy, this paper argues that the visual features of information pieces on a homepage should be fully utilized. In addition, this paper proposes an inter-fields probability model to capture the relations among different fields. This model can be combined with feature-space based classification. Experimental results demonstrate that utilizing visual features and applying the inter-fields probability model can significantly improve the extraction accuracy.
         
        
            Keywords : 
Internet; Web sites; classification; digital libraries; feature extraction; knowledge acquisition; meta data; probability; World Wide Web; author information extraction; author meta-data extraction; digital library; feature-space based classification; homepages; interfields probability model; multiclass classification problem; visual features; Application software; Computer science; Conferences; Data engineering; Data mining; Kernel; Learning systems; Search engines; Software libraries; Statistics;
         
        
        
        
            Conference_Title : 
Data Mining Workshops, 2007. ICDM Workshops 2007. Seventh IEEE International Conference on
         
        
            Conference_Location : 
Omaha, NE
         
        
            Print_ISBN : 
978-0-7695-3019-2
         
        
            Electronic_ISBN : 
978-0-7695-3033-8
         
        
        
            DOI : 
10.1109/ICDMW.2007.59