DocumentCode :
2548397
Title :
An Algorithm for Classifying Articles and Patent Documents Using Link Structure
Author :
Indukuri, Kishore Varma ; Mirajkar, Pranav ; Sureka, Ashish
Author_Institution :
SET Labs., Infosys Technol. Ltd., Bangalore
fYear :
2008
fDate :
20-22 July 2008
Firstpage :
203
Lastpage :
210
Abstract :
The link structure of the World Wide Web (WWW) has attracted considerable research interest, and several papers have been published on the structural analysis of hyperlinked environments such as the WWW. The WWW can be modeled as a graph, and valuable information can be derived by analyzing the links between Web pages, primarily for the purpose of building better search engines. Many methods have been presented for discovering communities and authoritative Web pages in the WWW. Citation analysis is a well-studied branch of information science. It concerns the analysis of citations among articles and research papers in a scholarly field and the derivation of useful information from them, and it has been used primarily to quantify and judge the impact of a paper or a journal. The work presented in this paper lies at the intersection of these two fields: structural analysis of the WWW and citation analysis. We present a method for assigning documents (such as articles and patents containing references) to a class or topic based on their link structure, references, and citations. The method first analyzes the link structure of a corpus to identify authoritative papers, which a domain expert then labels manually after reading the respective documents. The next step identifies papers related to the authoritative papers using citation analysis. The authoritative papers, their class labels, and their related papers constitute a model. Papers whose class label needs to be determined are then classified using this model.
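The pipeline outlined in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption, since the abstract does not specify the details: citation count (in-degree) stands in for the authority measure, relatedness is a plain sum of a direct-citation flag, bibliographic coupling, and co-citation (each as Jaccard overlap), and the function names, paper identifiers, and class labels are hypothetical. It is not the authors' implementation.

```python
from collections import defaultdict


def in_degree(citations):
    """Count how often each paper is cited (assumed proxy for authority)."""
    counts = defaultdict(int)
    for paper, refs in citations.items():
        counts[paper] += 0  # make sure uncited papers appear with count 0
        for ref in refs:
            counts[ref] += 1
    return dict(counts)


def top_authorities(citations, k):
    """Return the k most-cited papers, ties broken alphabetically."""
    counts = in_degree(citations)
    return sorted(counts, key=lambda p: (-counts[p], p))[:k]


def citing_sets(citations):
    """Invert the graph: map each paper to the set of papers citing it."""
    citers = defaultdict(set)
    for paper, refs in citations.items():
        for ref in refs:
            citers[ref].add(paper)
    return citers


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relatedness(paper, authority, citations, citers):
    """Assumed score: direct citation + bibliographic coupling + co-citation."""
    refs_p = citations.get(paper, set())
    refs_a = citations.get(authority, set())
    direct = 1.0 if authority in refs_p or paper in refs_a else 0.0
    coupling = jaccard(refs_p, refs_a)  # shared references
    cocitation = jaccard(citers.get(paper, set()), citers.get(authority, set()))
    return direct + coupling + cocitation


def classify(paper, labels, citations):
    """Assign the label whose authoritative papers are most related, or None."""
    citers = citing_sets(citations)
    scores = defaultdict(float)
    for authority, label in labels.items():
        scores[label] += relatedness(paper, authority, citations, citers)
    if not scores or max(scores.values()) == 0:
        return None  # no citation evidence links the paper to any class
    return max(sorted(scores), key=scores.get)


# Hypothetical toy corpus: paper -> set of papers it cites.
citations = {
    "A1": {"r1"},
    "A2": {"r2"},
    "p1": {"A1"},
    "p2": {"A1"},
    "p3": {"A1", "A2"},
    "q1": {"A2"},
    "q2": {"A2"},
    "new": {"r1"},
}

authorities = top_authorities(citations, 2)
# Labels a domain expert would assign by hand (hypothetical values).
labels = {"A1": "databases", "A2": "networks"}

print(authorities)
print(classify("new", labels, citations))
```

On this toy graph, `new` shares its only reference with `A1`, so it receives the label assigned to `A1`; a paper with no citation overlap with any authoritative paper is left unclassified.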
Keywords :
Internet; citation analysis; classification; document handling; graph theory; information science; search engines; World Wide Web; articles classification; authoritative Web pages; graph; hyperlinked environments; link structure; patent document classification; structural analysis; Association rules; Cities and towns; Data mining; Government; Information analysis; Information management; Portfolios; Predictive models; Bibliography Coupling; Citation graph; Co-citation; Document Similarity; Link Topology; Text Mining; Web Community
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web-Age Information Management, 2008. WAIM '08. The Ninth International Conference on
Conference_Location :
Zhangjiajie, Hunan
Print_ISBN :
978-0-7695-3185-4
Electronic_ISBN :
978-0-7695-3185-4
Type :
conf
DOI :
10.1109/WAIM.2008.31
Filename :
4597015