DocumentCode :
2338506
Title :
Extracting community structure features for hypertext classification
Author :
Zhang, Dell ; Mao, Robert
Author_Institution :
SCSIS, Univ. of London, London
fYear :
2008
fDate :
13-16 Nov. 2008
Firstpage :
436
Lastpage :
441
Abstract :
Standard text classification techniques assume that all documents are independent and identically distributed (i.i.d.). However, hypertext documents such as Web pages are interconnected by links. How to exploit such links as extra evidence to improve the automatic classification of hypertext documents is a non-trivial problem. We believe that a collection of interconnected hypertext documents can be viewed as a complex network, and that the underlying community structure of this document network contains valuable clues about the correct classification of the documents. This paper introduces a new technique, modularity Eigenmap, that can effectively extract community structure features from a document network induced either from document link information alone or from a combination of document content and link information. Experiments on a number of real-world benchmark datasets show that the proposed approach achieves excellent classification performance compared with state-of-the-art methods.
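The abstract does not spell out the algorithm. As an assumption, the sketch below reads "modularity Eigenmap" as a spectral embedding built from the leading eigenvectors of Newman's modularity matrix B = A - k k^T / (2m), where A is the symmetric adjacency matrix of the document network, k is the degree vector and m is the number of edges. The function name modularity_eigenmap and the parameter d are hypothetical, and the paper's exact normalization or eigenvector selection may differ. Each row of the result is a vector of community structure features for one document.

```python
import numpy as np


def modularity_eigenmap(A, d):
    """Hypothetical sketch, not the authors' code: embed each node using the
    eigenvectors of the d largest eigenvalues of the modularity matrix
    B = A - k k^T / (2m). Assumes A is symmetric, non-negative, with >= 1 edge."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)                      # node degrees
    two_m = k.sum()                        # 2m = sum of all degrees
    B = A - np.outer(k, k) / two_m         # modularity matrix (symmetric)
    eigvals, eigvecs = np.linalg.eigh(B)   # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:d]    # indices of the d largest eigenvalues
    return eigvecs[:, top]                 # one feature row per document


# Toy document network: two triangles {0,1,2} and {3,4,5} joined by link 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

X = modularity_eigenmap(A, d=2)
print(np.sign(X[:, 0]))  # first feature splits the two communities by sign
```

Because an eigenvector is only defined up to sign, the printed pattern may come out as all positive for one triangle and all negative for the other, or the reverse. The feature rows would then be passed to a standard classifier; that step is not shown here.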
Keywords :
document handling; hypermedia; pattern classification; text analysis; Web pages; community structure feature extraction; complex network; document link information; hypertext classification; hypertext documents; modularity Eigenmap; Complex networks; Data mining; Feature extraction; Information analysis; Information retrieval; Large-scale systems; Semisupervised learning; Text categorization; Web pages; Web search;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Digital Information Management, 2008. ICDIM 2008. Third International Conference on
Conference_Location :
London
Print_ISBN :
978-1-4244-2916-5
Electronic_ISBN :
978-1-4244-2917-2
Type :
conf
DOI :
10.1109/ICDIM.2008.4746816
Filename :
4746816