DocumentCode :
2668269
Title :
Real-time unsupervised classification of web documents
Author :
Sigogne, Anthony ; Constant, Matthieu
Author_Institution :
Lab. d'Inf. Gaspard-Monge, Univ. Paris-Est, Marne-la-Vallée, France
fYear :
2009
fDate :
12-14 Oct. 2009
Firstpage :
281
Lastpage :
286
Abstract :
This paper addresses the problem of clustering dynamic collections of web documents. We present an iterative algorithm based on fine-grained keyword extraction (simple words, compound words, and proper nouns). Each new document inserted into the collection is assigned either to an existing class containing documents on the same topic or to a new class. After each step, classes are refined using statistical techniques when necessary. An implementation of this algorithm was successfully integrated into an application used for Information Intelligence.
Keywords :
Internet; algorithm theory; document handling; pattern classification; real-time systems; Web documents; algorithm implementation; class containing documents; dynamic collections web documents; fine grained keyword extraction; information intelligence; iterative algorithm based; real time classification; statistical techniques; Classification algorithms; Clustering algorithms; Computer science; Data mining; Frequency; Information technology; Iterative algorithms; Large-scale systems; Support vector machines
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Science and Information Technology, 2009. IMCSIT '09. International Multiconference on
Conference_Location :
Mragowo
Print_ISBN :
978-1-4244-5314-6
Type :
conf
DOI :
10.1109/IMCSIT.2009.5352714
Filename :
5352714