DocumentCode :
3469018
Title :
Creating categories for Wikipedia articles using Self-Organizing Maps
Author :
Szymanski, Janusz
Author_Institution :
Dept. of Electron., Telecommun. & Inf., Gdansk Univ. of Technol., Gdańsk, Poland
fYear :
2011
fDate :
3-5 March 2011
Firstpage :
1
Lastpage :
5
Abstract :
This article presents the results of experiments performed on a selected subset of Wikipedia, which we categorized automatically. We analyze two methods of text representation: one based on references and one based on word content. Combining them, we introduce a joint representation that is used to build groups of similar articles with Kohonen Self-Organizing Maps. To make data processing efficient, we reduced the dimensionality of the raw data using Principal Component Analysis performed on the similarity matrix. Changing the granularity of the SOM network makes it possible to build hierarchical categories and find significant relations between articles in a document repository.
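The pipeline named in the abstract (joint representation, similarity matrix, PCA, SOM at two granularities) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy data, the choice of concatenating L2-normalised vectors as the joint representation, cosine similarity, and all function names and SOM hyperparameters are hypothetical.

```python
import numpy as np

def _unit_rows(m):
    """Scale each row to unit L2 norm (all-zero rows are left as zeros)."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    return m / np.where(norms == 0, 1.0, norms)

def joint_representation(links, words):
    """Assumed joint form: concatenate normalised link and word vectors."""
    return np.hstack([_unit_rows(links), _unit_rows(words)])

def cosine_similarity_matrix(x):
    """Article-by-article cosine similarity."""
    unit = _unit_rows(x)
    return unit @ unit.T

def pca_reduce(m, n_components):
    """Project the rows of m onto the top principal axes (via SVD)."""
    centred = m - m.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T

def best_matching_unit(weights, x):
    """Grid coordinates of the SOM node whose weight vector is closest to x."""
    d = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(d), d.shape)

def train_som(data, rows, cols, epochs=200, lr0=0.5, seed=0):
    """Plain online SOM with a Gaussian neighbourhood and linear decay."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(scale=0.1, size=(rows, cols, data.shape[1]))
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        sigma = max(sigma0 * (1.0 - frac), 0.5)
        for x in data[rng.permutation(len(data))]:
            bmu = np.array(best_matching_unit(weights, x))
            dist2 = ((grid - bmu) ** 2).sum(axis=-1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))
            weights += lr * h[..., None] * (x - weights)
    return weights

def assign_categories(weights, data):
    """Map every article to the grid cell of its best matching unit."""
    return [tuple(int(i) for i in best_matching_unit(weights, x)) for x in data]

# Toy data (invented): 6 articles, 4 link targets, 5 vocabulary terms.
links = np.array([[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0],
                  [0, 0, 1, 1], [0, 0, 1, 0], [0, 0, 1, 1]], dtype=float)
words = np.array([[2, 1, 0, 0, 0], [1, 1, 0, 0, 0], [3, 0, 1, 0, 0],
                  [0, 0, 0, 2, 1], [0, 0, 1, 1, 1], [0, 0, 0, 1, 2]], dtype=float)

sim = cosine_similarity_matrix(joint_representation(links, words))
reduced = pca_reduce(sim, n_components=2)

# Two map sizes stand in for "changing the granularity" of the SOM.
coarse = assign_categories(train_som(reduced, 1, 2), reduced)
fine = assign_categories(train_som(reduced, 2, 2), reduced)
```

In this sketch each grid cell plays the role of a category. Comparing `coarse` with `fine` shows how cells of the smaller map may split into several cells of the larger one, which is one possible reading of the hierarchical categories mentioned in the abstract.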
Keywords :
Web sites; content management; principal component analysis; self-organising feature maps; text analysis; Kohonen self organizing maps; Wikipedia articles; category creation; dimensionality reduction; documents repository; text representation; Color; Computers; Electronic publishing; Ethics; Information services; Internet; Self Organizing Maps; documents clustering; text processing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Communications, Computing and Control Applications (CCCA), 2011 International Conference on
Conference_Location :
Hammamet
Print_ISBN :
978-1-4244-9795-9
Type :
conf
DOI :
10.1109/CCCA.2011.6031483
Filename :
6031483