DocumentCode :
2054097
Title :
A Method for Clustering E-business Contents
Author :
Ronghui, Liu ; Jianguo, Zheng ; Xiang, Wang
Author_Institution :
Sch. of Manage., Donghua Univ., Shanghai, China
Volume :
2
fYear :
2010
fDate :
14-15 Aug. 2010
Firstpage :
43
Lastpage :
46
Abstract :
With the rapid development of the deep web, high-quality pre-processing and extraction of data from these web sources are essential. Clustering is a crucial step in this data processing. This paper presents a unified solution for clustering e-business web contents. First, the obtained web contents are segmented into words, and the segmentation results are analyzed statistically to tune the document frequency (DF) so that the dimensionality of the feature vectors representing the web contents stays under control. Next, term frequency (TF) and inverse document frequency (IDF) are used to form a weighted vector matrix, which is then used to cluster the obtained web contents. Experiments show that this approach can cluster e-business web contents with reasonable recall and precision.
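The pipeline the abstract outlines (DF-based vocabulary pruning followed by TF-IDF weighting) can be sketched as below. This is only an illustration under stated assumptions, not the authors' implementation: the abstract gives no formulas, so the common weighting tf * log(N / df) is assumed, word segmentation is assumed to have been done already, and the names `tfidf_matrix`, `cosine`, `min_df`, and `max_df` are hypothetical.

```python
import math
from collections import Counter


def tfidf_matrix(docs, min_df=1, max_df=None):
    """Build TF-IDF rows for pre-segmented documents (lists of tokens).

    min_df / max_df are hypothetical DF bounds: only terms whose document
    frequency lies inside them are kept, which limits the vector dimension.
    """
    n = len(docs)
    if max_df is None:
        max_df = n
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(t for t, c in df.items() if min_df <= c <= max_df)
    rows = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        # Assumed weighting: relative term frequency times log(N / df).
        rows.append([(counts[t] / total) * math.log(n / df[t]) for t in vocab])
    return vocab, rows


def cosine(u, v):
    """Cosine similarity between two weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy, made-up token lists standing in for segmented e-business pages.
docs = [
    ["phone", "price", "phone"],
    ["phone", "case"],
    ["book", "price"],
]
vocab, rows = tfidf_matrix(docs, min_df=2)
```

With `min_df=2` the terms that occur in only one document are dropped, so each page is represented by a two-dimensional vector. The resulting rows could then be passed to any clustering routine that works on vector similarity, for example one based on `cosine`.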
Keywords :
Internet; document handling; electronic commerce; matrix algebra; pattern clustering; statistical analysis; Web data sources; content clustering; deep Web; document frequency; e-business contents; inverse document frequency; statistical analysis; term frequency; weighted vector matrix; Clustering; Data extraction; Deep Web; TF-IDF; Words segmentation;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Engineering (ICIE), 2010 WASE International Conference on
Conference_Location :
Beidaihe, Hebei
Print_ISBN :
978-1-4244-7506-3
Electronic_ISBN :
978-1-4244-7507-0
Type :
conf
DOI :
10.1109/ICIE.2010.106
Filename :
5571224