DocumentCode
2054097
Title
A Method for Clustering E-business Contents
Author
Ronghui, Liu ; Jianguo, Zheng ; Xiang, Wang
Author_Institution
Sch. of Manage., Donghua Univ., Shanghai, China
Volume
2
fYear
2010
fDate
14-15 Aug. 2010
Firstpage
43
Lastpage
46
Abstract
With the rapid development of the deep web, high-quality pre-processing and extraction of data from these web sources are essential. Clustering is a crucial step in this data processing. This paper presents a unified solution to the problem of clustering e-business web contents. First, the obtained web contents are segmented into words, and the segmentation results are analyzed statistically to tune the document frequency (DF) so that the dimensionality of the feature vectors representing the web contents stays under control. Next, term frequency (TF) and inverse document frequency (IDF) are used to form a weighted vector matrix, which is then used to cluster the obtained web contents. Experiments show that this approach can cluster e-business web contents with reasonable recall and precision.
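The pipeline described in the abstract (segmentation, DF-based dimensionality control, TF-IDF weighting, clustering) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the whitespace tokenizer stands in for a real word segmenter, the `min_df` and `max_df_ratio` thresholds are hypothetical, and the single-pass cosine-threshold clustering is an assumed stand-in because the abstract does not name the clustering algorithm used.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace splitting stands in for a real word segmenter.
    return text.lower().split()


def build_vocabulary(token_lists, min_df=2, max_df_ratio=0.9):
    # Document frequency (DF): number of documents that contain each term.
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    n_docs = len(token_lists)
    # Keep only terms whose DF falls inside the band (hypothetical thresholds);
    # this bounds the dimensionality of the feature vectors.
    vocab = sorted(
        t for t, c in df.items() if c >= min_df and c / n_docs <= max_df_ratio
    )
    return vocab, df


def tfidf_matrix(token_lists, vocab, df):
    # One row per document, one column per vocabulary term, weight = TF * IDF.
    n_docs = len(token_lists)
    rows = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1
        rows.append(
            [(counts[t] / total) * math.log(n_docs / df[t]) for t in vocab]
        )
    return rows


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def single_pass_cluster(rows, threshold=0.3):
    # Assumed clustering step: assign each document to the first cluster whose
    # leader (its first member) is similar enough, else start a new cluster.
    leaders, labels = [], []
    for row in rows:
        for idx, leader in enumerate(leaders):
            if cosine(row, leader) >= threshold:
                labels.append(idx)
                break
        else:
            leaders.append(row)
            labels.append(len(leaders) - 1)
    return labels


# Toy documents, invented for illustration only.
docs = [
    "cheap laptop sale laptop discount",
    "laptop discount sale today",
    "fresh fruit delivery fruit basket",
    "fruit basket delivery service",
]
tokens = [tokenize(d) for d in docs]
vocab, df = build_vocabulary(tokens)
matrix = tfidf_matrix(tokens, vocab, df)
labels = single_pass_cluster(matrix)
print(vocab)
print(labels)
```

Terms that occur in only one document (or in nearly all of them) are dropped before weighting, which is one plausible reading of how tuning DF keeps the vector dimensionality under control.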
Keywords
Internet; document handling; electronic commerce; matrix algebra; pattern clustering; statistical analysis; Web data sources; content clustering; deep Web; document frequency; e-business contents; inverse document frequency; term frequency; weighted vector matrix; Clustering; Data extraction; Deep Web; TF-IDF; Word segmentation
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Engineering (ICIE), 2010 WASE International Conference on
Conference_Location
Beidaihe, Hebei
Print_ISBN
978-1-4244-7506-3
Electronic_ISBN
978-1-4244-7507-0
Type
conf
DOI
10.1109/ICIE.2010.106
Filename
5571224