• DocumentCode
    2054097
  • Title

    A Method for Clustering E-business Contents

  • Author

    Ronghui, Liu ; Jianguo, Zheng ; Xiang, Wang

  • Author_Institution
    Sch. of Manage., Donghua Univ., Shanghai, China
  • Volume
    2
  • fYear
    2010
  • fDate
    14-15 Aug. 2010
  • Firstpage
    43
  • Lastpage
    46
  • Abstract
    With the rapid development of the deep web, high-quality pre-processing and extraction of data from these web data sources are essential. Clustering is a crucial step in this data processing. This paper presents a unified solution for clustering e-business web contents. First, the obtained web contents are segmented into words, and the segmentation results are statistically analyzed to tune the document frequency (DF) so that the dimensionality of the feature vectors representing the web contents is kept under control. Next, term frequency (TF) and inverse document frequency (IDF) are used to form a weighted vector matrix, which is used to cluster the obtained web contents. Experiments show that this approach can cluster e-business web contents with reasonable recall and precision.
  • Keywords
    Internet; document handling; electronic commerce; matrix algebra; pattern clustering; statistical analysis; Web data sources; content clustering; deep Web; document frequency; e-business contents; inverse document frequency; term frequency; weighted vector matrix; Clustering; Data extraction; Deep Web; TF-IDF; Words segmentation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Engineering (ICIE), 2010 WASE International Conference on
  • Conference_Location
    Beidaihe, Hebei
  • Print_ISBN
    978-1-4244-7506-3
  • Electronic_ISBN
    978-1-4244-7507-0
  • Type

    conf

  • DOI
    10.1109/ICIE.2010.106
  • Filename
    5571224
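
The pipeline outlined in the abstract (word segmentation, DF-based vocabulary tuning, TF-IDF weighting) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the whitespace tokenizer, the `min_df` and `max_df_ratio` thresholds, the function names, and the sample documents are all assumptions made for the example. The abstract does not name the clustering algorithm, so the sketch stops at the weighted matrix and a cosine similarity that a clustering step could consume.

```python
import math
from collections import Counter


def segment(text):
    # Assumption: whitespace splitting stands in for the paper's word segmentation.
    return text.lower().split()


def select_vocabulary(tokenized_docs, min_df=2, max_df_ratio=0.9):
    # Count, for each term, the number of documents that contain it (DF).
    n_docs = len(tokenized_docs)
    df = Counter()
    for tokens in tokenized_docs:
        df.update(set(tokens))
    # Keep terms inside the DF window to limit feature-vector dimensionality.
    # The thresholds are hypothetical tuning knobs, not values from the paper.
    vocab = sorted(
        t for t, c in df.items()
        if c >= min_df and c / n_docs <= max_df_ratio
    )
    return vocab, df


def tfidf_matrix(tokenized_docs, vocab, df):
    # One row per document, one column per vocabulary term.
    n_docs = len(tokenized_docs)
    rows = []
    for tokens in tokenized_docs:
        counts = Counter(tokens)
        total = len(tokens) or 1
        rows.append(
            [(counts[t] / total) * math.log(n_docs / df[t]) for t in vocab]
        )
    return rows


def cosine(u, v):
    # Cosine similarity between two weighted vectors; 0.0 if either is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical e-business snippets, invented for illustration only.
docs = [
    "cheap laptop sale laptop discount",
    "laptop discount free shipping",
    "hotel booking cheap flight",
    "flight booking hotel deal",
]
tokenized = [segment(d) for d in docs]
vocab, df = select_vocabulary(tokenized, min_df=2, max_df_ratio=0.75)
matrix = tfidf_matrix(tokenized, vocab, df)
```

With these sample documents, terms that occur in only one document are dropped, and the two laptop snippets end up closer to each other under `cosine` than either is to the travel snippets. Any standard clustering routine could then be run on the rows of `matrix`.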