  • DocumentCode
    2532311
  • Title
    Assigning Web News to Clusters
  • Author
    Bouras, Christos; Tsogkas, Vassilis
  • Author_Institution
    Comput. Eng. & Inf. Dept., Univ. of Patras, Patras, Greece
  • fYear
    2010
  • fDate
    9-15 May 2010
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    The Web is overcrowded with news articles, an information source that is overwhelming in both its volume and its diversity. Grouping similar news articles into clusters, however, is a very powerful data mining and manipulation technique for discovering topics in text documents. In this paper, we investigate the application of a broad spectrum of clustering algorithms and similarity measures to news articles originating from the Web, and compare their efficiency for use in an online Web news service application. We also examine the effect of preprocessing on clustering. Our experiments showed that k-means, despite its simplicity, yields better aggregate results in terms of efficiency when it is preceded by preliminary data-cleaning and normalization steps.
  • Keywords
    Web services; data mining; information resources; pattern clustering; clustering algorithms; data cleaning; data mining; news articles; online Web news service application; topic discovery; Aggregates; Application software; Cleaning; Clustering algorithms; Data mining; Informatics; Information retrieval; Partitioning algorithms; Power engineering computing; Web and internet services; Document Clustering; Hierarchical Clustering; Web News Articles; k-means; k-means++;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Internet and Web Applications and Services (ICIW), 2010 Fifth International Conference on
  • Conference_Location
    Barcelona
  • Print_ISBN
    978-1-4244-6728-0
  • Type
    conf
  • DOI
    10.1109/ICIW.2010.8
  • Filename
    5476825