• DocumentCode
    685803
  • Title
    An improved K-means algorithm combined with Particle Swarm Optimization approach for efficient web document clustering
  • Author
    Jaganathan, P.; Jaiganesh, S.
  • Author_Institution
    Dept. of Comput. Applic., PSNA Coll. of Eng. & Technol., Dindigul, India
  • fYear
    2013
  • fDate
    12-14 Dec. 2013
  • Firstpage
    772
  • Lastpage
    776
  • Abstract
    Searching for and discovering relevant information on the web has always been a challenging task. It is very hard to wade through the large number of documents returned in response to a user query. This creates a need to organize large sets of documents into categories through clustering, and therefore a need for efficient document clustering algorithms. Clustering of large datasets can be done effectively with partitional clustering algorithms. Among partitional approaches, the K-means algorithm is well suited to large datasets because of its efficiency in execution time. However, it is highly sensitive to the initial positions of the cluster centers. This paper introduces a new hybrid method that combines Particle Swarm Optimization (PSO) with an improved K-means algorithm for document clustering. We tested the K-means, PSO, our proposed PSOK, KPSO, and KPSOK algorithms on several text document collections, which range from 204 to 878 documents and from 5804 to 7454 terms. Our results give clear evidence that the proposed method achieves better clustering than the other methods studied.
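    The abstract does not specify the "improved" K-means or the exact order of steps in PSOK, KPSO, and KPSOK. As a rough illustration only, the sketch below shows one common way to pair PSO with K-means: a global-best PSO, whose particles each encode k candidate centroids, searches for low-error centroid positions, and plain K-means then refines the best particle's centroids, which addresses the initial-center sensitivity noted above. The fitness (sum of squared Euclidean distances), the PSO coefficients, the empty-cluster re-seeding rule, and the function names (pso_centroids, kmeans, pso_kmeans) are assumptions of this sketch, not details from the paper. Real document clustering would run on Vector Space Model (e.g. TF-IDF) vectors, possibly with cosine distance instead of Euclidean distance.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two document (term-weight) vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(docs, centroids):
    """Index of the nearest centroid for each document (ties go to the lower index)."""
    return [min(range(len(centroids)), key=lambda j: dist(d, centroids[j])) for d in docs]


def sse(docs, centroids):
    """Sum of squared distances to the nearest centroid; the fitness PSO minimises here (assumption)."""
    labels = assign(docs, centroids)
    return sum(dist(d, centroids[j]) ** 2 for d, j in zip(docs, labels))


def pso_centroids(docs, k, n_particles=10, iters=50, w=0.729, c1=1.494, c2=1.494, seed=0):
    """Hypothetical global-best PSO in which each particle encodes k candidate centroids."""
    rng = random.Random(seed)
    dim = len(docs[0])
    # Start every particle from k randomly chosen documents.
    pos = [[list(rng.choice(docs)) for _ in range(k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in p] for p in pos]
    pbest_fit = [sse(docs, p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]
    for _ in range(iters):
        for i, p in enumerate(pos):
            for j in range(k):
                for t in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Inertia + pull toward personal best + pull toward swarm best.
                    vel[i][j][t] = (w * vel[i][j][t]
                                    + c1 * r1 * (pbest[i][j][t] - p[j][t])
                                    + c2 * r2 * (gbest[j][t] - p[j][t]))
                    p[j][t] += vel[i][j][t]
            fit = sse(docs, p)
            if fit < pbest_fit[i]:  # new personal best
                pbest[i], pbest_fit[i] = [c[:] for c in p], fit
                if fit < gbest_fit:  # new swarm-wide best
                    gbest, gbest_fit = [c[:] for c in p], fit
    return gbest


def kmeans(docs, centroids, max_iter=100):
    """Plain Lloyd-style K-means refinement starting from the given centroids."""
    centroids = [c[:] for c in centroids]
    for _ in range(max_iter):
        labels = assign(docs, centroids)
        new = []
        for j in range(len(centroids)):
            members = [d for d, lab in zip(docs, labels) if lab == j]
            if members:
                # Move the centroid to the mean of its member documents.
                new.append([sum(col) / len(members) for col in zip(*members)])
            else:
                # Empty cluster: re-seed it at the document farthest from its centroid.
                far = max(range(len(docs)), key=lambda i: dist(docs[i], centroids[labels[i]]))
                new.append(list(docs[far]))
        if new == centroids:  # converged: centroids no longer move
            break
        centroids = new
    return assign(docs, centroids), centroids


def pso_kmeans(docs, k, seed=0):
    """Hypothetical hybrid: PSO chooses the initial centroids, K-means refines them."""
    return kmeans(docs, pso_centroids(docs, k, seed=seed))
```

    For example, with six toy two-term vectors, `docs = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0], [0.1, 0.9], [0.05, 0.95]]`, calling `pso_kmeans(docs, 2)` places the first three documents in one cluster and the last three in the other. Because K-means never increases the squared error, the refinement step can only improve on the PSO seeds. PSO, for its part, explores several candidate initializations instead of relying on a single random one.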
  • Keywords
    Internet; document handling; particle swarm optimisation; pattern clustering; KPSOK algorithms; PSO; Web document clustering; improved K-means algorithm; particle swarm optimization approach; partitional clustering approach; text document collections; Algorithm design and analysis; Clustering algorithms; Equations; Mathematical model; Particle swarm optimization; Partitioning algorithms; Vectors; Cluster Centroid; Euclidean distance; PSO; Vector Space Model; cosine correlation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Green Computing, Communication and Conservation of Energy (ICGCE), 2013 International Conference on
  • Conference_Location
    Chennai
  • Type
    conf
  • DOI
    10.1109/ICGCE.2013.6823538
  • Filename
    6823538