• DocumentCode
    240706
  • Title
    Modelling on clustering algorithm based on iteration feature selection for micro-blog posts
  • Author
    Kai Gao ; Bao-quan Zhang
  • Author_Institution
    Sch. of Inf. Sci. & Eng., Hebei Univ. of Sci. & Technol., Shijiazhuang, China
  • fYear
    2014
  • fDate
    3-5 Dec. 2014
  • Firstpage
    295
  • Lastpage
    299
  • Abstract
    With the arrival of the big data era, data mining and intelligent processing are becoming increasingly important, and modelling novel intelligent processing methods is necessary. Because micro-blog posts are short texts with linguistically unreliable features and incomplete lexical information, similar posts need to be analyzed and clustered together for further data mining and recommendation. This paper builds on the classical k-means clustering algorithm and presents a novel modelling approach that partitions big data into k corresponding groups. Furthermore, a text feature selection model based on 2-phase iteration is proposed. Based on this model, a micro-blog post clustering algorithm is presented. The proposed algorithm uses the partitioning idea and avoids the influence of noisy data. Experiments show the feasibility of the proposed approach, and some open problems and directions for further work are presented at the end.
  • Keywords
    Big Data; Web sites; data mining; feature selection; recommender systems; social networking (online); intelligent processing; iteration feature selection; microblog post clustering algorithm; text feature selection model; Clustering algorithms; Data models; Feature extraction; Noise; Pragmatics; Vectors; Micro-blog; text cluster
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Modelling, Identification & Control (ICMIC), 2014 Proceedings of the 6th International Conference on
  • Conference_Location
    Melbourne, VIC
  • Type
    conf
  • DOI
    10.1109/ICMIC.2014.7020768
  • Filename
    7020768