• DocumentCode
    177908
  • Title
    Average Overlap for Clustering Incomplete Data Using Symmetric Non-negative Matrix Factorization
  • Author
    Chaudhari, S. ; Murty, M.N.
  • Author_Institution
    Language Technol. Inst., Carnegie Mellon Univ., Pittsburgh, PA, USA
  • fYear
    2014
  • fDate
    24-28 Aug. 2014
  • Firstpage
    1431
  • Lastpage
    1436
  • Abstract
    Clustering techniques that can handle incomplete data have become increasingly important due to varied applications in marketing research, medical diagnosis and survey data analysis. Existing techniques cope with missing values either by modifying or imputing the data or by computing partial distances, approaches whose reliability often depends on the number of features available. In this paper, we propose a novel approach for clustering data with missing values, which performs the task by Symmetric Non-Negative Matrix Factorization (SNMF) of a complete pair-wise similarity matrix computed from the given incomplete data. To accomplish this, we define a novel similarity measure based on the Average Overlap similarity metric, which can effectively handle missing values without modifying the data. Further, the similarity measure is more reliable than partial distances and inherently possesses the properties required to perform SNMF. Experimental evaluation on real-world datasets demonstrates that the proposed approach is efficient and scalable, and performs significantly better than existing techniques.
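The abstract describes the pipeline only at a high level. It builds a complete pairwise similarity matrix directly from incomplete records, with no imputation, and then factors that matrix with SNMF. The sketch below illustrates this pipeline under stated assumptions. The similarity function `overlap_similarity` is a hypothetical stand-in: it returns the fraction of co-observed categorical features that agree. It is not the paper's Average Overlap measure, whose formula is not given in this record. The factorization step is also a swapped-in choice: a standard multiplicative update for symmetric NMF with `beta = 0.5`, not necessarily the authors' solver. All function names are illustrative.

```python
import random
from typing import List, Optional, Sequence

Matrix = List[List[float]]
Record = Sequence[Optional[str]]  # None marks a missing categorical value


def overlap_similarity(x: Record, y: Record) -> float:
    """Hypothetical stand-in (NOT the paper's Average Overlap measure):
    fraction of features observed in both records whose values agree."""
    shared = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not shared:
        return 0.0  # no co-observed features: treat the pair as dissimilar
    return sum(a == b for a, b in shared) / len(shared)


def similarity_matrix(data: Sequence[Record]) -> Matrix:
    # Symmetric, non-negative n x n matrix built without imputing any value.
    return [[overlap_similarity(x, y) for y in data] for x in data]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def snmf(A: Matrix, k: int, iters: int = 300, beta: float = 0.5,
         seed: int = 0, eps: float = 1e-12) -> Matrix:
    """Approximate A ~ H H^T with H >= 0 (n x k) via the multiplicative update
    H <- H * (1 - beta + beta * (A H) / (H H^T H))."""
    rng = random.Random(seed)
    H = [[rng.random() + eps for _ in range(k)] for _ in A]
    for _ in range(iters):
        AH = matmul(A, H)
        # H (H^T H) keeps the intermediate k x k instead of n x n.
        HHtH = matmul(H, matmul(transpose(H), H))
        H = [[h * (1 - beta + beta * ah / (d + eps))
              for h, ah, d in zip(h_row, ah_row, d_row)]
             for h_row, ah_row, d_row in zip(H, AH, HHtH)]
    return H


def cluster_incomplete(data: Sequence[Record], k: int, **kwargs) -> List[int]:
    # Hard assignment: each record goes to the column of H with the largest weight.
    H = snmf(similarity_matrix(data), k, **kwargs)
    return [max(range(k), key=row.__getitem__) for row in H]


if __name__ == "__main__":
    records = [
        ["a", "a", "a", None],
        ["a", "a", None, "a"],
        [None, "a", "a", "a"],
        ["b", "b", "b", None],
        ["b", None, "b", "b"],
        ["b", "b", None, "b"],
    ]
    print(cluster_incomplete(records, k=2))
```

On this toy data, every within-group pair agrees on all co-observed features and every cross-group pair disagrees on all of them. The similarity matrix is therefore block-diagonal, and the two groups of three records should receive different labels. Which group is labelled 0 depends on the random initialization.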
  • Keywords
    matrix decomposition; pattern clustering; SNMF; average overlap similarity metric; clustering techniques; complete pair-wise similarity matrix; partial distance computation; symmetric nonnegative matrix factorization; Accuracy; Clustering algorithms; Matrix converters; Matrix decomposition; Measurement; Reliability; Symmetric matrices;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition (ICPR), 2014 22nd International Conference on
  • Conference_Location
    Stockholm
  • ISSN
    1051-4651
  • Type
    conf
  • DOI
    10.1109/ICPR.2014.255
  • Filename
    6976965