• DocumentCode
    1761863
  • Title
    An Unsupervised Feature Selection Framework for Social Media Data
  • Author
    Jiliang Tang ; Huan Liu
  • Author_Institution
    Dept. of Comput. Sci., Arizona State Univ., Tempe, AZ, USA
  • Volume
    26
  • Issue
    12
  • fYear
    2014
  • fDate
    Dec. 1 2014
  • Firstpage
    2914
  • Lastpage
    2927
  • Abstract
    The explosive usage of social media produces massive amounts of unlabeled, high-dimensional data. Feature selection has proven effective in dealing with high-dimensional data for efficient learning and data mining. Unsupervised feature selection remains a challenging task because of the absence of label information, on which feature relevance is usually assessed. The unique characteristics of social media data further complicate this already challenging problem: for example, social media data is inherently linked, which invalidates the independent and identically distributed (i.i.d.) assumption and poses new challenges for unsupervised feature selection algorithms. In this paper, we investigate a novel problem of feature selection for social media data in an unsupervised scenario. In particular, we analyze the differences between social media data and traditional attribute-value data, investigate how the relations extracted from linked data can be exploited to help select relevant features, and propose a novel unsupervised feature selection framework, LUFS, for linked social media data. We design and conduct systematic experiments to evaluate the proposed framework on data sets from real-world social media Web sites. The empirical study demonstrates the effectiveness and potential of our proposed framework.
  • Keywords
    feature selection; social networking (online); unsupervised learning; LUFS; attribute-value data; empirical analysis; feature relevance; label information; linked social media data characteristics; real-world social media Web sites; relation extraction; unlabeled high-dimensional data; unsupervised feature selection framework; Correlation; Data mining; Facebook; Feature extraction; Media; Spectral analysis; Unsupervised feature selection; linked data; pseudo labels; social dimension regularization; social media;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2014.2320728
  • Filename
    6807781