• DocumentCode
    673003
  • Title

    Short Text Classification Using Wikipedia Concept Based Document Representation

  • Author

    Xiang Wang ; Ruhua Chen ; Yan Jia ; Bin Zhou

  • Author_Institution
    Sch. of Comput., Nat. Univ. of Defense Technol., Changsha, China
  • fYear
    2013
  • fDate
    16-17 Nov. 2013
  • Firstpage
    471
  • Lastpage
    474
  • Abstract
    Short text classification is a difficult and challenging task in information retrieval systems since the text data is short, sparse and multidimensional. In this paper, we represent short text with Wikipedia concepts for classification. Short document text is mapped to Wikipedia concepts, and the concepts are then used to represent the document for text categorization. Traditional classification methods such as SVM can be used to perform text categorization on the Wikipedia concept document representation. Experimental evaluation on real Google search snippets shows that our approach outperforms the traditional bag-of-words (BOW) method and gives good performance. Although it is not better than the state-of-the-art classifier (see e.g. Phan et al., WWW '08), our method can be easily implemented with low cost.
  • Keywords
    Web sites; information retrieval; pattern classification; text analysis; Google search snippets; SVM; information retrieval systems; multidimensional text data; short document text data mapping; short text classification; sparse text data; text categorization; Wikipedia concept document representation; Electronic publishing; Encyclopedias; Indexes; Internet; Support vector machines; Text categorization; Document Representation; Short Text Classification; Wikipedia
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology and Applications (ITA), 2013 International Conference on
  • Conference_Location
    Chengdu
  • Print_ISBN
    978-1-4799-2876-7
  • Type

    conf

  • DOI
    10.1109/ITA.2013.114
  • Filename
    6710030
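
The representation step described in the abstract (mapping short text to Wikipedia concepts, then using the concepts as document features) can be illustrated with a minimal sketch. This is not the authors' implementation: the `CONCEPT_INDEX` dictionary, the greedy longest-phrase matching, and the function names are all assumptions made for illustration, and a real system would derive the index from Wikipedia article titles or anchors. The resulting count vectors are the kind of input a standard classifier such as an SVM could be trained on; the classifier itself is omitted here.

```python
from collections import Counter

# Hypothetical toy index: surface phrase -> Wikipedia concept title.
# A real index would be built from Wikipedia data; this one is illustrative only.
CONCEPT_INDEX = {
    "svm": "Support vector machine",
    "support vector machine": "Support vector machine",
    "search engine": "Web search engine",
    "google": "Google",
    "wikipedia": "Wikipedia",
}


def map_to_concepts(text, index=CONCEPT_INDEX, max_len=3):
    """Map a short text to a bag of concepts using greedy longest-phrase matching."""
    tokens = text.lower().split()
    concepts = Counter()
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in index:
                concepts[index[phrase]] += 1
                i += n
                break
        else:
            # No phrase starting at this token matched; skip the token.
            i += 1
    return concepts


def to_vector(concepts, vocabulary):
    """Turn a bag of concepts into a fixed-length count vector."""
    return [concepts.get(c, 0) for c in vocabulary]


# Fixed concept ordering shared by all documents.
VOCABULARY = sorted(set(CONCEPT_INDEX.values()))

snippet = "Google search engine uses support vector machine"
bag = map_to_concepts(snippet)
vector = to_vector(bag, VOCABULARY)
```

Tokens with no matching concept (such as "uses" in the snippet) are dropped, so the representation is the concept space rather than the raw vocabulary, which is the property the abstract contrasts with the BOW baseline.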