• DocumentCode
    3278946
  • Title

    A Novel Conception Based Texts Classification Method

  • Author

    Rujiang, Bai ; Junhua, Liao

  • Author_Institution
    Shandong Univ. of Technol. Libr., Zibo, China
  • fYear
    2009
  • fDate
    7-9 March 2009
  • Firstpage
    30
  • Lastpage
    34
  • Abstract
    Text classification has been widely used to help users discover useful information on the Internet. However, current text classification systems are based on the "Bag of Words" (BOW) representation, which accounts only for term frequency in the documents and ignores important semantic relationships between key terms. To overcome this problem, previous work attempted to enrich text representation by means of manual intervention or automatic document expansion. Unfortunately, the improvement achieved is very limited, owing to the poor coverage of the dictionary and the ineffectiveness of term expansion. Fortunately, DBpedia, which contains rich semantic information, has recently become available. In this paper, we propose a method that compiles DBpedia knowledge into the document representation to improve text classification. It facilitates the integration of the rich knowledge of DBpedia into text documents by resolving synonyms and introducing more general and associative concepts. To evaluate the performance of the proposed method, we performed an empirical evaluation using an SVM classifier on several real data sets. The experimental results show that our proposed framework, which integrates hierarchical, synonym, and associative relations with traditional text similarity measures based on the BOW model, improves text classification performance significantly.
  • Keywords
    knowledge representation; support vector machines; text analysis; DBpedia; Internet; SVM classifier; automatic document expansion; bag of words representation; conception based texts classification method; document representation; support vector machine; text representation; Electronic mail; Frequency; Knowledge management; Libraries; Ontologies; Performance evaluation; Support vector machine classification; Support vector machines; Text categorization; Wikipedia; DBpedia; SVM; Semantic-enriched Representation; Text classification;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Science and Technology, 2009. AST '09. International e-Conference on
  • Conference_Location
    Daejeon
  • Print_ISBN
    978-0-7695-3672-9
  • Type

    conf

  • DOI
    10.1109/AST.2009.15
  • Filename
    5231733