• DocumentCode
    568071
  • Title
    Improving SVM on web content classification by document formulation
  • Author
    Xia, Tian ; Chai, Yanmei ; Wang, Tong
  • Author_Institution
    Dept. of Comput. & Inf. Sci., Shanghai Second Polytech. Univ., Shanghai, China
  • fYear
    2012
  • fDate
    14-17 July 2012
  • Firstpage
    110
  • Lastpage
    113
  • Abstract
    The volume of web content today is overwhelming. The numerous online documents, webpages, e-books, etc. are very useful, but finding them is time-consuming. Text categorization is one solution to this problem. Among text categorization methods, the Support Vector Machine (SVM) is one of the most widely accepted. However, to perform more efficiently on webpages, it needs some improvements. For webpages, the document title is meaningful, as it is usually carefully written by editors and typically reflects the main content of the webpage. In this paper, an improvement to the Support Vector Machine is proposed: a document representation for SVM that emphasizes the features in the document title, which is commonly present in webpages and contains essential contextual information about the document.
  • Keywords
    Internet; pattern classification; support vector machines; text analysis; SVM; Web content classification; Webpages; contextual information; document formulation; document representation; document title; e-books; online documents; text categorization method; Educational institutions; Kernel; Support vector machine classification; Text categorization; Training; Vectors; Natural Language Processing; Text Classification; Title Vector
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science & Education (ICCSE), 2012 7th International Conference on
  • Conference_Location
    Melbourne, VIC
  • Print_ISBN
    978-1-4673-0241-8
  • Type
    conf
  • DOI
    10.1109/ICCSE.2012.6295037
  • Filename
    6295037
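
The abstract describes a document representation that emphasizes title features before classification with an SVM, but this record does not give the exact formulation. The following is only a minimal sketch of one way such an emphasis could work, assuming a plain term-frequency vector in which each title term is counted with an extra weight. The function names and the `title_weight` parameter are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def title_weighted_vector(title, body, title_weight=3.0):
    """Build a sparse term-frequency vector for one document.

    Each body token adds 1.0 to its term; each title token adds
    `title_weight`. The weight value is an illustrative assumption,
    not a setting reported by the authors.
    """
    vec = Counter()
    for tok in tokenize(body):
        vec[tok] += 1.0
    for tok in tokenize(title):
        vec[tok] += title_weight
    return dict(vec)


def to_dense(vec, vocabulary):
    """Project a sparse vector onto a fixed, ordered vocabulary."""
    return [vec.get(term, 0.0) for term in vocabulary]


doc = title_weighted_vector(
    "SVM text classification",
    "We classify web text with a kernel method.",
)
dense = to_dense(doc, ["svm", "text", "kernel", "missing"])
```

Dense vectors built this way, one per webpage over a shared vocabulary, could then be passed as feature rows to any SVM implementation. Terms that appear in the title end up with larger feature values than terms that appear only in the body.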