• DocumentCode
    3478662
  • Title
    An improved TF-IDF weights function based on information theory
  • Author
    Wang, Na ; Wang, Pengyuan ; Zhang, Baowei
  • Author_Institution
    Dept. of Electron. & Commun., Zhengzhou Inst. of Aeronaut. Ind. Manage., Zhengzhou, China
  • Volume
    3
  • fYear
    2010
  • fDate
    12-13 June 2010
  • Firstpage
    439
  • Lastpage
    441
  • Abstract
    The Vector Space Model (VSM) is currently a typical method for describing text features in text classification. It uses TF-IDF weights to compute the term weight in each dimension of the text feature vector. However, TF-IDF considers only the relationship between a term and the whole text and neglects the relationships between different terms. To address this problem, an improved TF-IDF weight function is proposed that uses the distribution information among classes and within a class. Experiments show that the improved method is feasible and effective, and that it greatly improves the accuracy of text categorization.
  • Keywords
    information theory; pattern classification; text analysis; TF-IDF weights function; information theory; inverse document frequency; term weighting; text classification; text frequency; vector space model; Biology; Function; Information Theory; TF-IDF Weights; Text Categorization; Vector Space Model;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer and Communication Technologies in Agriculture Engineering (CCTAE), 2010 International Conference On
  • Conference_Location
    Chengdu
  • Print_ISBN
    978-1-4244-6944-4
  • Type
    conf
  • DOI
    10.1109/CCTAE.2010.5544382
  • Filename
    5544382