  • DocumentCode
    2284326
  • Title
    A New Method of Training Sample Selection in Text Classification
  • Author
    Liao, Yixing; Pan, Xuezeng
  • Author_Institution
    Dept. of Comput. Sci. & Technol., Zhejiang Univ., Hangzhou, China
  • Volume
    1
  • fYear
    2010
  • fDate
    6-7 March 2010
  • Firstpage
    211
  • Lastpage
    214
  • Abstract
    To address noise samples in the training dataset, this paper proposes a new method for reducing the size of the training dataset in text classification. The method describes the distribution of the training dataset by the representativeness score of each sample within its class, which distinguishes representative samples from noise samples in each class. The method is applied to a Chinese text dataset provided by the Fudan Database Center. Experiments show that the proposed method effectively reduces noise samples, improves classification performance, and decreases computational cost.
  • Keywords
    classification; natural language processing; text analysis; noise samples reduction; text classification; training dataset distribution; training sample selection; Computational efficiency; Computer science; Educational technology; Frequency; Iterative methods; Mutual information; Noise reduction; Paper technology; Probability; Text categorization; representativeness score; training dataset selection;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Education Technology and Computer Science (ETCS), 2010 Second International Workshop on
  • Conference_Location
    Wuhan
  • Print_ISBN
    978-1-4244-6388-6
  • Electronic_ISBN
    978-1-4244-6389-3
  • Type
    conf
  • DOI
    10.1109/ETCS.2010.621
  • Filename
    5458972
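The abstract does not say how the representativeness score is computed or how the retained subset is chosen. As a rough illustration only, the Python sketch below *assumes* that a sample's score is the cosine similarity between its term-frequency vector and the centroid of its labelled class, and that the top `keep_ratio` fraction of each class is kept as training data, so low-scoring samples are treated as likely noise. The function names and the `keep_ratio` parameter are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def normalize(vec):
    """Return an L2-normalized copy of a sparse term->weight vector."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else dict(vec)


def cosine(a, b):
    """Dot product of two L2-normalized sparse vectors (= cosine similarity)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def representativeness_scores(docs, labels):
    """ASSUMED score: cosine similarity of each document to its class centroid."""
    vecs = [normalize(Counter(tokens)) for tokens in docs]
    sums = defaultdict(Counter)
    for vec, label in zip(vecs, labels):
        sums[label].update(vec)  # accumulate normalized vectors per class
    centroids = {c: normalize(v) for c, v in sums.items()}
    return [cosine(v, centroids[l]) for v, l in zip(vecs, labels)]


def select_training_samples(docs, labels, keep_ratio=0.8):
    """Keep the highest-scoring keep_ratio fraction of each class (hypothetical rule)."""
    scores = representativeness_scores(docs, labels)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    kept = []
    for idxs in by_class.values():
        idxs.sort(key=lambda i: scores[i], reverse=True)
        n_keep = max(1, math.ceil(keep_ratio * len(idxs)))
        kept.extend(idxs[:n_keep])
    return sorted(kept)


if __name__ == "__main__":
    docs = [
        ["ball", "goal", "team"],
        ["ball", "goal", "match"],
        ["stock", "market", "price"],  # labelled "sport" but reads like finance
        ["stock", "market", "trade"],
        ["stock", "market", "price"],
    ]
    labels = ["sport", "sport", "sport", "econ", "econ"]
    print(select_training_samples(docs, labels, keep_ratio=0.6))  # [0, 1, 3, 4]
```

In this toy run the mislabelled third document gets the lowest score in the "sport" class and is the one dropped. Ranking within each class, rather than applying one global threshold, keeps small classes from being emptied; the paper's actual scoring and selection rule may differ.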