• DocumentCode
    3165740
  • Title

    A document classification approach by GA feature extraction based corner classification neural network

  • Author

    Zhang, Weifeng ; Xu, Baowen ; Cui, Zifeng

  • fYear
    2005
  • fDate
    23-25 Nov. 2005
  • Lastpage
    504
  • Abstract
    The CC4 neural network is a new type of corner classification training algorithm for three-layered feedforward neural networks. CC4 has been used successfully in the metasearch engine Anvish. When the documents are of nearly the same size, the CC4 neural network is an effective document classification algorithm. In general, however, document sizes differ greatly, and CC4 uses the whole dictionary as the vector space, so many documents are represented by sparse vectors. This paper proposes GA-CC4, a neural network based on feature extraction. GA feature extraction selects the feature items that actually represent the documents in the document set and assembles them into a set of feature items. Each document is then represented using this set of feature items combined with the document frequency. This reduces the number of dimensions needed to represent the documents, which addresses the precision problem caused by differing document sizes. The method also maps the scalar features to the Boolean input of the neural network by binary coding, which improves the quality of the network's input data.
  • Keywords
    classification; document handling; feature extraction; feedforward neural nets; genetic algorithms; Boolean input; binary coding; corner classification neural network; corner classification training algorithm; document classification; document frequency; feature extraction based neural network; feedforward neural network; genetic algorithm; meta search engine Anvish; scalar feature; sparse vector; Artificial neural networks; Classification algorithms; Clustering algorithms; Computer science; Feature extraction; Feedforward neural networks; Machine learning algorithms; Metasearch; Neural networks; Search engines;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cyberworlds, 2005. International Conference on
  • Conference_Location
    Singapore
  • Print_ISBN
    0-7695-2378-1
  • Type

    conf

  • DOI
    10.1109/CW.2005.3
  • Filename
    1587586
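
The abstract's last step, mapping scalar features to Boolean network input by binary coding, can be illustrated with a minimal sketch. This record does not give the paper's actual encoding, so everything below is an assumption made for illustration: the function name `encode_document`, the fixed bit width `bits`, the clipping of large counts, and the use of a per-document occurrence count as the scalar feature. The feature item list is assumed to come from a separate selection step (the GA feature extraction), which is not shown.

```python
def encode_document(tokens, feature_items, bits=3):
    """Encode a tokenized document as a flat Boolean (0/1) vector.

    Hypothetical illustration only: each feature item contributes
    `bits` positions holding the binary form of its occurrence count.
    """
    max_count = (1 << bits) - 1  # largest value that fits in `bits` bits
    vector = []
    for item in feature_items:
        # Assumed scalar feature: how often the item occurs in this document,
        # clipped so that it fits the fixed bit width.
        count = min(tokens.count(item), max_count)
        # Emit the count as bits, most significant bit first.
        for shift in range(bits - 1, -1, -1):
            vector.append((count >> shift) & 1)
    return vector


if __name__ == "__main__":
    doc = ["neural", "network", "neural", "search"]
    features = ["neural", "search", "genetic"]
    print(encode_document(doc, features, bits=2))
```

With this layout the input length is the number of feature items times `bits`, independent of dictionary size, which is the dimensionality reduction the abstract describes.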