• DocumentCode
    1013822
  • Title

    An optimization criterion for generalized discriminant analysis on undersampled problems

  • Author

    Ye, Jieping ; Janardan, Ravi ; Park, Cheong Hee ; Park, Haesun

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Univ. of Minnesota-Twin Cities, Minneapolis, MN, USA
  • Volume
    26
  • Issue
    8
  • fYear
    2004
  • Firstpage
    982
  • Lastpage
    994
  • Abstract
    An optimization criterion is presented for discriminant analysis. The criterion extends the optimization criteria of the classical Linear Discriminant Analysis (LDA) through the use of the pseudoinverse when the scatter matrices are singular. It is applicable regardless of the relative sizes of the data dimension and sample size, overcoming a limitation of classical LDA. The optimization problem can be solved analytically by applying the Generalized Singular Value Decomposition (GSVD) technique. The pseudoinverse has been suggested and used for undersampled problems in the past, where the data dimension exceeds the number of data points. The criterion proposed in this paper provides a theoretical justification for this procedure. An approximation algorithm for the GSVD-based approach is also presented. It reduces the computational complexity by finding subclusters of each cluster and uses their centroids to capture the structure of each cluster. This reduced problem yields much smaller matrices to which the GSVD can be applied efficiently. Experiments on text data, with up to 7,000 dimensions, show that the approximation algorithm produces results that are close to those produced by the exact algorithm.
  • Keywords
    approximation theory; computational complexity; generalisation (artificial intelligence); optimisation; pattern classification; pattern clustering; singular value decomposition; approximation algorithm; computational complexity; generalized discriminant analysis; generalized singular value decomposition; linear discriminant analysis; optimization; pseudoinverse; scatter matrices; undersampled problems; Approximation algorithms; Clustering algorithms; Computational complexity; Data mining; Face recognition; Information retrieval; Linear discriminant analysis; Matrix decomposition; Scattering; Singular value decomposition; Classification; clustering; dimension reduction; generalized singular value decomposition; linear discriminant analysis; text mining.; Algorithms; Artificial Intelligence; Cluster Analysis; Discriminant Analysis; Documentation; Information Storage and Retrieval; Natural Language Processing; Numerical Analysis, Computer-Assisted; Pattern Recognition, Automated; Reproducibility of Results; Sample Size; Sensitivity and Specificity;
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    0162-8828
  • Type

    jour

  • DOI
    10.1109/TPAMI.2004.37
  • Filename
    1307006