• DocumentCode
    834022
  • Title

    Metric learning for text documents

  • Author

    Lebanon, Guy

  • Author_Institution
    Dept. of Stat., Purdue Univ., West Lafayette, IN, USA
  • Volume
    28
  • Issue
    4
  • fYear
    2006
  • fDate
    4/1/2006 12:00:00 AM
  • Firstpage
    497
  • Lastpage
    508
  • Abstract
    Many algorithms in machine learning rely on being given a good distance metric over the input space. Rather than using a default metric such as the Euclidean metric, it is desirable to obtain a metric based on the provided data. We consider the problem of learning a Riemannian metric associated with a given differentiable manifold and a set of points. Our approach to the problem involves choosing a metric from a parametric family that is based on maximizing the inverse volume of a given data set of points. From a statistical perspective, it is related to maximum likelihood under a model that assigns probabilities inversely proportional to the Riemannian volume element. We discuss in detail learning a metric on the multinomial simplex where the metric candidates are pull-back metrics of the Fisher information under a Lie group of transformations. When applied to text document classification, the resulting geodesic distances resemble, but outperform, the tfidf cosine similarity measure.
  • Keywords
    Lie groups; differential geometry; learning (artificial intelligence); statistical analysis; text analysis; transforms; Fisher information; Lie group; Riemannian metric; Riemannian volume element; differentiable manifold; geodesic distance; inverse volume maximization; machine learning; maximum likelihood; metric learning; multinomial simplex; pull-back metrics; text documents; tfidf cosine similarity measure; Euclidean distance; Geometry; Joining processes; Kernel; Level measurement; Machine learning; Machine learning algorithms; Neural networks; Probability; Text analysis; Distance learning; machine learning; text analysis; Algorithms; Artificial Intelligence; Automatic Data Processing; Computer Graphics; Documentation; Image Enhancement; Image Interpretation, Computer-Assisted; Information Storage and Retrieval; Numerical Analysis, Computer-Assisted; Pattern Recognition, Automated; Signal Processing, Computer-Assisted; User-Computer Interface
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type

    jour

  • DOI
    10.1109/TPAMI.2006.77
  • Filename
    1597108