• DocumentCode
    3279054
  • Title
    Asymptotic evaluation of distance measure on high dimensional vector spaces in text mining
  • Author
    Goto, Masayuki ; Ishida, Takashi ; Suzuki, Makoto ; Hirasawa, Shigeichi
  • Author_Institution
    Fac. of Environ. & Inf. Studies, Musashi Inst. of Technol., Yokohama
  • fYear
    2008
  • fDate
    7-10 Dec. 2008
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    This paper discusses document classification problems in text mining from the viewpoint of asymptotic statistical analysis. In text mining, several heuristics are applied in practical analysis because of their experimental effectiveness in many case studies. A theoretical explanation of the performance of text mining techniques is required, and such an explanation would give a much clearer picture of why they work. In this paper, the performance of distance measures used to classify documents is analyzed from the new viewpoint of asymptotic analysis. We also discuss the asymptotic performance of the IDF measure used in the information retrieval field.
  • Keywords
    classification; data mining; information retrieval; statistical analysis; text analysis; asymptotic distance measure evaluation; asymptotic statistical analysis; document classification problem; high dimensional vector space; information retrieval; text mining; Electronic mail; Extraterrestrial measurements; Frequency measurement; Information retrieval; Information theory; Performance analysis; Space technology; Statistics; Text categorization; Text mining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Information Theory and Its Applications, 2008. ISITA 2008. International Symposium on
  • Conference_Location
    Auckland
  • Print_ISBN
    978-1-4244-2068-1
  • Electronic_ISBN
    978-1-4244-2069-8
  • Type
    conf
  • DOI
    10.1109/ISITA.2008.4895453
  • Filename
    4895453