• DocumentCode
    1806028
  • Title
    An Empirical Comparison of Four Text Mining Methods
  • Author
    Lee, Sangno ; Baker, Jeff ; Song, Jaeki ; Wetherbe, James C.
  • Author_Institution
    Texas Tech Univ., Lubbock, TX, USA
  • fYear
    2010
  • fDate
    5-8 Jan. 2010
  • Firstpage
    1
  • Lastpage
    10
  • Abstract
    The amount of textual data that is available for researchers and businesses to analyze is increasing at a dramatic rate. This reality has led IS researchers to investigate various text mining techniques. This essay examines four frequently used text mining methods to identify their advantages and limitations. The four methods that we examine are (1) latent semantic analysis, (2) probabilistic latent semantic analysis, (3) latent Dirichlet allocation, and (4) the correlated topic model. We compare these four methods and highlight the conditions under which each is best applied. Our paper sheds light on the theory that underlies text mining methods and provides guidance for researchers who seek to apply these methods.
  • Keywords
    data mining; text analysis; IS researchers; correlated topic model; empirical comparison; latent Dirichlet allocation; latent semantic analysis; probabilistic latent semantic analysis; text mining methods; textual data; Advertising; Customer relationship management; Data analysis; Databases; Information systems; Management information systems; Natural languages; Object detection; Text mining
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    System Sciences (HICSS), 2010 43rd Hawaii International Conference on
  • Conference_Location
    Honolulu, HI
  • ISSN
    1530-1605
  • Print_ISBN
    978-1-4244-5509-6
  • Electronic_ISBN
    1530-1605
  • Type
    conf
  • DOI
    10.1109/HICSS.2010.48
  • Filename
    5428645