• DocumentCode
    2357684
  • Title

    A software infrastructure for research in textual data mining

  • Author

    Holzman, Lars E. ; Fisher, Todd A. ; Galitsky, Leon M. ; Kontostathis, April ; Pottenger, William M.

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Lehigh Univ., USA
  • fYear
    2003
  • fDate
    3-5 Nov. 2003
  • Firstpage
    112
  • Lastpage
    121
  • Abstract
    Few tools exist that address the challenges facing researchers in the textual data mining (TDM) field. Some are too specific to their application, or are prototypes not suitable for general use. More general tools often are not capable of processing large volumes of data. We have created a textual data mining infrastructure (TMI) that incorporates both existing and new capabilities in a reusable framework conducive to developing new tools and components. TMI adheres to strict guidelines that allow it to run in a wide range of processing environments - as a result, it accommodates the volume of computing and diversity of research occurring in TDM. A unique capability of TMI is support for optimization. This facilitates text mining research by automating the search for optimal parameters in text mining algorithms. In this article we describe a number of applications that use the TMI. We present several novel results that have not been published elsewhere. We also discuss how the TMI utilizes existing machine-learning libraries, thereby enabling researchers to continue and extend their endeavors with minimal effort. Towards that end, TMI is available on the web at hddi.cse.lehigh.edu.
  • Keywords
    data mining; learning (artificial intelligence); software architecture; text analysis; World Wide Web; machine learning library; optimal parameter; optimization; research diversity; search automation; software infrastructure; textual data mining infrastructure; Application software; Computer science; Data engineering; Data mining; Design engineering; Libraries; Machine learning algorithms; Prototypes; Text mining; Time division multiplexing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Tools with Artificial Intelligence, 2003. Proceedings. 15th IEEE International Conference on
  • ISSN
    1082-3409
  • Print_ISBN
    0-7695-2038-3
  • Type

    conf

  • DOI
    10.1109/TAI.2003.1250178
  • Filename
    1250178