Author_Institution :
Dept. of Comput. Sci., Indiana Univ., Fort Wayne, IN
Abstract :
Text mining is becoming an essential method of knowledge discovery from general and business documents. Although documents such as press releases, emails, memos, contracts, government reports and news feeds are considered unstructured, they can be tapped for information using text analysis techniques such as feature extraction, thematic indexing, clustering and summarization. For this project, 30 representative documents from a small enterprise were collected to determine the dominant features of its activities. Document profiles were generated by extracting the frequencies of certain terms, then clustering and filtering them on the basis of both repeated occurrence and co-occurrence. Analysis of these profiles yielded a coherent picture of the functional relationships among large and heterogeneous lists of terms. This approach affords investigators an extractive interface to complex text data. This paper shows how the documents were mined using the text-based WordStat software, and describes the potential, features and options of the program.
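The profiling step described in the abstract (term frequencies, filtering by repeated occurrence, and co-occurrence counts) can be illustrated with a minimal Python sketch. This is not WordStat and does not reproduce the paper's procedure or data; the function names `tokenize` and `term_profiles`, the `min_doc_freq` threshold, and the three sample strings are all hypothetical, chosen only to show the general idea.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_profiles(docs, min_doc_freq=2):
    """Return (term frequencies, co-occurrence counts) for recurring terms.

    A term is kept only if it appears in at least `min_doc_freq` documents
    (an assumed stand-in for "filtering on repeated occurrence").
    Co-occurrence counts the documents in which both terms of a pair appear.
    """
    tokenized = [tokenize(d) for d in docs]

    # Total occurrences of each term across the whole collection.
    term_freq = Counter(t for toks in tokenized for t in toks)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(t for toks in tokenized for t in set(toks))

    kept = {t for t, n in doc_freq.items() if n >= min_doc_freq}

    cooc = Counter()
    for toks in tokenized:
        # Sort so each pair has one canonical (alphabetical) ordering.
        present = sorted(set(toks) & kept)
        cooc.update(combinations(present, 2))

    return {t: term_freq[t] for t in kept}, cooc


# Hypothetical sample documents, not taken from the paper.
docs = [
    "contract renewal memo",
    "contract renewal press release",
    "press release on government report",
]
freqs, cooc = term_profiles(docs)
```

Here `freqs` holds the frequency of each term that recurs across documents, and `cooc` maps alphabetically ordered term pairs to the number of documents containing both. A clustering step could then group terms by these co-occurrence counts.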
Keywords :
business data processing; data mining; small-to-medium enterprises; text analysis; WordStat; document mining; knowledge discovery; small enterprise; Contracts; Feature extraction; Feeds; Filtering; Frequency; Government; Indexing; Text mining; Clustering; Cooccurrence; Documents