• DocumentCode
    559641
  • Title

    Text mining: Finding right documents from large collection of unstructured documents

  • Author

    Amarakoon, Savidu ; Caldera, Amitha

  • Author_Institution
    Sch. of Comput., Univ. of Colombo, Colombo, Sri Lanka
  • fYear
    2011
  • fDate
    24-26 Oct. 2011
  • Firstpage
    5
  • Lastpage
    10
  • Abstract
    In our day-to-day life we come across unstructured data in many forms. These include books, journals, audio/video files, and unstructured text such as emails, web pages and documents. These data can be a vital source for making informed decisions. For example, in any company there is a set of people who can be identified as the most outstanding among its workforce. Identifying what is common among them, and identifying others like them, would undoubtedly improve the output of the company. This is the basis on which this research was carried out. The central aspect of the research was to use text mining techniques to mine the data in a set of documents, identify the characteristics common among them, and then identify other documents that contain these characteristics.
  • Keywords
    data mining; text analysis; right document finding; text mining techniques; unstructured document large collection; Indexing; Java; Libraries; Portable document format; Text mining; Data Mining; Document-based Searching; Lucene; Text Mining; Unstructured Data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining and Intelligent Information Technology Applications (ICMiA), 2011 3rd International Conference on
  • Conference_Location
    Macao
  • Print_ISBN
    978-1-4673-0231-9
  • Type

    conf

  • Filename
    6108390