• DocumentCode
    3778963
  • Title
    Automatic text categorization: Marathi documents
  • Author
    Jaydeep Jalindar Patil;Nagaraju Bogiri
  • Author_Institution
    Department of Computer Engineering (Computer Networks), K. J. College of Engineering & Management Research, Pune, India
  • fYear
    2015
  • Firstpage
    689
  • Lastpage
    694
  • Abstract
    Information technology has generated a huge volume of data on the internet. Initially this data was mainly in English, so the majority of data mining research has focused on English text documents. As internet usage increased, data in other languages such as Marathi, Tamil, Telugu, and Punjabi also grew. This paper presents a retrieval system for Marathi-language documents based on a user profile. The user profile captures the user's interests and browsing history. The system shows Marathi documents to the end user based on this profile. Automatic text categorization supports better management and retrieval of these text documents and makes document retrieval a simple task. This paper discusses automatic text categorization of Marathi documents and surveys the related work in this area. Various learning techniques exist for classifying text documents, such as Naïve Bayes, Support Vector Machines, and Decision Trees. Clustering techniques used for text categorization include the Label Induction Grouping Algorithm (LINGO), Suffix Tree Clustering, and K-means. The literature survey indicates that, for non-English documents, the Vector Space Model (VSM) gives better results than other models. The system categorizes Marathi documents using the LINGO algorithm, which is based on the VSM. The system uses a dataset of 200 documents in 20 different categories. The results indicate that the LINGO clustering algorithm is efficient for Marathi text documents.
  • Keywords
    "Clustering algorithms","Matrix decomposition","Text categorization","Internet","Algorithm design and analysis","Classification algorithms","Search engines"
  • Publisher
    IEEE
  • Conference_Titel
    Energy Systems and Applications, 2015 International Conference on
  • Type
    conf
  • DOI
    10.1109/ICESA.2015.7503438
  • Filename
    7503438
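
The abstract names the Vector Space Model (VSM) as the representation underlying LINGO. The following is a minimal sketch of that idea only, not the authors' implementation: documents become TF-IDF weighted term vectors and are compared by cosine similarity. The function names (`tf_idf_vectors`, `cosine`), the smoothed IDF formula, whitespace tokenization, and the three toy Marathi documents are all assumptions made for illustration; the record does not describe the paper's preprocessing or data. LINGO itself additionally applies a matrix decomposition to the term-document matrix to induce cluster labels, which is not shown here.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict term -> weight) per document.

    Tokenization is plain whitespace splitting, an assumption for this sketch.
    """
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Assumed weighting: raw term count times (log(N / df) + 1).
        vectors.append(
            {term: count * (math.log(n_docs / df[term]) + 1.0)
             for term, count in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy documents (not from the paper's dataset):
# two about cricket, one about farming.
docs = [
    "क्रिकेट सामना भारत जिंकला",
    "भारत क्रिकेट संघ सामना",
    "पाऊस शेती पीक शेतकरी",
]
vectors = tf_idf_vectors(docs)

# Pairwise similarities against the first document.
sim_same_topic = cosine(vectors[0], vectors[1])
sim_other_topic = cosine(vectors[0], vectors[2])
print(sim_same_topic > sim_other_topic)
```

In this sketch the two cricket documents share three terms and so score higher than the cricket and farming pair, which share none. A clustering step such as the one the abstract describes would group documents using similarities of this kind.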