• DocumentCode
    1638752
  • Title

    Automatic Assamese text categorization using WordNet

  • Author

    Sarmah, J. ; Barman, A.K. ; Sarma, S.K.

  • Author_Institution
    Dept. of Inf. Technol., Gauhati Univ., Guwahati, India
  • fYear
    2013
  • Firstpage
    85
  • Lastpage
    89
  • Abstract
    The growing volume of Assamese text in digital format encourages us to build a system that categorizes it automatically. This paper discusses a system that categorizes texts automatically based on knowledge from the Assamese WordNet. In WordNet, a synset corresponds to the set of words that express the same concept, and words that have more than one sense in a given text are disambiguated in this approach. The approach extracts the words that occur in the document and uses them to create a synset vector by taking their union with the corresponding synsets from WordNet. To improve performance, we present a process that increases the weight not only of the terms but also of the synsets corresponding to those terms. We then count the occurrences of the senses, which helps in the disambiguation task by propagating the relationships between synsets. The proposed method achieves reasonable state-of-the-art accuracy when measured by Precision and Recall.
  • Keywords
    natural language processing; text analysis; word processing; Assamese WordNet; Assamese text contents; automatic Assamese text categorization; digital format; disambiguation tasks; precision; recall; synset vector; text content disambiguation approach; word extraction approach; Bismuth; Informatics; Text Categorization; Word Sense Disambiguation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advances in Computing, Communications and Informatics (ICACCI), 2013 International Conference on
  • Conference_Location
    Mysore
  • Print_ISBN
    978-1-4799-2432-5
  • Type

    conf

  • DOI
    10.1109/ICACCI.2013.6637151
  • Filename
    6637151