• DocumentCode
    249092
  • Title
    Syllable n-gram approach for identification and classification of genres in Telugu language
  • Author
    Kumari, K. Pranitha ; Reddy, A. Venugopal
  • Author_Institution
    Dept. of CSE, Osmania Univ., Hyderabad, India
  • fYear
    2014
  • fDate
    19-20 Aug. 2014
  • Firstpage
    125
  • Lastpage
    129
  • Abstract
    Internet use in India is growing steadily, and so is the amount of information available on the web in Indian languages. Web data therefore needs to be classified to improve search results. Topic-based text classification is an active research area, but genre-based (non-topical) classification of Telugu web pages has not been considered so far. This work attempts to identify web genres in the Telugu language. Three web genres were identified from Telugu web pages based on social acceptance and communicative purpose, i.e., discourse functionality. A syllable extraction algorithm is proposed to extract character n-gram features. Classification was performed using SVM, Naive Bayes, and Random Forest classifiers. The results show that the proposed algorithm gave better performance in terms of F-measure and accuracy.
  • Keywords
    Bayes methods; Internet; feature extraction; natural language processing; pattern classification; random processes; support vector machines; text analysis; F-measure; India; Indian languages; Naive Bayes; SVM; Syllable extraction algorithm; Telugu language Web pages; Web data classification; character n-gram feature extraction; communicative purpose; discourse functionality; genre based Web page classification; nontopical based Web page classification; random forest classifiers; social acceptance; topic-based text classification; Accuracy; Classification algorithms; Educational institutions; Feature extraction; Frequency modulation; Support vector machines; Web pages; Telugu web genres; character n-gram features; genre classification; genre identification; syllable extraction
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Networks & Soft Computing (ICNSC), 2014 First International Conference on
  • Conference_Location
    Guntur
  • Print_ISBN
    978-1-4799-3485-0
  • Type
    conf
  • DOI
    10.1109/CNSC.2014.6906646
  • Filename
    6906646