Title :
Syllable n-gram approach for identification and classification of genres in Telugu language
Author :
Kumari, K. Pranitha ; Reddy, A. Venugopal
Author_Institution :
Dept. of CSE, Osmania Univ., Hyderabad, India
Abstract :
Internet use in India is growing steadily, and so is the amount of information available on the web in Indian languages. Classifying this web data is therefore needed to improve search results. Topic-based text classification is an active research area, but genre-based (non-topical) classification of Telugu web pages has not been considered so far. This work attempts to identify web genres in the Telugu language. Three web genres are identified from Telugu web pages on the basis of social acceptance and communicative purpose, i.e. discourse functionality. A syllable extraction algorithm is proposed for extracting character n-gram features. Classification is performed using SVM, Naive Bayes and Random Forest classifiers. The results show that the proposed algorithm gave better performance in terms of F-measure and accuracy.
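The abstract names syllable-level n-gram features but does not spell out the extraction rules. The sketch below illustrates one plausible reading and is not the authors' algorithm: it assumes an orthographic syllable is a consonant or independent vowel plus any following dependent signs, with consonants joined by the virama kept in the same syllable. The function names (`split_syllables`, `syllable_ngrams`, `ngram_counts`) and the code point ranges used are assumptions made for illustration.

```python
import re
from collections import Counter

VIRAMA = "\u0c4d"  # Telugu sign virama; joins consonants into a conjunct

# Runs of Telugu letters and signs (U+0C00..U+0C63); digits and punctuation are excluded.
TELUGU_WORD = re.compile("[\u0c00-\u0c63]+")


def is_base(ch):
    """True for independent vowels and consonants, which can open a syllable."""
    return "\u0c05" <= ch <= "\u0c39" or "\u0c58" <= ch <= "\u0c61"


def split_syllables(word):
    """Split one Telugu word into orthographic syllables (assumed rule set)."""
    syllables = []
    for ch in word:
        # A base character right after a virama continues the current conjunct.
        joins_conjunct = bool(syllables) and syllables[-1].endswith(VIRAMA)
        if not syllables or (is_base(ch) and not joins_conjunct):
            syllables.append(ch)
        else:
            # Vowel signs, virama, anusvara, etc. attach to the current syllable.
            syllables[-1] += ch
    return syllables


def syllable_ngrams(syllables, n):
    """Return all runs of n consecutive syllables, each joined into one string."""
    return ["".join(syllables[i:i + n]) for i in range(len(syllables) - n + 1)]


def ngram_counts(text, n=2):
    """Count syllable n-grams over every Telugu word found in the text."""
    counts = Counter()
    for word in TELUGU_WORD.findall(text):
        counts.update(syllable_ngrams(split_syllables(word), n))
    return counts
```

Under these assumed rules, `split_syllables("తెలుగు")` yields the three syllables తె, లు and గు, and `ngram_counts` produces a frequency table that could be turned into a feature vector for any of the classifiers the abstract mentions.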
Keywords :
Bayes methods; Internet; feature extraction; natural language processing; pattern classification; random processes; support vector machines; text analysis; F-measure; India; Indian languages; Internet; Naive Bayes; SVM; Syllable extraction algorithm; Telugu language Web pages; Web data classification; character n-gram feature extraction; communicative purpose; discourse functionality; genre based Web page classification; nontopical based Web page classification; random forest classifiers; social acceptance; topic-based text classification; Accuracy; Classification algorithms; Educational institutions; Feature extraction; Frequency modulation; Support vector machines; Web pages; Telugu web genres; character n-gram features; genre classification; genre identification; syllable extraction;
Conference_Title :
Networks & Soft Computing (ICNSC), 2014 First International Conference on
Conference_Location :
Guntur
Print_ISBN :
978-1-4799-3485-0
DOI :
10.1109/CNSC.2014.6906646