DocumentCode
249092
Title
Syllable n-gram approach for identification and classification of genres in Telugu language
Author
Kumari, K. Pranitha ; Reddy, A. Venugopal
Author_Institution
Dept. of CSE, Osmania Univ., Hyderabad, India
fYear
2014
fDate
19-20 Aug. 2014
Firstpage
125
Lastpage
129
Abstract
Internet use in India is growing steadily, and so is the amount of information available on the web in Indian languages, so web data needs to be classified to improve search results. Topic-based text classification is an active research area, but genre-based (non-topical) classification of Telugu web pages has not yet been considered. This work attempts to identify web genres in the Telugu language. Three web genres were identified from Telugu web pages on the basis of social acceptance and communicative purpose, i.e. discourse functionality. A syllable extraction algorithm is proposed for extracting character n-gram features. Classification was performed using SVM, Naive Bayes and Random Forest classifiers. The results show that the proposed algorithm gave better performance in terms of F-measure and accuracy.
Keywords
Bayes methods; Internet; feature extraction; natural language processing; pattern classification; random processes; support vector machines; text analysis; F-measure; India; Indian languages; Naive Bayes; SVM; Syllable extraction algorithm; Telugu language Web pages; Web data classification; character n-gram feature extraction; communicative purpose; discourse functionality; genre based Web page classification; nontopical based Web page classification; random forest classifiers; social acceptance; topic-based text classification; Accuracy; Classification algorithms; Educational institutions; Feature extraction; Frequency modulation; Support vector machines; Web pages; Telugu web genres; character n-gram features; genre classification; genre identification; syllable extraction
fLanguage
English
Publisher
IEEE
Conference_Titel
Networks & Soft Computing (ICNSC), 2014 First International Conference on
Conference_Location
Guntur
Print_ISBN
978-1-4799-3485-0
Type
conf
DOI
10.1109/CNSC.2014.6906646
Filename
6906646