Title :
Different similarity measures in semi-supervised text classification
Author :
Wajeed, Mohammed Abdul ; Adilakshmi, T.
Author_Institution :
Sreenidhi Inst. of Sci. & Technol., Hyderabad, India
Abstract :
Information has great value; to use existing information, we need to store it in a manner that allows easy retrieval when needed, so classifying the available information becomes necessary. In addition to the existing supervised and unsupervised paradigms of classification, the paper attempts to exploit the semi-supervised learning paradigm. Semi-supervised learning lies halfway between supervised and unsupervised learning: in addition to unlabeled data, the algorithm is provided with some supervision information, but not necessarily for all examples. The paper explores semi-supervised text classification applied to different types of vectors generated from the text documents. The KNN algorithm is employed for semi-supervised text classification, and the results obtained are encouraging.
Keywords :
learning (artificial intelligence); text analysis; KNN algorithm; semisupervised learning; semisupervised text classification; similarity measure; unsupervised learning; Classification algorithms; Support vector machine classification; Text categorization; Time frequency analysis; Training; Training data; Vectors; confusion matrix; semi-supervised learning; similarity measures; text classification;
Conference_Title :
India Conference (INDICON), 2011 Annual IEEE
Conference_Location :
Hyderabad
Print_ISBN :
978-1-4577-1110-7
DOI :
10.1109/INDCON.2011.6139401