DocumentCode :
1873671
Title :
A comparative study of web pages classification methods applied to health consumer web pages
Author :
Siddiqui, Aneeta ; Adnan, Mehnaz ; Siddiqui, Rizwan Alam ; Mubeen, Tauseef
Author_Institution :
Sir Syed Univ. of Eng. & Technol., Karachi, Pakistan
fYear :
2015
fDate :
21-23 April 2015
Firstpage :
43
Lastpage :
48
Abstract :
These days, the Internet is growing at an exponential rate and can supply just about any data required. However, the immense number of web pages makes it difficult for a user to find the target data effectively. An efficient method for classifying this huge amount of data is therefore essential if web pages are to be exploited to their full potential. In the domain of automatic web page classification, many approaches have been tried using different machine learning algorithms, including Support Vector Machine (SVM), Naïve Bayes, Decision Tree, K-Nearest Neighbor (K-NN) and Neural Networks. However, there is a lack of comparison between these algorithms to find a better framework for the classification and analysis of health-related web pages. In this study, we compare two commonly used supervised machine learning algorithms, Support Vector Machines (SVM) and Naïve Bayes, for classifying web pages that provide drug-related information for patients, for example side effects, patient actions and follow-up information. We use the Unified Medical Language System (UMLS) to annotate health-related concepts in web pages, and train SVM and Naïve Bayes classifiers in the General Architecture for Text Engineering (GATE) to classify health-related and non-health-related web pages. The evaluation was performed using K-fold cross-validation with four runs on a data set of fifty web pages. The results show that SVM performed better at classifying health-related and non-health-related pages in terms of precision, recall and F-measure.
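The evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' GATE pipeline: it assumes that "four runs" means K = 4 folds, and the function names and the "health"/"other" labels are hypothetical. It shows only how fifty pages could be split into folds and how precision, recall and F-measure are computed for one class.

```python
from typing import List, Sequence, Tuple


def k_fold_indices(n_items: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs covering 0..n_items-1.

    Items are dealt round-robin into folds, so fold sizes differ by at most one.
    """
    folds = [list(range(i, n_items, k)) for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def precision_recall_f1(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str = "health"
) -> Tuple[float, float, float]:
    """Compute precision, recall and F-measure for the `positive` label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Assumed setup: 50 pages, K = 4 (hypothetical reading of "four runs").
splits = k_fold_indices(50, 4)
```

In a full comparison, each classifier would be trained on the `train` indices of every split, scored on the matching `test` indices with `precision_recall_f1`, and the per-fold scores averaged.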
Keywords :
Internet; decision trees; health care; learning (artificial intelligence); medical computing; neural nets; pattern classification; support vector machines; text analysis; Internet; K-NN; SVM; UMLS; Unified Medical Language System; Web page classification method; decision tree; health consumer Web page; k-nearest neighbor; machine learning-based algorithm; naïve Bayes; neural network; support vector machine; text engineering; Machine learning algorithms; Reliability; Support vector machines; Testing; Training; Unified modeling language; Web pages; Machine Learning; Unified Medical Language System(UMLS); Web Mining; Web Page Classification;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computing Technology and Information Management (ICCTIM), 2015 Second International Conference on
Conference_Location :
Johor
Print_ISBN :
978-1-4799-6210-5
Type :
conf
DOI :
10.1109/ICCTIM.2015.7224591
Filename :
7224591