Title :
Web site classification based on URL and content: Algerian vs. non-Algerian case
Author :
Abdessamed, Ouessai ; Zakaria, Elberrichi
Author_Institution :
EEDIS Laboratory, Faculty of Technology, Department of Computer Science, University Djillali Liabes, Sidi Bel-Abbès, Algeria
Abstract :
Classifying web pages by topic or sentiment is a common application of web content mining. This paper presents a novel application: identifying the nation targeted by a given web page. The aim is to automatically distinguish websites that target a specific nation, using both the URL and the content of a web page. We address the case of identifying Algerian-interest web pages using a machine learning approach. We describe how the data for the supervised learning phase was acquired and adapted into a usable dataset, and how it was used to construct three distinct classifiers, each using a different part of the data. The resulting classifiers performed well on this task, with F-scores of up to 0.93.
Keywords :
Classification algorithms; Crawlers; Data mining; Labeling; Uniform resource locators; Web pages; classification; content-based; naïve Bayes; target nation; URL-based; web content mining
Conference_Title :
2015 12th International Symposium on Programming and Systems (ISPS)
Conference_Location :
Algiers, Algeria
DOI :
10.1109/ISPS.2015.7244974