DocumentCode
2770699
Title
Information Extraction Using Web Usage Mining, Web Scrapping and Semantic Annotation
Author
Malik, Sanjay Kumar ; Rizvi, Sam
Author_Institution
Univ. Sch. of Inf. Technol., GGS Indraprastha Univ., New Delhi, India
fYear
2011
fDate
7-9 Oct. 2011
Firstpage
465
Lastpage
469
Abstract
Extracting useful information from the web is the most significant concern in realizing the semantic web. It may be achieved in several ways, among which web usage mining, web scraping and semantic annotation play an important role. Web mining helps find relevant results on the web and is used to extract meaningful information from the discovered patterns stored on servers. Web usage mining is a type of web mining that mines information about the access routes and behaviour of users visiting web sites. Web scraping, another technique, is the process of extracting useful information from HTML pages; it may be implemented using Prolog Server Pages (PSP), a scripting language based on Prolog. Third, semantic annotation is a technique that adds semantics and a formal structure to unstructured textual documents, an important aspect of semantic information extraction; it may be performed with a tool known as KIM (Knowledge Information Management). In this paper, we revisit, explore and discuss some information extraction techniques for the web, such as web usage mining, web scraping and semantic annotation, for better or more efficient information extraction on the web, illustrated with examples.
Keywords
PROLOG; Web sites; authoring languages; data mining; hypermedia markup languages; information retrieval; semantic Web; text analysis; HTML pages; Prolog Server Pages; Web scrapping; Web usage mining; access routes; scripting language; semantic annotation; semantic information extraction; unstructured textual documents; Browsers; HTML; Semantics; Web mining; Web servers; KIM; Text Grepping; Web Log Analyzer; knowledge management
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Intelligence and Communication Networks (CICN), 2011 International Conference on
Conference_Location
Gwalior
Print_ISBN
978-1-4577-2033-8
Type
conf
DOI
10.1109/CICN.2011.97
Filename
6112910