Regional Information Center for Science and Technology - Elimination of redundant information for Web data mining

DocumentCode :

3155381

Title :

Elimination of redundant information for Web data mining

Author :

Taib, Shakirah Mohd ; Yeom, Soon-Ja ; Kang, Byeong-Ho

Author_Institution :

Sch. of Comput., Tasmania Univ., Australia

Volume :

fYear :

2005

fDate :

4-6 April 2005

Firstpage :

200

Abstract :

These days, billions of Web pages are created with HTML or other markup languages. Compared to traditional text-based documents, they have few uniform structures and a wide variety of authoring styles. However, users usually focus on the particular section of a page that presents the information most relevant to their interest. Web document classification therefore needs to group and filter pages based on their contents and relevant information. Many studies on Web mining report on mining Web structure and extracting information from Web contents. However, they have focused on detecting tables that convey specific data, not tables that are used as a mechanism for structuring the layout of Web pages. Case models of tables can be constructed based on structure abstraction. Furthermore, Ripple Down Rules (RDR) is used to implement knowledge organization and construction, because it supports simple rule maintenance based on cases and local validation.

Keywords :

Internet; belief maintenance; classification; data mining; HTML; Web contents; Web data mining; Web document classification; Web page filtering; Web page grouping; Web structure mining; information extraction; knowledge construction; knowledge organization; redundant information elimination; ripple down rules; rule maintenance; structure abstraction; text-based documents; Content based retrieval; Data mining; HTML; Information filtering; Information filters; Information retrieval; Markup languages; Monitoring; Web mining; Web pages;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Information Technology: Coding and Computing, 2005. ITCC 2005. International Conference on

Print_ISBN :

0-7695-2315-3

Type :

conf

DOI :

10.1109/ITCC.2005.143

Filename :

1428462

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3155381