Author_Institution :
Sch. of Comput. Eng. & Sci., Shanghai Univ., Shanghai, China
Abstract :
Numerous websites publish web pages covering events that occur in society. This web events data satisfies the well-accepted attributes of big data: Volume, Velocity, Variety and Value. One valuable product of web events data is website preferences, which can help followers of web events, e.g. people or organizations, select the proper websites for following the aspects of a web event that interest them. However, the large volume, fast evolution, and multisource, unstructured nature of the data together make mining website preferences very challenging. In this paper, website preference is first formally defined. Then, based on the hierarchical nature of web events data, we propose a hierarchical network model to organize the big data of a web event from different organizations, different areas and different nations at a given time stamp. With this hierarchical network structure in hand, two strategies are proposed to mine website preferences from web events data. The first, straightforward strategy uses the communities of the keyword-level network and the mapping relations between websites and keywords to unveil the value in them. The second strategy takes the whole hierarchical network structure into consideration and uses an iterative algorithm to refine the keyword communities obtained as in the first strategy. Finally, an evaluation criterion for website preferences is designed to compare the performance of the two proposed strategies. Experimental results show that properly combining horizontal relations (within each level network) with vertical relations (the mapping relations between the three level networks) extracts more value from web events data and thereby improves the efficiency of website preference mining.
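The first strategy described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the input is a hypothetical list of (website, keywords) pairs, one per page; the keyword-level network is taken to be a co-occurrence graph; communities are approximated by connected components after dropping weak edges, a simple stand-in for whatever community detection method the authors use; and a website's preference is taken to be the share of its keyword mentions that fall in each community. The function names and the `min_cooccurrence` parameter are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def keyword_communities(pages, min_cooccurrence=2):
    """Assign each keyword a community id.

    Stand-in technique (assumption): connected components of the keyword
    co-occurrence graph, keeping only edges seen on at least
    `min_cooccurrence` pages. Keywords with no such edge get no community.
    """
    # Count how many pages each keyword pair appears on together.
    edge_weight = Counter()
    for _site, keywords in pages:
        for a, b in combinations(sorted(set(keywords)), 2):
            edge_weight[(a, b)] += 1

    # Keep only sufficiently strong edges.
    neighbours = defaultdict(set)
    for (a, b), weight in edge_weight.items():
        if weight >= min_cooccurrence:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Depth-first search labels each connected component.
    community_of = {}
    next_id = 0
    for start in sorted(neighbours):
        if start in community_of:
            continue
        community_of[start] = next_id
        stack = [start]
        while stack:
            node = stack.pop()
            for other in neighbours[node]:
                if other not in community_of:
                    community_of[other] = next_id
                    stack.append(other)
        next_id += 1
    return community_of


def website_preferences(pages, community_of):
    """Map each website to its share of keyword mentions per community."""
    counts = defaultdict(Counter)
    for site, keywords in pages:
        for keyword in keywords:
            if keyword in community_of:
                counts[site][community_of[keyword]] += 1

    preferences = {}
    for site, per_community in counts.items():
        total = sum(per_community.values())
        preferences[site] = {cid: n / total for cid, n in per_community.items()}
    return preferences


# Hypothetical toy data: one (website, keywords) pair per page.
pages = [
    ("site-a.example", ["rescue", "casualties", "earthquake"]),
    ("site-a.example", ["rescue", "casualties"]),
    ("site-b.example", ["donation", "charity"]),
    ("site-b.example", ["donation", "charity", "rescue"]),
]
communities = keyword_communities(pages)
preferences = website_preferences(pages, communities)
```

In this toy data the keywords split into a rescue-related community and a donation-related community, so the first site's preference lies entirely in the former and the second site's mostly in the latter. The sketch uses only the keyword level and the website-keyword mapping; the iterative refinement over the whole hierarchy in the second strategy is not covered here.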
Keywords :
Big Data; Internet; Web sites; data mining; iterative methods; Web events data; Web pages; Web site preferences; World Wide Web; big data environment; evaluation criteria; hierarchical network structure; iterative algorithm; mining Web sites preferences; multisource data; unstructured data; Blogs; Communities; Data handling; Data mining; Data storage systems; Information management; Organizations; big data; hierarchical network; web event; website preferences;