Title :
Data structuring and effective retrieval in the mining of web sequential characteristic
Author_Institution :
Heilongjiang Univ., Harbin, China
Abstract :
Web data mining based on sequential characteristics is a mining technology that focuses on the text data and link structure of Web pages and combines sequential characteristics with Web structure mining and Web content mining. The Web carries a huge amount of data, which grows geometrically every day. As time goes by, the usefulness of much of this data steadily declines, and some of it becomes completely useless. How to clean out such useless data, discover hidden regularities in large volumes of data, and solve the data quality problem in applications has become a research hotspot in Web data mining. Information objects on the Web can generally be divided into two categories: structured data and semi-structured data. Data that can be expressed in a database structure are called structured data; data expressed in various forms, with text as the typical example, are called semi-structured data. The most prominent feature of Web data is that it is semi-structured. Such semi-structured data are related to time sequence, and the time effect of the data is also related to time sequence. This article discusses how to use sequential characteristics during Web data mining to convert semi-structured data into structured form based on the time effect of the data, that is, to structure Web data, and thereby to address the problem of retrieval effectiveness.
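The abstract does not give the paper's actual algorithm, so the following is only a minimal sketch of the general idea under stated assumptions: semi-structured text snippets are turned into time-ordered records, and retrieval is restricted to records that are still within a validity window. The `Record` type, the ISO-date convention in the text, the `max_age` parameter, and the function names are all hypothetical. The keywords mention a B-Tree; a sorted list searched with `bisect` stands in for it here.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from datetime import datetime, timedelta
import re


@dataclass(frozen=True)
class Record:
    published: datetime  # position of the item in the time sequence
    text: str            # original semi-structured content


# Assumption: each snippet carries an ISO date (YYYY-MM-DD) somewhere in its text.
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def structure(snippets):
    """Convert free-text snippets into Records sorted by publication date."""
    records = []
    for snippet in snippets:
        match = DATE_RE.search(snippet)
        if match is None:
            continue  # no time information, so it cannot be placed in the sequence
        try:
            published = datetime.strptime(match.group(0), "%Y-%m-%d")
        except ValueError:
            continue  # looks like a date but is not a valid one
        records.append(Record(published, snippet))
    records.sort(key=lambda r: r.published)
    return records


def retrieve_valid(records, now, max_age):
    """Return the records published within [now - max_age, now].

    `records` must already be sorted by `published`, as `structure` returns them.
    """
    keys = [r.published for r in records]
    start = bisect_left(keys, now - max_age)
    end = bisect_right(keys, now)
    return records[start:end]
```

In this sketch, data whose time effect has expired is simply excluded at query time rather than deleted; an implementation could equally prune such records from the index.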
Keywords :
Internet; data mining; data structures; information retrieval; text analysis; Web contents; Web data mining; Web pages; Web sequential characteristics mining; Web structure mining; data information; data retrieval; data structure; database structure; link structure; semistructured data; text data mining; time sequence; Data mining; Data models; Data warehouses; Databases; Educational institutions; Feature extraction; Web pages; B-Tree; Data time effect; Sequential characteristic; semi-structuring Web data;
Conference_Title :
Electronic and Mechanical Engineering and Information Technology (EMEIT), 2011 International Conference on
Conference_Location :
Harbin, Heilongjiang
Print_ISBN :
978-1-61284-087-1
DOI :
10.1109/EMEIT.2011.6023787