DocumentCode :
479786
Title :
Research on Popular Words and Phrases Extraction of Network Base on PAT TREE
Author :
Wu, Baozhen ; He, Tingting ; Zhang, Yong ; Li, Li ; Chen, Long
Author_Institution :
Dept. of Comput. Sci., Hua Zhong Normal Univ. Wuhan, Wuhan
Volume :
1
fYear :
2008
fDate :
12-14 Dec. 2008
Firstpage :
723
Lastpage :
726
Abstract :
This paper aims to mine popular words and phrases from the Internet using a dedicated algorithm. We download web pages from Jan 1st 2007 to Jun 30th 2007 from different information sources on the network. Based on full segmentation with a Pat-Tree, we filter the set of candidate words in three successive passes: first a weight filter based on the vector space model, then a filter based on a model of language regulation, and finally a filter that removes rubbish clusters. We then extract the popular words and phrases from the remaining candidates using the popular words determinant formula, and we also draw the tendency curves of the popular words. The experiments indicate that, without reducing the accuracy of catchword identification, the speed of computer-aided extraction of popular network words and phrases improves distinctly.
Keywords :
Internet; data mining; natural languages; trees (mathematics); word processing; Internet; Pat-Tree; data mining; language regulation; natural languages; phrases extraction; vector space model; weight filter; Computer networks; Computer science; Helium; IP networks; Information filtering; Information filters; Natural languages; Software algorithms; Software engineering; Statistics; Chinese information processing; PAT TREE; Popular curves; Popular words and phrases of network;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Science and Software Engineering, 2008 International Conference on
Conference_Location :
Wuhan, Hubei
Print_ISBN :
978-0-7695-3336-0
Type :
conf
DOI :
10.1109/CSSE.2008.1210
Filename :
4721851