DocumentCode
479786
Title
Research on Popular Words and Phrases Extraction of Network Base on PAT TREE
Author
Wu, Baozhen ; He, Tingting ; Zhang, Yong ; Li, Li ; Chen, Long
Author_Institution
Dept. of Comput. Sci., Hua Zhong Normal Univ. Wuhan, Wuhan
Volume
1
fYear
2008
fDate
12-14 Dec. 2008
Firstpage
723
Lastpage
726
Abstract
This paper aims to mine popular words and phrases from the Internet using a specific algorithm. We download web pages from Jan 1st 2007 to Jun 30th 2007 from different information sources on the network. Based on full segmentation with a Pat-Tree, we filter the set of candidate words three times in turn: first with a weight filter based on the vector space model, then with a language regulation model, and finally by filtering out rubbish clusters. We then extract the popular words and phrases from the set of candidate words using the popular words determinant formula, and we also draw the tendency curves of the popular words. The experiments indicate that, without reducing the accuracy of catchword identification, the speed of computer-aided extraction of popular network words and phrases improved distinctly.
Keywords
Internet; data mining; natural languages; trees (mathematics); word processing; Pat-Tree; language regulation; phrases extraction; vector space model; weight filter; Computer networks; Computer science; Helium; IP networks; Information filtering; Information filters; Natural languages; Software algorithms; Software engineering; Statistics; Chinese information processing; PAT TREE; Popular curves; Popular words and phrases of network
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Science and Software Engineering, 2008 International Conference on
Conference_Location
Wuhan, Hubei
Print_ISBN
978-0-7695-3336-0
Type
conf
DOI
10.1109/CSSE.2008.1210
Filename
4721851