• DocumentCode
    479786
  • Title

    Research on Popular Words and Phrases Extraction of Network Base on PAT TREE

  • Author

    Wu, Baozhen ; He, Tingting ; Zhang, Yong ; Li, Li ; Chen, Long

  • Author_Institution
    Dept. of Comput. Sci., Hua Zhong Normal Univ., Wuhan
  • Volume
    1
  • fYear
    2008
  • fDate
    12-14 Dec. 2008
  • Firstpage
    723
  • Lastpage
    726
  • Abstract
    This paper aims to mine popular words and phrases from the Internet using a dedicated algorithm. We download web pages dated from Jan. 1, 2007 to Jun. 30, 2007 from different online information sources. Starting from a full segmentation built with a PAT tree, we filter the set of candidate words in three successive passes: first a weight filter based on the vector space model, then a filter based on a language regulation model, and finally a filter that removes rubbish clusters. We then extract the popular words and phrases from the remaining candidates using a popular-word determinant formula, and we also plot the trend curves of the popular words. The experiments indicate that, without reducing the accuracy of catchword identification, the speed of computer-aided extraction of popular network words and phrases improves markedly.
  • Keywords
    Internet; data mining; natural languages; trees (mathematics); word processing; Pat-Tree; language regulation; phrases extraction; vector space model; weight filter; Computer networks; Computer science; Helium; IP networks; Information filtering; Information filters; Software algorithms; Software engineering; Statistics; Chinese information processing; PAT TREE; Popular curves; Popular words and phrases of network
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Software Engineering, 2008 International Conference on
  • Conference_Location
    Wuhan, Hubei
  • Print_ISBN
    978-0-7695-3336-0
  • Type

    conf

  • DOI
    10.1109/CSSE.2008.1210
  • Filename
    4721851