• DocumentCode
    3585510
  • Title

    Keywords Extraction from Chinese Document Based on Complex Network Theory

  • Author

    Jiangxia Nan ; Bo Xiao ; Zhiqing Lin ; Qianfang Xu

  • Author_Institution
    Inst. of Sensing Technol. & Bus., Beijing Univ. of Posts & Telecommun., Beijing, China
  • Volume
    2
  • fYear
    2014
  • Firstpage
    383
  • Lastpage
    386
  • Abstract
    Keyword extraction is the process of choosing several words from a document to express its main idea. Keywords help people understand an article quickly and clearly. In recent years, keyword extraction has attracted growing research attention because of its important role in text clustering, text classification, automatic abstracting, and text retrieval. This paper proposes an algorithm called EC-DC that extracts keywords based on complex network centrality measures. A document is mapped to a network, with its words mapped to vertices and the relations between words mapped to edges. The importance of each word is then evaluated using eccentricity centrality and degree centrality, and the K most important words are extracted as keywords. Experimental results show that the EC-DC algorithm improves precision, recall, and F-score by about 9% compared with the classical TFIDF algorithm.
  • Keywords
    complex networks; feature extraction; text analysis; Chinese document; EC-DC algorithm; automatic abstracting; complex network centrality measures; complex network theory; degree centrality; eccentricity centrality; keywords extraction; text classification; text clustering; text retrieval; Approximation algorithms; Business; Data mining; Internet; Semantics; complex network; document network
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence and Design (ISCID), 2014 Seventh International Symposium on
  • Print_ISBN
    978-1-4799-7004-9
  • Type

    conf

  • DOI
    10.1109/ISCID.2014.183
  • Filename
    7082012
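
The pipeline summarized in the abstract (words as vertices, word relations as edges, ranking by eccentricity centrality and degree centrality, top K words kept) can be sketched roughly as follows. This is a minimal illustration, not the paper's EC-DC implementation: the record does not say how edges are defined or how the two centralities are combined, so the adjacency-based co-occurrence window, the weighted sum controlled by `alpha`, and all function names here are assumptions. The input is assumed to be a list of already segmented words, since Chinese text would first need a word segmenter.

```python
from collections import defaultdict, deque


def build_graph(tokens, window=1):
    """Map words to vertices; link each word to the next `window` words.

    Assumption: "relation between words" is modelled as co-occurrence
    within a small sliding window.
    """
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        graph[word]  # ensure every word becomes a vertex, even if isolated
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return dict(graph)


def eccentricity(graph, source):
    """Largest shortest-path distance from `source` to any reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return max(dist.values())


def top_k_keywords(tokens, k=3, window=1, alpha=0.5):
    """Rank words by a weighted sum of the two centralities (hypothetical)."""
    graph = build_graph(tokens, window)
    n = len(graph)
    scores = {}
    for word, neighbours in graph.items():
        # Degree centrality: fraction of other vertices this word touches.
        degree_c = len(neighbours) / (n - 1) if n > 1 else 0.0
        # Eccentricity centrality: reciprocal of the eccentricity.
        ecc = eccentricity(graph, word)
        ecc_c = 1.0 / ecc if ecc > 0 else 0.0
        scores[word] = alpha * ecc_c + (1 - alpha) * degree_c
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


if __name__ == "__main__":
    sample = ["a", "b", "a", "c", "a", "d"]  # placeholder tokens
    print(top_k_keywords(sample, k=2))
```

In this sketch a word scores highly when it has many neighbours and sits close to every other word in the network. For a disconnected network, eccentricity is measured only within the word's own component, which is a simplification.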