• DocumentCode
    2740797
  • Title

    Autocorrection of Noise Text Based on Modularity Optimization

  • Author

    Xuan, Zhao-Guo ; Xia, Hao-Xiang ; Dang, Yan-Zhong ; Liu, Fang-Li

  • Author_Institution
    Dalian Univ. of Technol., Dalian
  • fYear
    2007
  • fDate
    5-7 Sept. 2007
  • Firstpage
    483
  • Lastpage
    483
  • Abstract
    This paper proposes an autocorrection algorithm for noise texts based on modularity optimization. By noise texts we mean documents in a text corpus that have been assigned to the wrong category. First, a document-similarity network is constructed in which each node represents a document. If two documents are similar in content, their nodes are connected by a weighted edge whose weight is their similarity. Second, the categories are treated as the community structure of this network. Modularity was introduced as a measure of the quality of community structures; in this paper it is used to evaluate the quality of the categories. Finally, noise texts are autocorrected by optimizing the modularity. The experimental results indicate that the algorithm can effectively correct noise texts. The algorithm can also be used as a preprocessing step for text classification or taxonomy building.
  • Keywords
    optimisation; text analysis; document-similarity network; modularity optimization; noise text autocorrection; text corpus documents; Buildings; Classification algorithms; Frequency; Noise reduction; Systems engineering and theory; Taxonomy; Testing; Text categorization; Web pages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Innovative Computing, Information and Control, 2007. ICICIC '07. Second International Conference on
  • Conference_Location
    Kumamoto
  • Print_ISBN
    0-7695-2882-1
  • Type

    conf

  • DOI
    10.1109/ICICIC.2007.189
  • Filename
    4428125
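
The procedure summarized in the abstract can be illustrated with a minimal sketch. The record does not say how document similarity is computed or which optimization strategy the authors use, so everything below is an assumption: the similarity matrix is taken as given, modularity is the standard weighted Newman form, and the optimizer is a simple greedy pass that moves one document at a time to the category that most increases modularity. The names `modularity` and `autocorrect` are hypothetical and are not taken from the paper.

```python
from typing import List


def modularity(weights: List[List[float]], labels: List[int]) -> float:
    """Weighted Newman modularity Q of a labelling.

    `weights` is a symmetric document-similarity matrix with a zero diagonal;
    `labels[i]` is the category currently assigned to document i.
    """
    n = len(weights)
    degree = [sum(row) for row in weights]  # weighted degree of each node
    two_m = sum(degree)                     # twice the total edge weight
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += weights[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


def autocorrect(weights: List[List[float]], labels: List[int],
                max_passes: int = 10) -> List[int]:
    """Greedy relabelling (an assumed strategy, not the paper's).

    Each document is moved to the existing category that yields the largest
    modularity gain; passes repeat until no document changes category.
    """
    labels = list(labels)
    categories = sorted(set(labels))
    for _ in range(max_passes):
        changed = False
        for node in range(len(labels)):
            best_label = labels[node]
            best_q = modularity(weights, labels)
            for cat in categories:
                if cat == labels[node]:
                    continue
                trial = labels[:node] + [cat] + labels[node + 1:]
                q = modularity(weights, trial)
                if q > best_q + 1e-12:  # require a strict improvement
                    best_label, best_q = cat, q
            if best_label != labels[node]:
                labels[node] = best_label
                changed = True
        if not changed:
            break
    return labels


# Toy corpus: documents 0-2 are mutually similar, as are documents 3-5.
# Document 2 starts in the wrong category (a "noise text").
similarity = [[0.0 if i == j else (0.9 if i // 3 == j // 3 else 0.1)
               for j in range(6)] for i in range(6)]
noisy_labels = [0, 0, 1, 1, 1, 1]
corrected = autocorrect(similarity, noisy_labels)
```

This recomputes modularity from scratch for every trial move, which is fine for a toy corpus but quadratic per evaluation; a practical implementation would update the modularity incrementally.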