• DocumentCode
    3318079
  • Title
    Automatic content based title extraction for Chinese documents using support vector machine
  • Author
    Zhang, Zhengcao ; Sun, Maosong ; Liu, Shaoming
  • Author_Institution
    Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
  • fYear
    2005
  • fDate
    30 Oct.-1 Nov. 2005
  • Firstpage
    553
  • Lastpage
    558
  • Abstract
    In this paper, a content-based and domain-independent method for automatically extracting titles from Chinese research papers is proposed. The method exploits both the information contained in the title itself and the similarity between the title and the body of the paper. Experiments are carried out on plain text, so no formatting information such as font is used. A list of words used only in Chinese titles and a list of words never used in Chinese titles are also collected to facilitate title extraction. We use a support vector machine (SVM) classifier to make automatic title extraction robust and adaptable. The method achieves good performance on a test set of 2438 research papers covering almost all academic disciplines.
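    A minimal Python sketch of the kind of pipeline the abstract describes, under stated assumptions. Each of the first few lines of a plain-text paper is treated as a title candidate (the cutoff `max_candidates` is an assumption). Its features are title-body similarity (computed here as a cosine over character bigrams, an assumed choice), hit counts against a title-only word list and a never-in-title word list (the lists below are hypothetical placeholders, not the paper's), and a normalized line length (assumed). A linear SVM, trained by full-batch subgradient descent on the regularized hinge loss, scores each candidate. All function names, the feature set, and the toy training vectors are illustrative and are not the authors' implementation or data.

```python
from collections import Counter
import math

# Hypothetical placeholders for the paper's two collected word lists.
TITLE_ONLY_WORDS = ["基于", "研究"]
NEVER_TITLE_WORDS = ["摘要", "大学", "关键词"]


def char_bigrams(text):
    """Count character bigrams (assumed unit; avoids needing a word segmenter)."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def line_features(line, body):
    """Hypothetical feature vector for one candidate title line."""
    return [
        cosine(char_bigrams(line), char_bigrams(body)),    # title-body similarity
        float(sum(w in line for w in TITLE_ONLY_WORDS)),   # title-only word hits
        float(sum(w in line for w in NEVER_TITLE_WORDS)),  # never-in-title word hits
        min(len(line), 40) / 40.0,                         # normalized length
    ]


def decision(w, b, x):
    """Linear SVM decision value w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=1000):
    """Minimize lam/2*||w||^2 + mean(max(0, 1 - y*(w.x + b))) by subgradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1:  # hinge term is active for this example
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def extract_title(lines, w, b, max_candidates=3):
    """Return the index of the highest-scoring line among the first few candidates."""
    best, best_score = None, -math.inf
    for i, line in enumerate(lines[:max_candidates]):
        body = "".join(l for j, l in enumerate(lines) if j != i)  # everything but the candidate
        score = decision(w, b, line_features(line, body))
        if score > best_score:
            best, best_score = i, score
    return best


# Made-up training vectors: [similarity, title-word hits, never-title hits, length].
X_train = [
    [0.5, 1, 0, 0.5], [0.4, 2, 0, 0.4],                     # title lines (+1)
    [0.1, 0, 1, 0.3], [0.05, 0, 0, 0.9], [0.2, 0, 2, 0.2],  # non-title lines (-1)
]
y_train = [1, 1, -1, -1, -1]
w, b = train_linear_svm(X_train, y_train)

doc = [
    "基于支持向量机的中文标题抽取研究",
    "清华大学计算机科学与技术系",
    "摘要：本文利用支持向量机抽取中文标题",
    "实验表明该方法在多个学科的论文上效果良好",
]
print(extract_title(doc, w, b))
```

    Character bigrams are used only to keep the sketch free of a Chinese word segmenter; the abstract does not say how similarity is computed. A library SVM (possibly with a nonlinear kernel) could be swapped in for the hand-rolled linear one without changing the feature pipeline.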
  • Keywords
    content-based retrieval; document handling; natural languages; support vector machines; Chinese document; automatic content based title extraction; support vector machine; Computer science; Data mining; Intelligent systems; Internet; Laboratories; Machine intelligence; Robustness; Sun; Support vector machine classification; Support vector machines;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05. Proceedings of 2005 IEEE International Conference on
  • Print_ISBN
    0-7803-9361-9
  • Type
    conf
  • DOI
    10.1109/NLPKE.2005.1598799
  • Filename
    1598799