Title :
The research of theme identification in scientific documents
Author :
Chunlei, Ye ; Lu, Feng
Author_Institution :
Nat. Sci. of Libr., Beijing, China
Abstract :
Technical documents contain abundant thematic information that can reveal the content of a subject. Co-word analysis is an important method in scientometrics, and theme clustering based on co-word analysis has become one of its most active research fields. Co-word clustering analysis groups scientific and technical documents into a series of paper clusters. These theme clusters reflect how a field develops over time, which helps researchers follow the development of science, so it is necessary to identify the theme of each cluster. This paper analyses some typical approaches to theme identification in co-word analysis and their drawbacks, and proposes an improved method that incorporates the Latent Dirichlet Allocation (LDA) model for theme identification. The experimental results show that the proposed approach retains the merits of improved co-word analysis, especially in enhancing the thematic characteristics of the descriptors and the coherence among them. The proposed approach is therefore better suited to theme identification in scientific documents.
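As a rough illustration of the co-word step the abstract refers to, the sketch below counts how often two keywords occur in the same document. It is not the authors' method: the function name `coword_counts` and the toy corpus are assumptions made for illustration, and the LDA stage is not shown.

```python
from collections import Counter
from itertools import combinations


def coword_counts(docs):
    """Count how often each unordered keyword pair shares a document."""
    counts = Counter()
    for keywords in docs:
        # Deduplicate and sort so each pair is counted once per document
        # and always appears in the same order.
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy corpus: one keyword list per document.
docs = [
    ["co-word", "cluster analysis", "theme identification"],
    ["co-word", "theme identification"],
    ["Latent Dirichlet Allocation", "theme identification"],
]
counts = coword_counts(docs)
```

Pairs with high counts would be the natural candidates for grouping into one theme cluster; how the counts are normalised and clustered is left open here.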
Keywords :
document handling; pattern clustering; scientific information systems; coword clustering analysis; latent Dirichlet allocation model; scientific documents; scientometrics analysis; technical documentations; thematic information; theme clustering analysis; theme identification; Algorithm design and analysis; Cities and towns; Educational institutions; Indexes; Software; Software engineering; Latent Dirichlet Allocation; cluster analysis; co-word
Conference_Title :
Computer Science and Automation Engineering (CSAE), 2012 IEEE International Conference on
Conference_Location :
Zhangjiajie
Print_ISBN :
978-1-4673-0088-9
DOI :
10.1109/CSAE.2012.6273049