DocumentCode :
2403030
Title :
Effect of Latent Semantic Indexing for Clustering Clinical Documents
Author :
Han, Choonghyun ; Choi, Jinwook
Author_Institution :
Interdiscipl. Program of Bioeng., Seoul Nat. Univ., Seoul, South Korea
fYear :
2010
fDate :
18-20 Aug. 2010
Firstpage :
561
Lastpage :
566
Abstract :
Measuring the similarity between documents is usually affected by the sparseness of the term-document matrix. Latent semantic indexing (LSI) is an alternative method that addresses this problem: its dimension reduction improves the performance of similarity measurement. In this study, LSI is examined as a method for clustering clinical documents that contain the same clinical problems or disorders. The similarity of clinical documents was measured effectively with LSI. LSI performed better on clinical documents, which are characterized by medical terms, varied expressions for the same concepts, abbreviations, and typos, than on editorials. Our results showed that LSI is useful for measuring the similarity of the clinical documents examined in this study. The correlation between term co-occurrence and similarity is also analyzed as an important aspect of LSI. Not only co-occurring terms but also terms not shared between documents were found to be factors influencing the similarity.
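The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the function names, the raw-count weighting, and the choice of k = 2 are all assumptions made for illustration, and the truncated decomposition is computed by plain power iteration on the document Gram matrix rather than by a library SVD routine.

```python
import math

# Hypothetical toy corpus: four clinical notes and two editorials.
DOCS = [
    "mi chest pain",
    "infarction chest pain",
    "mi troponin",
    "infarction troponin",
    "policy budget",
    "policy election",
]

def term_document_matrix(docs):
    """Rows are terms, columns are documents, entries are raw counts."""
    vocab = sorted({t for d in docs for t in d.split()})
    return [[d.split().count(t) for d in docs] for t in vocab]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def top_eigenpairs(G, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(G)
    G = [row[:] for row in G]  # work on a copy
    pairs = []
    for _component in range(k):
        v = [1.0 / math.sqrt(n)] * n
        for _step in range(iters):
            w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(G[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Deflate: remove the component just found.
        for i in range(n):
            for j in range(n):
                G[i][j] -= lam * v[i] * v[j]
    return pairs

def lsi_document_vectors(A, k):
    """Document coordinates in the rank-k latent space (rows of V_k * Sigma_k)."""
    n_docs = len(A[0])
    # Gram matrix A^T A; its eigenvectors are the right singular vectors of A.
    G = [[sum(row[i] * row[j] for row in A) for j in range(n_docs)]
         for i in range(n_docs)]
    pairs = top_eigenpairs(G, k)
    return [[math.sqrt(max(lam, 0.0)) * v[d] for lam, v in pairs]
            for d in range(n_docs)]

A = term_document_matrix(DOCS)
raw = [list(col) for col in zip(*A)]   # one raw count vector per document
lsi = lsi_document_vectors(A, k=2)

print("raw cosine, note 0 vs note 3:", cosine(raw[0], raw[3]))
print("LSI cosine, note 0 vs note 3:", cosine(lsi[0], lsi[3]))
print("LSI cosine, note 0 vs editorial 4:", cosine(lsi[0], lsi[4]))
```

In this toy corpus the first and fourth notes share no terms, so their raw cosine similarity is zero, yet they land close together in the reduced space because they are linked through terms that co-occur in the other notes. The editorials remain separate from the clinical notes. On real data the result depends heavily on the term weighting and on the number of retained dimensions.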
Keywords :
document handling; indexing; medical administrative data processing; pattern clustering; sparse matrices; statistical analysis; clinical document clustering; co-occurrence; latent semantic indexing; term document matrix; Biomedical measurements; Correlation; Discharges; Editorials; Indexing; Large scale integration; Semantics; clinical document; co-occurrence; latent semantic indexing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer and Information Science (ICIS), 2010 IEEE/ACIS 9th International Conference on
Conference_Location :
Yamagata
Print_ISBN :
978-1-4244-8198-9
Type :
conf
DOI :
10.1109/ICIS.2010.138
Filename :
5591005