• DocumentCode
    3701521
  • Title
    Analysis of standard clustering algorithms for grouping MEDLINE abstracts into evidence-based medicine intervention categories
  • Author
    Vladimir Dobrynin;Yulia Balykina;Michael Kamalov
  • Author_Institution
    St. Petersburg State University, 7/9 Universitetskaya nab., 199034, Russia
  • fYear
    2015
  • Firstpage
    555
  • Lastpage
    557
  • Abstract
    The paper describes the clustering of article abstracts from MEDLINE, the largest bibliographic database for the life sciences and biomedicine, into categories that correspond to types of medical interventions (types of patient treatment). Experiments were carried out to evaluate clustering quality for the following algorithms: K-means, K-means++, hierarchical clustering, and SIB (Sequential Information Bottleneck), combined with the LSA (Latent Semantic Analysis) and MI (Mutual Information) methods, which are used to select feature vectors. The best clustering results were achieved by K-means++ together with LSA when a 210-dimensional space was chosen: Purity = 0.5719, Entropy = 1.3841, Normalized Entropy = 0.6299.
  • Keywords
    "Clustering algorithms","Entropy","Mutual information","Algorithm design and analysis","Libraries","Semantics","Information retrieval"
  • Publisher
    ieee
  • Conference_Titel
    "Stability and Control Processes" in Memory of V.I. Zubov (SCP), 2015 International Conference
  • Type
    conf
  • DOI
    10.1109/SCP.2015.7342223
  • Filename
    7342223