Regional Information Center for Science and Technology (RICeST) - Semi-Supervised Clustering Models for Clinical Risk Assessment

DocumentCode :

2771299

Title :

Semi-Supervised Clustering Models for Clinical Risk Assessment

Author :

Yongyang Ho ; Azuaje, Francisco ; McCullagh, Paul ; Harper, Roy

Author_Institution :

Sch. of Comput. & Math., Ulster Univ., Jordanstown

fYear :

2006

fDate :

16-18 Oct. 2006

Firstpage :

243

Lastpage :

250

Abstract :

Clustering methods aim to organize a collection of cases into groups such that cases within one cluster are more similar to each other than to cases in other clusters. A small amount of background knowledge may also be used to guide the clustering process and aid in the interpretation of results; this type of knowledge-driven clustering is known as semi-supervised clustering. Such knowledge may be represented by pairwise constraints, labelled cases or known data groupings. Pairwise constraints may be specified, for example, as 'MustLink' or 'CannotLink' associations between cases. This research proposes a semi-supervised clustering method that exploits pairwise constraints and similarity information extracted from constrained cases. The algorithm was first evaluated on publicly available biomedical datasets and then applied to a Type II diabetes dataset to assess coronary heart disease (CHD) as a complication. This dataset comprises laboratory and physiological information from diabetic patients at the Ulster Hospital (UH) in Northern Ireland. Three methods were compared: traditional k-means, constraint-based k-means with pairwise constraints (CK method) and similarity-driven constraint-based k-means (SCK method). Results showed that a small amount of supervision (pairwise constraints generated automatically from the predefined class labels) improved predictive quality, i.e. the detection of relevant partitions and significant clusters, on these datasets. Furthermore, the results on the UH dataset suggest significant associations between clustering outcomes and CHD complication in Type II diabetes patients.
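To make the idea of constraint-based k-means with 'MustLink' and 'CannotLink' pairs concrete, below is a minimal sketch of the generic COP-KMeans scheme, together with a helper that derives pairwise constraints from class labels. It is an illustration under stated assumptions, not the authors' CK or SCK implementation: the function names (constraints_from_labels, violates, cop_kmeans), Euclidean distance, random initial centers and greedy nearest-feasible-cluster assignment are all assumptions, and the similarity-driven weighting of the SCK method is not reproduced.

```python
import math
import random


def constraints_from_labels(labels, n_pairs, rng):
    """Hypothetical helper: sample random pairs of labelled cases.
    Same label -> must-link pair, different label -> cannot-link pair."""
    must, cannot = [], []
    for _ in range(n_pairs):
        i, j = rng.sample(range(len(labels)), 2)  # two distinct case indices
        (must if labels[i] == labels[j] else cannot).append((i, j))
    return must, cannot


def _partner(i, pair):
    """Return the other case in `pair` if case i belongs to it, else None."""
    a, b = pair
    if a == i:
        return b
    if b == i:
        return a
    return None


def violates(i, c, assign, must, cannot):
    """True if placing case i in cluster c breaks a constraint
    with a case that has already been assigned in this pass."""
    for pair in must:
        other = _partner(i, pair)
        if other is not None and other in assign and assign[other] != c:
            return True
    for pair in cannot:
        other = _partner(i, pair)
        if other is not None and assign.get(other) == c:
            return True
    return False


def cop_kmeans(points, k, must, cannot, max_iter=100, seed=0):
    """COP-KMeans-style sketch: returns (labels, centers), or None if some
    case cannot be placed without violating a constraint."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # random initial centers
    for _ in range(max_iter):
        assign = {}
        for i, p in enumerate(points):
            # Try clusters from nearest to farthest; keep the first feasible one.
            for c in sorted(range(k), key=lambda j: math.dist(p, centers[j])):
                if not violates(i, c, assign, must, cannot):
                    assign[i] = c
                    break
            else:
                return None  # no cluster satisfies the constraints for case i
        new_centers = []
        for c in range(k):
            members = [points[i] for i in assign if assign[i] == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep old center for an empty cluster
        if new_centers == centers:
            break  # assignments are stable: converged
        centers = new_centers
    return [assign[i] for i in range(len(points))], centers
```

Constraints are only hard filters here: a case goes to its nearest cluster unless that would separate a must-linked pair or join a cannot-linked pair, and the run fails outright when no feasible cluster exists. Sampling constraints from a labelled subset with constraints_from_labels mirrors, in spirit, the abstract's use of constraints generated from predefined class labels.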

Keywords :

cardiology; data mining; diseases; knowledge representation; learning (artificial intelligence); medical information systems; pattern clustering; cannotlink associations; clinical risk assessment; constraint-based k-means method; coronary heart disease complication; information extraction; knowledge-driven clustering; known data groupings; labelled cases groupings; laboratory information; mustlink associations; pairwise constraints; physiological information; predictive quality; publicly-available biomedical datasets; semisupervised clustering model; similarity-driven constraint-based k-means method; traditional k-means method; type II diabetes dataset; Cardiac disease; Clustering algorithms; Clustering methods; Data mining; Databases; Diabetes; Hospitals; Mathematical model; Mathematics; Risk management

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Sixth IEEE Symposium on BioInformatics and BioEngineering (BIBE 2006)

Conference_Location :

Arlington, VA

Print_ISBN :

0-7695-2727-2

Type :

conf

DOI :

10.1109/BIBE.2006.253341

Filename :

4019666

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2771299