DocumentCode :
2853401
Title :
Data-intensive analysis of HIV mutations
Author :
Cintho, M. ; Marcondes Cesar Junior, Roberto ; Ferreira, Joao Eduardo
Author_Institution :
Dept. de Cienc. da Comput. - DCC, Univ. de Sao Paulo - USP, São Paulo, Brazil
fYear :
2012
fDate :
8-12 Oct. 2012
Firstpage :
1
Lastpage :
7
Abstract :
Mutations in the reverse transcriptase and protease of HIV patients may be related to drug resistance. Several issues hinder the complete elucidation of the relationship between these mutations and drug resistance, such as cross-resistance and the limitations in detecting the relevance of resistance. Look-up tables and rule-based systems attempt to classify sequences and predict treatment failure. However, they depend on the scientific literature and on its quality and reliability. Data-intensive analysis of HIV mutation databases may help to corroborate or improve the knowledge spread across the literature. Pattern recognition algorithms classify data by extracting information from different data domains. Clustering and biclustering algorithms have been explored to group scientific and business data based on similarity measures. K-means is a popular clustering algorithm, and Bimax is a biclustering algorithm used with binary data. Considering this scenario, the main contribution of this work is a new methodology based on K-means and Bimax that uses a binary data representation of reverse transcriptase and protease sequences, in an attempt to obtain an unsupervised classification of the sequences that may be related to drug resistance. In our work, 14,393 sequences, restricted to selected protein positions known to be related to drug resistance and represented as binary vectors in an 82-dimensional vector space, are analyzed by pattern recognition algorithms. A suitable visualization of these vectors is produced for medical interpretation and indicates some correspondence with the drug resistance predictions given by the Brazilian look-up table, which is used by Brazilian physicians but whose creation depends on the HIV literature and its quality. As a consequence, in this work we describe a methodology based on the application of pattern recognition algorithms to binary data in order to suggest clusters of mutations and their relations with drug resistance, using a different cluster visualization scheme.
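The binary representation and clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the position list `TRACKED_POSITIONS` is a hypothetical five-position stand-in for the 82 dimensions (which the abstract does not enumerate), the helper names `encode` and `kmeans` are assumptions, and only a plain Lloyd-style K-means is shown; Bimax biclustering is omitted.

```python
import random

# Hypothetical stand-in for the 82 tracked positions (not listed in the abstract).
TRACKED_POSITIONS = [41, 67, 70, 184, 215]


def encode(mutated_positions):
    """Return a binary vector: 1 where a tracked position is mutated, else 0."""
    present = set(mutated_positions)
    return [1 if p in present else 0 for p in TRACKED_POSITIONS]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd-style K-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k vectors sampled from the data (an assumed initialization).
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Made-up mutation sets, for illustration only.
    data = [encode(m) for m in ([41, 67, 70], [41, 67, 70], [184, 215], [184, 215])]
    labels, _ = kmeans(data, k=2)
    print(labels)
```

With real data, each of the 14,393 sequences would be encoded once and the resulting vectors passed to the clustering step; the number of clusters `k` is a free parameter that the abstract does not specify.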
Keywords :
data analysis; data structures; data visualisation; database management systems; diseases; drugs; knowledge based systems; medical computing; molecular biophysics; pattern classification; pattern clustering; proteins; table lookup; 82-dimensional vector space; Bimax; Brazilian look up table; Brazilian physicians; HIV mutation databases; HIV patients; K-means clustering; biclustering classification algorithm; binary data representation; business data; cluster visualization scheme; clustering classification algorithm; cross resistance; data classification; data-intensive analysis; drug resistance; information extraction; medical interpretation; pattern recognition algorithms; protease; proteins; reverse transcriptase; rule-based systems; scientific data; treatment failure prediction; unsupervised sequence classification; Amino acids; Clustering algorithms; Drugs; Human immunodeficiency virus; Immune system; Proteins; Resistance; Clustering; HIV; biclustering; drug resistance;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
E-Science (e-Science), 2012 IEEE 8th International Conference on
Conference_Location :
Chicago, IL
Print_ISBN :
978-1-4673-4467-8
Type :
conf
DOI :
10.1109/eScience.2012.6404411
Filename :
6404411