Title :
Extraction of latent concepts from an integrated human gene database: Non-negative matrix factorization for identification of hidden data structure
Author :
Katsuhiko Murakami
Author_Institution :
School of Bioscience and Biotechnology, Tokyo University of Technology, Tokyo, Japan
Abstract :
Information in genetic databases often describes complex concepts, such as diseases and gene functions, that have implicit relationships. However, such information is presented as independent concepts (for example, “genes” and “function”), making it difficult for users, even specialists, to understand how these concepts relate to one another. This creates a need to extract hidden relationships among biological concepts and to add this information to databases. Therefore, we factorized a gene data matrix using non-negative matrix factorization and extracted hidden relationships among both genes and their functional terms. We successfully identified composite concepts, each explained by multiple genes and multiple terms. This reorganization provides new insights for researchers and helps in interpreting the information.
Keywords :
"Databases","Gene expression","Proteins","Matrix decomposition","Data mining","DNA","Cost function"
Conference_Title :
Soft Computing and Pattern Recognition (SoCPaR), 2015 7th International Conference of
DOI :
10.1109/SOCPAR.2015.7492771