DocumentCode :
3103309
Title :
Data Mining and Applied Linear Algebra
Author :
Chu, Moody
Author_Institution :
North Carolina State Univ., Raleigh
fYear :
2008
fDate :
17 Jan. 2008
Firstpage :
20
Lastpage :
25
Abstract :
In this era of hyper-technological innovation, massive amounts of data are being generated at almost every level of application in almost every discipline. Extracting interesting knowledge from raw data, or data mining in a broader sense, has become an indispensable task. However, data collected from complex phenomena often represent the integrated result of several interrelated variables, and these variables are themselves not precisely defined. The basic principle of data mining is to distinguish which variables are related to which, and how they are related. In many situations, the digitized information is gathered and stored as a data matrix. It is often the case, or so assumed, that the exogenous variables depend on the endogenous variables in a linear relationship. Retrieving "useful" information can therefore often be characterized as finding a "suitable" matrix factorization. This paper offers a synopsis, from this perspective, of how linear algebra techniques can help carry out the task of data mining. Examples from factor analysis, cluster analysis, latent semantic indexing, and link analysis are used to demonstrate how matrix factorization helps to uncover hidden connections and to speed up computation. Low rank matrix approximation plays a fundamental role in cleaning and compressing the data. Other types of constraints, such as nonnegativity, are also briefly discussed.
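The abstract's claim that low rank approximation "cleans" and "compresses" a data matrix A rests on the Eckart-Young theorem: the truncated singular value decomposition A_k = sum over i <= k of sigma_i u_i v_i^T is the best rank-k approximation of A in the Frobenius norm, so discarding the small singular values removes the components most likely to be noise while storing only k singular triplets. The sketch below illustrates this under stated assumptions: small dense matrices stored as lists of lists, and power iteration with deflation used in place of a production SVD routine. The function names `top_singular_triplet` and `low_rank_approx` are hypothetical, and this is not presented as the paper's own algorithm.

```python
import math
import random

Matrix = list[list[float]]
Vector = list[float]


def mat_vec(A: Matrix, x: Vector) -> Vector:
    """Return the matrix-vector product A x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A: Matrix) -> Matrix:
    """Return the transpose of A."""
    return [list(col) for col in zip(*A)]


def norm(x: Vector) -> float:
    """Euclidean norm of a vector."""
    return math.sqrt(sum(v * v for v in x))


def top_singular_triplet(A: Matrix, iters: int = 500, seed: int = 0):
    """Estimate (sigma, u, v) for the largest singular value of A by power
    iteration on A^T A (hypothetical helper; fixed iteration count, no tolerance test)."""
    rows, cols = len(A), len(A[0])
    At = transpose(A)
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(cols)]  # positive start vector
    v_len = norm(v)
    v = [x / v_len for x in v]
    for _ in range(iters):
        w = mat_vec(At, mat_vec(A, v))  # w = A^T A v
        w_len = norm(w)
        if w_len == 0.0:  # A v = 0: v lies in the null space of A
            return 0.0, [0.0] * rows, v
        v = [x / w_len for x in w]
    Av = mat_vec(A, v)
    sigma = norm(Av)  # sigma = ||A v||
    if sigma == 0.0:
        return 0.0, [0.0] * rows, v
    u = [x / sigma for x in Av]
    return sigma, u, v


def low_rank_approx(A: Matrix, k: int, iters: int = 500) -> Matrix:
    """Rank-k approximation built by extracting one singular triplet at a
    time and subtracting sigma * u * v^T from the residual (deflation)."""
    rows, cols = len(A), len(A[0])
    residual = [row[:] for row in A]
    approx = [[0.0] * cols for _ in range(rows)]
    for _ in range(k):
        sigma, u, v = top_singular_triplet(residual, iters)
        if sigma == 0.0:  # residual is (numerically) zero: nothing left to extract
            break
        for i in range(rows):
            for j in range(cols):
                term = sigma * u[i] * v[j]
                approx[i][j] += term
                residual[i][j] -= term
    return approx
```

For example, adding small noise to a rank-1 matrix and calling `low_rank_approx(noisy, 1)` should return an approximation whose Frobenius-norm residual is no larger than the noise itself, because the clean rank-1 matrix is one admissible candidate and the truncated SVD is optimal among all of them. Power iteration is used here only to keep the sketch dependency-free; for real data a LAPACK-backed routine such as `numpy.linalg.svd`, or a sparse truncated SVD for large term-document matrices as in latent semantic indexing, is the more usual choice.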
Keywords :
approximation theory; data analysis; data compression; data mining; matrix decomposition; applied linear algebra; data cleaning; data matrix factorization; information retrieval; knowledge extraction; low rank matrix approximation; Cleaning; Image reconstruction; Indexing; Informatics; Linear algebra; Mathematics; Technological innovation; cluster analysis; factor analysis; linear model; link analysis; matrix factorization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Informatics Education and Research for Knowledge-Circulating Society, 2008. ICKS 2008. International Conference on
Conference_Location :
Kyoto
Print_ISBN :
978-0-7695-3128-1
Type :
conf
DOI :
10.1109/ICKS.2008.39
Filename :
4460463