Title :
VisGBT: Visually analyzing evolving datasets for adaptive learning
Author :
Chen, Keke ; Tian, Fengguang
Author_Institution :
Dept. of Comput. Sci. & Eng., Wright State Univ., Dayton, OH, USA
Abstract :
Many machine learning problems involve changes in both the feature distribution and the label distribution, for example domain adaptation and learning drifting concepts from data streams. Correctly detecting, identifying, and understanding changes in data distributions helps in selecting appropriate data samples or algorithms for learning models. However, because training datasets are often large and high-dimensional, they have been difficult to analyze effectively. Furthermore, the joint distribution of features and labels makes the problem harder to handle. In this paper, we propose a visual analysis method (VisGBT) that combines gradient-boosting-trees (GBT) modeling, regression analysis, and multidimensional visualization to capture mismatches between datasets and models. A GBT model consists of a series of trees, each with a predefined number of terminal (leaf) nodes. These terminal nodes partition the high-dimensional space using a few of the most informative features so as to minimize the label prediction error. VisGBT maps various kinds of detailed model information to the terminal node matrix (TNM) and visualizes it with an appropriate design. With this method, detailed differences between datasets can be found with the help of a learned model. We illustrate the use of various visual patterns and, in particular, show how the method helps analyze domain similarity for domain adaptation.
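The idea of a trees-by-leaves matrix can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify what VisGBT stores in each TNM cell, so the cell contents used here (share of samples reaching a leaf, and their mean residual before that tree is applied) are an assumption, as are all function names, the two-leaf trees (stumps), the squared-error loss, and the synthetic "source" and "target" datasets.

```python
import random

def fit_stump(X, r):
    """Fit a 2-leaf regression tree to residuals r by minimizing squared error."""
    best, n = None, len(X)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # candidate split threshold
            left = [r[i] for i in range(n) if X[i][j] <= t]
            right = [r[i] for i in range(n) if X[i][j] > t]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    return best[1:]  # (feature index, threshold, left value, right value)

def leaf_of(stump, row):
    """Index of the terminal node (0 = left, 1 = right) that a row falls into."""
    return 0 if row[stump[0]] <= stump[1] else 1

def fit_gbt(X, y, n_trees=5, lr=0.5):
    """Gradient boosting for squared loss: each tree fits the current residuals."""
    base = sum(y) / len(y)
    pred, stumps = [base] * len(y), []
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        s = fit_stump(X, resid)
        stumps.append(s)
        pred = [p + lr * s[2 + leaf_of(s, row)] for p, row in zip(pred, X)]
    return base, stumps, lr

def terminal_node_matrix(model, X, y):
    """Hypothetical TNM: rows = trees, columns = leaves,
    cell = (share of samples in the leaf, their mean residual before this tree)."""
    base, stumps, lr = model
    pred, tnm = [base] * len(y), []
    for s in stumps:
        leaves = [leaf_of(s, row) for row in X]
        cells = []
        for leaf in (0, 1):
            idx = [i for i, l in enumerate(leaves) if l == leaf]
            mean_res = sum(y[i] - pred[i] for i in idx) / len(idx) if idx else 0.0
            cells.append((len(idx) / len(y), mean_res))
        tnm.append(cells)
        pred = [p + lr * s[2 + l] for p, l in zip(pred, leaves)]
    return tnm

# Synthetic data (assumed): the target shifts both the features and the labels.
rng = random.Random(0)
X_src = [[rng.random(), rng.random()] for _ in range(200)]
y_src = [2.0 * a + rng.gauss(0, 0.1) for a, _ in X_src]
X_tgt = [[rng.random() + 0.5, rng.random()] for _ in range(200)]
y_tgt = [2.0 * a + 1.0 + rng.gauss(0, 0.1) for a, _ in X_tgt]

model = fit_gbt(X_src, y_src)
tnm_src = terminal_node_matrix(model, X_src, y_src)
tnm_tgt = terminal_node_matrix(model, X_tgt, y_tgt)
for k, (src_row, tgt_row) in enumerate(zip(tnm_src, tnm_tgt)):
    print("tree", k, "source", src_row, "target", tgt_row)
```

Comparing the two matrices cell by cell shows where a second dataset populates the leaves differently (feature shift) or leaves different residuals (label shift) relative to the data the model was trained on. A visualization such as the one the abstract describes would presumably encode such cells graphically rather than print them.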
Keywords :
data visualisation; gradient methods; learning (artificial intelligence); matrix algebra; regression analysis; trees (mathematics); VisGBT; adaptive machine learning; data streams; domain adaptation; evolving training datasets; feature distribution; gradient-boosting-trees modeling method; label distribution; label prediction error; learning drifting concepts; multidimensional visualization; regression analysis; terminal node matrix; tree nodes; visual analysis method; visual patterns; Computer science; Costs; Data analysis; Data engineering; Data visualization; Machine learning; Machine learning algorithms; Multidimensional systems; Regression analysis; Training data;
Conference_Title :
Collaborative Computing: Networking, Applications and Worksharing, 2009. CollaborateCom 2009. 5th International Conference on
Conference_Location :
Washington, DC
Print_ISBN :
978-963-9799-76-9
Electronic_ISBN :
978-963-9799-76-9
DOI :
10.4108/ICST.COLLABORATECOM2009.8281