DocumentCode
3536167
Title
Preprocessing and Symbolic Representation of Stock Data
Author
Kumar, Mukesh ; Kalia, Arvind
Author_Institution
Dept. of Comput. Sci., Himachal Pradesh Univ., Shimla, India
fYear
2012
fDate
7-8 Jan. 2012
Firstpage
83
Lastpage
88
Abstract
Mining time series data has attracted considerable interest. Stock data mining plays an important role in visualizing the behavior of financial markets. In financial data mining, data are normally represented in numeric form; however, a symbolic representation is also used to evaluate the overall impact. Time series data are difficult to manipulate, but when they are treated as symbols instead of data points, interesting patterns can be discovered and mining them becomes easier. This paper presents a symbolic representation of NSE stock data covering a thirteen-year period, from Jan. 1996 to Dec. 2008. Data preprocessing is an essential part of data mining: data cleaning fills in missing values, smooths noisy data, handles or removes outliers, and resolves inconsistencies. The dataset was first normalized using min-max normalization. The data transformation steps performed include offset translation, removal of the linear trend, and removal of noise using the moving average smoothing method. A best-fitting line is used to remove the linear trend from the dataset. The Euclidean distance measure has been used to establish relationships among various stocks. Three symbols [up, down, neutral] have been used for the symbolic representation of the data, and distance is evaluated according to the matching pattern of these symbols. The symbolic representation was found to provide an easier interpretation and helped to determine an overall pattern. The symbolic pattern resembles the price change pattern in the numeric representation.
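The preprocessing and symbolization steps named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `threshold` parameter that decides when a change counts as neutral, the letters U/D/N for up/down/neutral, and the use of a mismatch count as the symbolic distance are all assumptions, since the abstract does not specify them.

```python
from math import sqrt


def min_max_normalize(xs):
    """Rescale values to [0, 1] using min-max normalization."""
    lo, hi = min(xs), max(xs)
    if hi == lo:  # constant series: avoid division by zero
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def offset_translate(xs):
    """Offset translation: subtract the mean so the series is centered at 0."""
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]


def remove_linear_trend(xs):
    """Fit a least-squares line against the time index and subtract it."""
    n = len(xs)
    if n < 2:
        return [0.0] * n
    mean_t = (n - 1) / 2
    mean_x = sum(xs) / n
    num = sum((t - mean_t) * (x - mean_x) for t, x in enumerate(xs))
    den = sum((t - mean_t) ** 2 for t in range(n))
    slope = num / den
    intercept = mean_x - slope * mean_t
    return [x - (intercept + slope * t) for t, x in enumerate(xs)]


def moving_average(xs, window):
    """Smooth noise with a simple moving average over `window` points."""
    if window < 1 or window > len(xs):
        raise ValueError("window must be between 1 and len(xs)")
    return [sum(xs[i:i + window]) / window for i in range(len(xs) - window + 1)]


def symbolize(xs, threshold):
    """Map each consecutive change to U (up), D (down), or N (neutral).

    `threshold` is an assumed tolerance below which a change is neutral.
    """
    symbols = []
    for prev, cur in zip(xs, xs[1:]):
        change = cur - prev
        if change > threshold:
            symbols.append("U")
        elif change < -threshold:
            symbols.append("D")
        else:
            symbols.append("N")
    return "".join(symbols)


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length numeric series."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def symbol_distance(a, b):
    """Assumed symbolic distance: number of positions where symbols differ."""
    if len(a) != len(b):
        raise ValueError("symbol strings must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


if __name__ == "__main__":
    prices = [10.0, 12.0, 11.0, 11.0, 15.0]  # made-up values, not NSE data
    print(min_max_normalize(prices))
    print(symbolize(prices, threshold=0.5))  # prints UDNU
```

Under these assumptions, two stocks can be compared either numerically, by applying `euclidean_distance` to their normalized and detrended series, or symbolically, by applying `symbol_distance` to their up/down/neutral strings.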
Keywords
data mining; data visualisation; minimax techniques; smoothing methods; stock markets; time series; NSE stock data; average smoothing method; behavior visualization; data cleaning; data transformation steps; financial market; linear trend removal; min-max normalization; noise removal; numeric representation; offset translation; price change pattern; stock data mining; symbolic representation; time series data mining; Data mining; Data preprocessing; Euclidean distance; Noise; Noise measurement; Smoothing methods; Time series analysis; Financial data mining; numeric data set; preprocessing; symbolic data set;
fLanguage
English
Publisher
ieee
Conference_Titel
Advanced Computing & Communication Technologies (ACCT), 2012 Second International Conference on
Conference_Location
Rohtak, Haryana
Print_ISBN
978-1-4673-0471-9
Type
conf
DOI
10.1109/ACCT.2012.89
Filename
6168338