• DocumentCode
    3536167
  • Title

    Preprocessing and Symbolic Representation of Stock Data

  • Author

    Kumar, Mukesh ; Kalia, Arvind

  • Author_Institution
    Dept. of Comput. Sci., Himachal Pradesh Univ., Shimla, India
  • fYear
    2012
  • fDate
    7-8 Jan. 2012
  • Firstpage
    83
  • Lastpage
    88
  • Abstract
    There has been considerable interest in mining time series data. Stock data mining plays an important role in visualizing the behavior of the financial market. In financial data mining, data is normally represented in numeric format; however, symbolic representation is also used to evaluate the overall impact. Time series data are difficult to manipulate, but when they are treated as symbols instead of data points, interesting patterns can be discovered and mining them becomes easier. In this paper, a symbolic representation of NSE stock data covering a thirteen-year period, from Jan. 1996 to Dec. 2008, is presented. Data preprocessing is an essential part of data mining. Data cleaning fills in missing values, smooths noisy data, handles or removes outliers, and resolves inconsistencies. The data was first normalized using min-max normalization. The data transformation steps performed include offset translation, removal of the linear trend, and removal of noise using the moving average smoothing method. A best-fitting line is used to remove the linear trend from the dataset. The Euclidean distance measure has been used to establish relationships among various stocks. Three symbols [up, down, neutral] have been used for the symbolic representation of the data, and distance is evaluated according to the matching pattern of these symbols. It has been found that symbolic representation provides an easier interpretation and helps to determine an overall pattern. The symbolic pattern resembles the price change pattern in the numeric representation.
  • Keywords
    data mining; data visualisation; minimax techniques; smoothing methods; stock markets; time series; NSE stock data; average smoothing method; behavior visualization; data cleaning; data transformation steps; financial market; linear trend removal; min-max normalization; noise removal; numeric representation; offset translation; price change pattern; stock data mining; symbolic representation; time series data mining; Data mining; Data preprocessing; Euclidean distance; Noise; Noise measurement; Smoothing methods; Time series analysis; Financial data mining; numeric data set; preprocessing; symbolic data set;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Computing & Communication Technologies (ACCT), 2012 Second International Conference on
  • Conference_Location
    Rohtak, Haryana
  • Print_ISBN
    978-1-4673-0471-9
  • Type

    conf

  • DOI
    10.1109/ACCT.2012.89
  • Filename
    6168338
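
The preprocessing and symbolization steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the window size, the neutral threshold, and the mismatch-count distance between symbol sequences are all assumptions made for illustration, since the record does not give those details.

```python
from math import sqrt


def min_max_normalize(xs):
    """Rescale values to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def offset_translate(xs):
    """Subtract the mean so the series is centred on zero."""
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]


def remove_linear_trend(xs):
    """Subtract the least-squares best-fitting line (time index 0..n-1)."""
    n = len(xs)
    t_mean = (n - 1) / 2
    x_mean = sum(xs) / n
    denom = sum((t - t_mean) ** 2 for t in range(n))
    if denom == 0:
        slope = 0.0
    else:
        slope = sum((t - t_mean) * (x - x_mean) for t, x in enumerate(xs)) / denom
    intercept = x_mean - slope * t_mean
    return [x - (slope * t + intercept) for t, x in enumerate(xs)]


def moving_average(xs, window=3):
    """Simple moving average; the window size of 3 is an assumed default."""
    return [sum(xs[i:i + window]) / window for i in range(len(xs) - window + 1)]


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length numeric series."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def symbolize(xs, threshold=0.0):
    """Map each step-to-step change to 'up', 'down', or 'neutral'.

    The threshold that defines 'neutral' is an assumption of this sketch.
    """
    symbols = []
    for prev, cur in zip(xs, xs[1:]):
        change = cur - prev
        if change > threshold:
            symbols.append("up")
        elif change < -threshold:
            symbols.append("down")
        else:
            symbols.append("neutral")
    return symbols


def symbol_mismatch(a, b):
    """Count positions where two symbol sequences differ (assumed distance)."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)
```

A series of closing prices could be passed through `min_max_normalize`, `offset_translate`, `remove_linear_trend`, and `moving_average` in that order, then compared with another stock either numerically via `euclidean_distance` or symbolically via `symbolize` and `symbol_mismatch`.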