Author_Institution :
Software Eng. Inst., East China Normal Univ., Shanghai, China
Abstract :
This paper presents a stock futures prediction strategy that uses a hybrid method to forecast futures price trends, which are essential inputs to investment decisions. To handle huge amounts of futures data, the strategy consists of two main parts: I. Raw Data Treatment and Feature Extraction, and II. DT-SVM Hybrid Model Training. We use real-world transaction data of stock futures contracts for the study. The data are first stored in a distributed database and then distributed to a group of computing nodes, which extract statistical features. Finally, a hybrid method combining the DT (Decision Tree) and SVM (Support Vector Machine) algorithms is applied: in the first phase the DT algorithm filters out most of the noisy data, and in the second phase the SVM algorithm processes the large training data set. Because a prediction model is trained for each stock futures contract, high-performance algorithms are necessary; the distributed algorithms are therefore implemented in the form of MapReduce. The experimental results show that our strategy outperforms three popular methods: Bootstrap-SVM, Bootstrap-DT and BPNN. Specifically, the DT-SVM strategy improves on the best average precision rate, best average recall rate and best average F-One rate achieved by the other three methods by 5%, 19% and 12%, respectively.
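The feature-extraction stage described in the abstract can be illustrated with a minimal, single-process sketch of the map, shuffle and reduce steps. This is not the authors' implementation: the record layout, the contract identifiers, the function names and the three features computed here (mean price, price standard deviation, total volume) are all assumptions chosen for illustration, since the abstract does not list the actual features.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical tick records: (contract_id, price, volume).
# The contract ids and values are made up for this example.
TICKS = [
    ("IF1503", 100.0, 10),
    ("IF1503", 102.0, 30),
    ("IF1506", 50.0, 5),
    ("IF1503", 104.0, 20),
    ("IF1506", 54.0, 15),
]


def map_tick(tick):
    """Map step: emit a (key, value) pair keyed by contract id."""
    contract, price, volume = tick
    return contract, (price, volume)


def shuffle(pairs):
    """Shuffle step: group mapped values by key, as a MapReduce runtime would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_features(contract, values):
    """Reduce step: collapse one contract's ticks into statistical features.

    The feature set is an assumption; the abstract only says "statistical features".
    """
    prices = [price for price, _ in values]
    volumes = [volume for _, volume in values]
    return contract, {
        "mean_price": mean(prices),
        "std_price": pstdev(prices),  # population standard deviation
        "total_volume": sum(volumes),
    }


def extract_features(ticks):
    """Run map, shuffle and reduce in sequence over an in-memory list of ticks."""
    grouped = shuffle(map_tick(tick) for tick in ticks)
    return dict(reduce_features(key, values) for key, values in grouped.items())


features = extract_features(TICKS)
```

Because each contract's records are reduced independently of every other contract's, the reduce step could in principle run on separate computing nodes, which is the property that makes the per-contract workload a natural fit for MapReduce.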
Keywords :
Big Data; decision trees; distributed databases; feature extraction; investment; statistical analysis; stock markets; support vector machines; DT-SVM hybrid model training; MapReduce; average F-One rate; average precision rate; average recall rate; distributed algorithm; investment decision; price trend; raw data treatment; real-world transaction data; statistical features; stock futures prediction; Data models; Market research; Prediction algorithms; Predictive models;