• DocumentCode
    397938
  • Title

    Machine quantification of text-based economic reports for use in predictive modeling

  • Author

    Gao, Lu ; Beling, Peter A.

  • Author_Institution
    Dept. of Syst. & Inf. Eng., Virginia Univ., Charlottesville, VA, USA
  • Volume
    4
  • fYear
    2003
  • fDate
    5-8 Oct. 2003
  • Firstpage
    3536
  • Abstract
    To quantify text-based unstructured information, we propose a method called the direct scoring algorithm (DSA). DSA uses keywords in the document, subjectively determined numerical weights, and subjectively designed grammar rules to score individual sentences. We use our methods to score the Beige books produced by the U.S. Federal Reserve, which contain subjective text-based commentary on the state of the economy. To assess whether our scores have value in a predictive sense, we use them to construct a linear regression model of future growth in U.S. gross domestic product (GDP). We then compare the performance characteristics of this model with those of a similar model based on scores of the same documents produced through subjective reading by professional economists. The comparison demonstrates that the DSA model using the Beige book significantly contributes to the prediction of GDP growth, explaining as much as 69% of the variance compared to the scores created by economic experts. We also add the extracted section scores to a GDP time series prediction model, which uses only structured data as input. The results of this experiment suggest the unstructured information in the Beige books has predictive value that goes beyond that of the structured information used in the time series model, and that our approach has some potential as a means of extracting this information in a semi-automated fashion.
  • Keywords
    data analysis; data mining; economic indicators; regression analysis; text analysis; Beige books; U.S. Federal Reserve; U.S. gross domestic product; direct scoring algorithm; linear regression model; machine quantification; predictive modeling; subjectively-designed grammar rules; subjectively-determined numerical weights; syntactical analysis; text mining; text-based economic reports; Books; Data mining; Economic forecasting; Economic indicators; Electric breakdown; Frequency; Linear regression; Predictive models; Systems engineering and theory; Text mining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Systems, Man and Cybernetics, 2003. IEEE International Conference on
  • ISSN
    1062-922X
  • Print_ISBN
    0-7803-7952-7
  • Type

    conf

  • DOI
    10.1109/ICSMC.2003.1244437
  • Filename
    1244437
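
The abstract describes DSA only at a high level: keywords, subjectively determined weights, and grammar rules combine to score individual sentences. The following is a minimal sketch of that general idea, not the authors' implementation. The keyword list, the weights, the single negation rule, and the averaging of sentence scores into a document score are all assumptions made for illustration; the function names `score_sentence` and `score_document` are hypothetical.

```python
import re

# Hypothetical keyword weights chosen for illustration only;
# the paper's actual lexicon and weights are not given in the abstract.
KEYWORD_WEIGHTS = {
    "strong": 1.0,
    "growth": 0.5,
    "improved": 0.5,
    "weak": -1.0,
    "declined": -0.5,
    "slow": -0.5,
}

# Hypothetical grammar rule: a negator flips the sign of the next keyword.
NEGATORS = {"not", "no", "never"}


def score_sentence(sentence):
    """Sum the weights of known keywords, flipping the sign after a negator."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    score = 0.0
    negate = False
    for token in tokens:
        if token in NEGATORS:
            negate = True
            continue
        weight = KEYWORD_WEIGHTS.get(token)
        if weight is not None:
            score += -weight if negate else weight
            negate = False  # the negation applies to one keyword only
    return score


def score_document(text):
    """Average the sentence scores (an assumed aggregation, not from the paper)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(score_sentence(s) for s in sentences) / len(sentences)
```

Under these assumptions, document scores computed per report could serve as the explanatory variable in a regression on later GDP growth, which is the kind of evaluation the abstract outlines.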