  • DocumentCode
    3144637
  • Title
    Towards exploratory hypothesis testing and analysis
  • Author
    Liu, Guimei; Feng, Mengling; Wang, Yue; Wong, Limsoon; Ng, See-Kiong; Mah, Tzia Liang; Lee, Edmund Jon Deoon
  • Author_Institution
    Dept. of Comput. Sci., Nat. Univ. of Singapore, Singapore, Singapore
  • fYear
    2011
  • fDate
    11-16 April 2011
  • Firstpage
    745
  • Lastpage
    756
  • Abstract
    Hypothesis testing is a well-established tool for scientific discovery. Conventional hypothesis testing is carried out in a hypothesis-driven manner. A scientist must first formulate a hypothesis based on his/her knowledge and experience, and then devise a variety of experiments to test it. Given the rapid growth of data, it has become virtually impossible for a person to manually inspect all the data to find all the interesting hypotheses for testing. In this paper, we propose and develop a data-driven system for automatic hypothesis testing and analysis. We define a hypothesis as a comparison between two or more sub-populations. We find sub-populations for comparison using frequent pattern mining techniques and then pair them up for statistical testing. We also generate additional information for further analysis of the hypotheses that are deemed significant. We conducted a set of experiments to show the efficiency of the proposed algorithms and the usefulness of the generated hypotheses. The results show that our system can help users (1) identify significant hypotheses; (2) isolate the reasons behind significant hypotheses; and (3) find confounding factors that form Simpson's Paradoxes with discovered significant hypotheses.
  • Keywords
    data mining; statistical testing; Simpson paradox; data-driven system; exploratory hypothesis analysis; exploratory hypothesis testing; frequent pattern mining techniques; scientific discovery; Error analysis; Load modeling; Medical services; Probability; Space exploration
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering (ICDE), 2011 IEEE 27th International Conference on
  • Conference_Location
    Hannover
  • ISSN
    1063-6382
  • Print_ISBN
    978-1-4244-8959-6
  • Electronic_ISBN
    1063-6382
  • Type
    conf
  • DOI
    10.1109/ICDE.2011.5767907
  • Filename
    5767907
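The abstract describes the pipeline only at a high level (mine sub-populations, pair them up, test each pair, look for confounders). The Python sketch below illustrates that general idea under stated assumptions; it is not the authors' algorithm. Assumptions: records are dicts with categorical attributes and a boolean outcome; the hypothetical `frequent_contexts` is a brute-force enumeration standing in for a real frequent pattern miner such as Apriori or FP-growth; a hypothesis compares the outcome rate of two sub-populations that share a context and differ only in one target attribute, tested with a Pearson chi-square test on the 2x2 table; and the hypothetical `simpson_reversal` flags a confounder whose every stratum reverses the overall direction. All function names are illustrative, and no multiple-testing correction is applied. The counts used are the textbook kidney-stone example of Simpson's paradox.

```python
import math
from itertools import combinations


def matching(records, pattern):
    """Sub-population: records that satisfy every attribute=value pair in pattern."""
    return [r for r in records if all(r.get(k) == v for k, v in pattern.items())]


def chi2_2x2(pos_a, n_a, pos_b, n_b):
    """Pearson chi-square test (1 df) on a 2x2 table; returns (statistic, p_value)."""
    table = [[pos_a, n_a - pos_a], [pos_b, n_b - pos_b]]
    rows = [n_a, n_b]
    cols = [pos_a + pos_b, (n_a - pos_a) + (n_b - pos_b)]
    total = n_a + n_b
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            if expected == 0:
                return 0.0, 1.0  # degenerate table: no evidence of a difference
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def compare(records, context, attr, val_a, val_b, outcome):
    """Hypothesis: within `context`, attr=val_a and attr=val_b differ in outcome rate.
    Returns (rate_a, rate_b, p_value), or None if either sub-population is empty."""
    a = matching(records, {**context, attr: val_a})
    b = matching(records, {**context, attr: val_b})
    if not a or not b:
        return None
    pos_a = sum(1 for r in a if r[outcome])
    pos_b = sum(1 for r in b if r[outcome])
    _, p = chi2_2x2(pos_a, len(a), pos_b, len(b))
    return pos_a / len(a), pos_b / len(b), p


def frequent_contexts(records, attrs, min_sup, max_len=2):
    """Hypothetical brute-force stand-in for a frequent pattern miner: every
    attribute=value combination (up to max_len items) with support >= min_sup."""
    found = [{}]  # the empty pattern is the whole population
    for size in range(1, max_len + 1):
        for combo in combinations(attrs, size):
            for values in {tuple(r[a] for a in combo) for r in records}:
                pattern = dict(zip(combo, values))
                if len(matching(records, pattern)) >= min_sup:
                    found.append(pattern)
    return found


def significant_hypotheses(records, attrs, target, outcome, min_sup, alpha=0.05):
    """Pair sub-populations that differ only in `target`; keep those with p < alpha."""
    values = sorted({r[target] for r in records})
    context_attrs = [a for a in attrs if a != target]
    results = []
    for ctx in frequent_contexts(records, context_attrs, min_sup):
        for va, vb in combinations(values, 2):
            res = compare(records, ctx, target, va, vb, outcome)
            if res is not None and res[2] < alpha:
                results.append((ctx, va, vb, res))
    return results


def simpson_reversal(records, attr, val_a, val_b, outcome, confounder):
    """True if the overall val_a-vs-val_b difference reverses in every stratum of confounder."""
    overall = compare(records, {}, attr, val_a, val_b, outcome)
    if overall is None or overall[0] == overall[1]:
        return False
    a_better_overall = overall[0] > overall[1]
    for v in {r[confounder] for r in records}:
        res = compare(records, {confounder: v}, attr, val_a, val_b, outcome)
        if res is None or res[0] == res[1] or (res[0] > res[1]) == a_better_overall:
            return False
    return True


def cohort(treatment, size, successes, n):
    """Build n records, the first `successes` of which have a positive outcome."""
    return [{"treatment": treatment, "size": size, "success": i < successes} for i in range(n)]


records = (cohort("A", "small", 81, 87) + cohort("A", "large", 192, 263)
           + cohort("B", "small", 234, 270) + cohort("B", "large", 55, 80))

print(compare(records, {}, "treatment", "A", "B", "success"))
print(simpson_reversal(records, "treatment", "A", "B", "success", "size"))  # True
```

On these counts, treatment B has the higher success rate overall (289/350 vs 273/350), yet A does better within each stone-size stratum, which is exactly the reversal `simpson_reversal` checks for. A practical system testing many sub-population pairs would also need a multiple-testing adjustment (for example Bonferroni or a false-discovery-rate procedure) before calling a hypothesis significant.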