Zephyrus's Blog

Why We Don't Really Know What Statistical Significance Means: But NOW I DO!

Differences between the Fisher and Neyman-Pearson Schools of Statistical Thought

Fisher (and his cloud of p-value inference)

Knowledge is created via inductive inference

  • The p value is the probability of observing a result at least as extreme as the one obtained, given that the null hypothesis (no effect or relationship) is true.
  • The smaller the p value, the greater the evidence against the null hypothesis
  • The p value is a measure of inductive evidence against H0
  • It is a data-dependent random variable (exploratory)
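
As a small illustration of the Fisherian reading, here is a sketch that computes an exact one-sided p value for a made-up coin-tossing experiment: 15 heads in 20 tosses, with a fair coin as the null. The function name `binomial_p_value` and the data are my own assumptions for illustration; they are not from the article.

```python
from math import comb

def binomial_p_value(successes: int, trials: int, p_null: float = 0.5) -> float:
    """One-sided exact p value: P(X >= successes | H0: success prob = p_null)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Hypothetical data: 15 heads in 20 tosses of a coin assumed fair under H0
p = binomial_p_value(15, 20)
print(f"p = {p:.4f}")  # p = 0.0207
```

The number is only known once the data are in, which is what makes it a data-dependent measure rather than a pre-set rate.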

Neyman-Pearson (and their vices)

Statistical Testing as a mechanism for making decisions and guiding behaviour

  • Two Hypotheses H0 and Ha
  • Decision between two distinct courses of action
  • Type I error is the false rejection of H0
  • Type II error is the false acceptance of H0
  • Statistical testing is aimed at ERROR CONTROL
  • Not concerned with gathering evidence
  • Alpha (the Type I error rate) must be fixed before gathering data to control Type I errors
  • P value plays no role in NP theory

See Royall, R. M. (1997). Statistical Evidence: A Likelihood Paradigm. New York: Chapman and Hall, Chapter 5, for further discussion of this point.
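
To make the Neyman-Pearson procedure concrete, here is a rough sketch using the same kind of coin-tossing setup. Alpha, the sample size, and the alternative (a coin with heads probability 0.8) are all hypothetical choices of mine, fixed before any data are seen. The output of the test is a decision, not a p value.

```python
from math import comb

def upper_tail(c: int, n: int, p: float) -> float:
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))

def critical_value(n: int, p_null: float, alpha: float) -> int:
    """Smallest c with P(X >= c | H0) <= alpha (c = n + 1 means never reject)."""
    for c in range(n + 2):
        if upper_tail(c, n, p_null) <= alpha:
            return c

# Hypothetical design, chosen in advance of data collection
ALPHA = 0.05
N, P_NULL, P_ALT = 20, 0.5, 0.8

c = critical_value(N, P_NULL, ALPHA)   # boundary of the rejection region
size = upper_tail(c, N, P_NULL)        # actual Type I error rate (<= ALPHA)
beta = 1 - upper_tail(c, N, P_ALT)     # Type II error rate under Ha

def decide(heads: int) -> str:
    """The only thing an N-P test reports: which side of c the result fell on."""
    return "reject H0" if heads >= c else "do not reject H0"

print(f"reject H0 when heads >= {c}; size = {size:.4f}; beta = {beta:.4f}")
print(decide(15), "|", decide(19))
```

Note that 15 heads and 19 heads lead to exactly the same report, because the procedure only tracks whether the result landed in the rejection region.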



The Confusion

  • In practice, the p value is compared with the Type I error rate (alpha), and H0 is rejected in favour of Ha if p < alpha
  • When used this way, the specific value of p is irrelevant and should not be reported
  • One can only say whether or not the result fell in the rejection region, not where in that region it fell (as might be inferred from a precise p value).
  • The exact value of p cannot be reported in an NP test because alpha is the probability of a set of outcomes that may fall anywhere in the tail area of the distribution under a null hypothesis. We cannot know ahead of time which of these particular outcomes will occur
  • The tail area for the p value is known only after the outcome is observed (it's not the probability of a set of outcomes, it's the specific result of one outcome?)
  • If alpha is fixed, p values cannot be reinterpreted at different levels (p<0.05, p<0.01 etc.)
  • The p value cannot be interpreted as the "level of significance".
  • The p value is NOT a data dependent adjustable Type I error rate
  • If the researcher is concerned with error probabilities, the specific p value is irrelevant
  • If the researcher is interested in the "measure of evidence" from a p value, there is no point in also reporting the error probabilities
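
One way to see the difference between the two quantities is a quick simulation. The sketch below assumes a simple two-sided z test and 20,000 hypothetical studies in which H0 is true; none of these numbers come from the article. Alpha describes the long-run behaviour of the procedure, while each individual p value is just wherever that one study happened to land.

```python
import random
from statistics import NormalDist

ALPHA = 0.05                 # fixed in advance
rng = random.Random(0)       # seeded so the run is repeatable
std_normal = NormalDist()

def two_sided_p(z: float) -> float:
    """Two-sided p value for a z statistic under a standard normal null."""
    return 2 * (1 - std_normal.cdf(abs(z)))

# 20,000 hypothetical studies where H0 is true, so z ~ N(0, 1)
p_values = [two_sided_p(rng.gauss(0, 1)) for _ in range(20_000)]
rejections = [p for p in p_values if p < ALPHA]

rejection_rate = len(rejections) / len(p_values)
print(f"long-run rejection rate: {rejection_rate:.3f}")  # close to ALPHA
print(f"smallest p among rejections: {min(rejections):.2g}")
print(f"largest p among rejections:  {max(rejections):.2g}")
```

The rejection rate stays near alpha however the individual p values are spread out below it, so reading a small p as a "smaller Type I error rate" mixes up a property of the procedure with a property of one result.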

Advice from this article

  • If the focus of the study is on controlling errors (e.g., in quality control experiments), use the N-P approach. Make a serious attempt to calculate the costs of committing Type I and II errors.

  • If the focus of the study is evidential in nature (which will be most of the time), use p values. Indeed, use exact p values, such as p = .04, whenever possible.
  • Do not report p = .04 as p < .05
  • Furthermore, do not present p values at fixed levels such as p < .05, p < .01, p < .001, and so on. This makes them look like Type I error rates.
  • Recall that the p value is a measure of evidence against the null hypothesis.
  • Be aware that p values can greatly exaggerate this evidence against H0.
  • Remember, also, that the p value is not a measure of support for the alternative hypothesis, HA.
  • Do not mistake p values for Type I error rates.
  • P values are data dependent measures, not fixed levels. Alphas are pre-selected levels, not data-dependent values.
  • It is completely inadmissible to use true N-P α values in a roving manner.
  • Do not use the p < α criterion of statistical significance
  • Present other information, such as confidence intervals, alongside or instead of significance levels.

Alternatives to obsessive p-value testing

  • Report effect sizes, sample statistics and their confidence intervals
  • CIs stress the importance of estimation over testing
  • Scientific progress depends on arriving at credible estimates of the magnitudes of effect sizes.
  • A CI yields a range of estimates deemed likely for the population
  • The width of the CI provides a measure of the reliability or precision of the estimate
  • CIs make it easier to determine if a finding has any substantive as opposed to statistical significance.
  • CIs are in the same metric as the risk estimate and are easier to interpret within the context of the problem
  • CIs hold the true error rate to the chosen level.
  • A 95% CI that does not include the null value is equivalent to rejecting the hypothesis at the 0.05 level.
  • The use of CIs allows for the possibility of unifying a seemingly fragmented literature
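
Here is a minimal sketch of the CI/test equivalence mentioned above, assuming a normal approximation with a known standard error. The effect estimates, the standard error of 0.12, and the function names are invented for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()

def confidence_interval(estimate: float, std_error: float, level: float = 0.95):
    """Normal-approximation CI: estimate +/- z * standard error."""
    z = std_normal.inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error

def two_sided_p(estimate: float, std_error: float, null_value: float = 0.0) -> float:
    """Two-sided p value for H0: true effect == null_value."""
    z = (estimate - null_value) / std_error
    return 2 * (1 - std_normal.cdf(abs(z)))

# Hypothetical effect estimates, all with the same standard error
for est in (0.10, 0.25, 0.40):
    lo, hi = confidence_interval(est, 0.12)
    p = two_sided_p(est, 0.12)
    excludes_null = not (lo <= 0.0 <= hi)
    print(f"estimate {est:.2f}: 95% CI ({lo:.3f}, {hi:.3f}), "
          f"excludes 0: {excludes_null}, p < 0.05: {p < 0.05}")
```

The last two columns always agree, but the interval also shows the size of the effect and how precisely it was estimated, which the bare verdict does not.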

Standing on the shoulders of...

  • Overlapping CIs indicate consistent results if the risk estimates are in the same direction, even if
    a) the CIs cross the null
    b) the p values are insignificant
  • Consider publication bias when looking at average risk estimates: since "insignificant" results are seldom published, the average effect size is bloated
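
The publication bias point can be sketched with a toy simulation. It assumes 5,000 hypothetical studies of the same true effect (0.2) with the same standard error (0.15), and a crude filter where only results with |z| > 1.96 get published. All of these numbers are made up.

```python
import random
from statistics import mean

rng = random.Random(1)  # seeded so the run is repeatable
TRUE_EFFECT, STD_ERROR, Z_CRIT = 0.2, 0.15, 1.96  # hypothetical values

# Each study's estimate is the true effect plus sampling noise
estimates = [rng.gauss(TRUE_EFFECT, STD_ERROR) for _ in range(5_000)]

# Crude publication filter: only "significant" studies make it into print
published = [e for e in estimates if abs(e / STD_ERROR) > Z_CRIT]

all_mean = mean(estimates)
published_mean = mean(published)
print(f"mean of all studies:       {all_mean:.3f}")
print(f"mean of published studies: {published_mean:.3f}")
print(f"share published:           {len(published) / len(estimates):.1%}")
```

Averaging only the published estimates overshoots the true effect, because the filter keeps the studies whose sampling noise happened to push the estimate far from zero.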

---------------------------

Hubbard, R. and Armstrong, J. S. (2006). Why we don't really know what statistical significance means: Implications for educators. Journal of Marketing Education, 28(2):114-120.


Posted on 2009-09-12 03:22:24.