Zephyrus's Blog
Why We Don't Really Know What Statistical Significance Means: But NOW I DO!
Differences between the Fisher and Neyman-Pearson Schools of Statistical Thought
Fisher (and his cloud of p-value inference)
Knowledge is created via inductive inference
- The p value is the probability of observing a result at least as extreme as the one obtained, given that the null hypothesis (H0) of no effect or relationship is true.
- The smaller the p value, the greater the evidence against the null hypothesis
- The p value is a measure of inductive evidence against H0
- The p value is a data-dependent random variable (exploratory)
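To make Fisher's definition concrete, here is a minimal sketch of an exact one-sided p value. The coin-flip setup (15 heads in 20 tosses, H0: a fair coin) and the function name `binomial_p_value` are my own illustration, not something from the article.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, p_null: float = 0.5) -> float:
    """One-sided p value: P(X >= successes) when H0 (success prob = p_null) is true."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical data: 15 heads in 20 tosses of a coin assumed fair under H0.
p = binomial_p_value(successes=15, trials=20)
print(f"p = {p:.4f}")  # p = 0.0207
```

Note that the p value is only known once the data are in: change the observed count and the tail area changes with it.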
Neyman-Pearson (and their vices)
Statistical Testing as a mechanism for making decisions and guiding behaviour
- Two Hypotheses H0 and Ha
- Decision between two distinct courses of action
- Type I error is the false rejection of Ho
- Type II error is the false acceptance of Ho
- Statistical testing is aimed at ERROR CONTROL
- Not concerned with gathering evidence
- Alpha (the Type I error rate) must be fixed before gathering data to control Type I errors
- P value plays no role in NP theory
See Royall, R. M. (1997). Statistical Evidence: A Likelihood Paradigm. New York: Chapman and Hall, Chapter 5, for further discussion on this point.
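The Neyman-Pearson procedure described above can be sketched as a decision rule whose rejection region is fixed before any data are seen. The binomial setup (20 trials, H0: p = 0.5), the alternative p = 0.8, and all names below are assumptions I picked for illustration.

```python
from math import comb


def upper_tail(c: int, n: int, p: float) -> float:
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))


def critical_value(n: int, p_null: float, alpha: float) -> int:
    """Smallest c such that the region {X >= c} has probability <= alpha under H0."""
    for c in range(n + 2):
        if upper_tail(c, n, p_null) <= alpha:
            return c
    return n + 1  # not reached: the empty region always satisfies the bound


# Everything below is fixed BEFORE gathering data.
ALPHA = 0.05   # pre-selected Type I error rate
N = 20         # planned number of trials
C = critical_value(N, p_null=0.5, alpha=ALPHA)

type_i = upper_tail(C, N, 0.5)        # actual P(reject H0 | H0 true)
type_ii = 1 - upper_tail(C, N, 0.8)   # P(accept H0 | hypothetical Ha: p = 0.8)


def decide(successes: int) -> str:
    """The only output is a decision; where in the region the result fell is ignored."""
    return "reject H0" if successes >= C else "accept H0"


print(C, decide(15), decide(20))
```

Both 15 and 20 successes lead to the same decision, which is the point: in this framework the result either falls in the rejection region or it does not.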
The Confusion
- The p value is compared with the Type I error rate (alpha), and H0 is rejected in favour of Ha if p < alpha
- When used this way, the specific value of p is irrelevant and should not be reported
- One can only say whether or not the result fell in the rejection region, not where in the region it fell (as a precise p value might suggest).
- The exact value of p cannot be reported in an NP test because alpha is the probability of a set of outcomes that may fall anywhere in the tail area of the distribution under a null hypothesis. We cannot know ahead of time which of these particular outcomes will occur
- The tail area for the p value is known only after the outcome is observed (It's not the probability of a set of outcomes, it's the specific result of one outcome?)
- If alpha is fixed, the p values cannot be reinterpreted at different levels (p < 0.05, p < 0.01, etc.)
- The p value cannot be interpreted as the "level of significance".
- The p value is NOT a data dependent adjustable Type I error rate
- If the researcher is concerned with error probabilities, the specific p value is irrelevant
- If the researcher is interested in the "measure of evidence" from a p value, there is no point in also reporting the error probabilities
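A small simulation may help separate the two ideas. This is a sketch under assumptions of my own (two-sided z tests with known sigma = 1, 30 observations per study, H0 true in every study); it is not taken from the article. The p value changes from sample to sample, while alpha is a fixed long-run rate.

```python
import random
from statistics import NormalDist, fmean


def z_test_p_value(sample, mu_null=0.0, sigma=1.0):
    """Two-sided p value for a z test with known sigma."""
    z = (fmean(sample) - mu_null) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_p_values(n_studies, n_per_study=30, seed=0):
    """Run many hypothetical studies in which H0 (mean = 0) is actually true."""
    rng = random.Random(seed)
    return [
        z_test_p_value([rng.gauss(0.0, 1.0) for _ in range(n_per_study)])
        for _ in range(n_studies)
    ]


alpha = 0.05                       # fixed in advance, the same for every study
p_values = simulate_p_values(5000)  # one data-dependent p value per study

# Long-run proportion of false rejections: this is what alpha controls.
rejection_rate = sum(p < alpha for p in p_values) / len(p_values)
print(f"smallest p = {min(p_values):.4f}, largest p = {max(p_values):.4f}")
print(f"share of studies with p < alpha: {rejection_rate:.3f}")
```

The rejection share should land near 0.05 whatever any individual p value turns out to be, which is why a single observed p is not an "adjustable" Type I error rate.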
Advice from this article
- If the focus of the study is on controlling errors (e.g., in quality control experiments), use the N-P approach. Make a serious attempt to calculate the costs of committing Type I and II errors.
- If the focus of the study is evidential in nature (which will be most of the time), use p values. Indeed, use exact p values, such as p = .04, whenever possible.
- Do not report p = .04 as p < .05
- Furthermore, do not present p values at fixed levels such as p < .05, p < .01, p < .001, and so on. This makes them look like Type I error rates.
- Recall that the p value is a measure of evidence against the null hypothesis.
- Be aware that p values can greatly exaggerate this evidence against H0.
- Remember, also, that the p value is not a measure of support for the alternative hypothesis, HA.
- Do not mistake p values for Type I error rates.
- P values are data dependent measures, not fixed levels. Alphas are pre-selected levels, not data-dependent values.
- It is completely inadmissible to use true N-P α values in a roving manner.
- Do not use the p < α criterion of statistical significance
- Present other information, such as confidence intervals, alongside or instead of significance levels.
Alternatives to obsessive p-value testing
- Report effect sizes, sample statistics and their confidence intervals
- CIs stress the importance of estimation over testing
- Scientific progress depends on arriving at credible estimates of the magnitudes of effect sizes.
- A CI yields a range of estimates deemed likely for the population
- The width of the CI provides a measure of the reliability or precision of the estimate
- CIs make it easier to determine if a finding has any substantive as opposed to statistical significance.
- CIs are in the same metric as the risk estimate and are easier to interpret within the context of the problem
- CIs hold the true error rate to the chosen level.
- A 95% CI that does not include the null value is equivalent to rejecting the hypothesis at the 0.05 level.
- The use of CIs allows for the possibility of unifying a seemingly fragmented literature
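The link between a 95% CI and a test at the 0.05 level can be checked numerically. This is a sketch for the simplest case I could think of (a mean with known sigma, so a z interval); the data values and function names are made up for illustration.

```python
from statistics import NormalDist, fmean


def z_confidence_interval(sample, sigma, level=0.95):
    """Confidence interval for a mean when sigma is assumed known."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z_crit * sigma / len(sample) ** 0.5
    centre = fmean(sample)
    return centre - half_width, centre + half_width


def z_test_p_value(sample, mu_null, sigma):
    """Two-sided p value for H0: mean = mu_null, sigma assumed known."""
    z = (fmean(sample) - mu_null) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical measurements; sigma = 1 is an assumption, not an estimate.
data = [0.8, 1.4, -0.2, 0.9, 1.1, 0.3, 1.6, 0.5]
low, high = z_confidence_interval(data, sigma=1.0)
p = z_test_p_value(data, mu_null=0.0, sigma=1.0)

ci_excludes_null = not (low <= 0.0 <= high)
print(f"estimate = {fmean(data):.2f}, 95% CI = ({low:.2f}, {high:.2f}), p = {p:.4f}")
print("CI excludes 0:", ci_excludes_null, "| p < 0.05:", p < 0.05)
```

The two flags always agree for this kind of interval, but the CI also shows the size of the estimate and its precision in the original units, which the bare p value does not.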
Standing on the shoulders of...
- Overlapping CIs indicate consistent results if the risk estimates are in the same direction, even if
- a) the CIs cross the null
- b) the p values are not significant
- Consider publication bias when looking at average risk estimates.
- Since "insignificant" results are seldom published, the average published effect size is inflated
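The publication-bias point can be illustrated with a quick simulation. All numbers here are assumptions of mine (a true effect of 0.2, sigma = 1, 25 observations per study, and a crude rule that only results with p < 0.05 get "published"); the article gives no such figures.

```python
import random
from statistics import NormalDist, fmean

TRUE_EFFECT = 0.2   # hypothetical true mean difference
SIGMA = 1.0         # assumed known standard deviation
N_PER_STUDY = 25
ALPHA = 0.05


def run_study(rng):
    """Return (effect estimate, two-sided z-test p value against H0: effect = 0)."""
    sample = [rng.gauss(TRUE_EFFECT, SIGMA) for _ in range(N_PER_STUDY)]
    estimate = fmean(sample)
    z = estimate / (SIGMA / N_PER_STUDY ** 0.5)
    return estimate, 2 * (1 - NormalDist().cdf(abs(z)))


rng = random.Random(1)
studies = [run_study(rng) for _ in range(2000)]

all_estimates = [est for est, _ in studies]
# Crude publication filter: only "significant" studies make it into print.
published = [est for est, p in studies if p < ALPHA]

mean_all = fmean(all_estimates)
mean_published = fmean(published)
print(f"true effect:             {TRUE_EFFECT}")
print(f"mean of all studies:     {mean_all:.3f}")
print(f"mean of published only:  {mean_published:.3f}")
print(f"share published:         {len(published) / len(studies):.2f}")
```

Averaging every study recovers something close to the true effect, while averaging only the significant ones overstates it, because a small study has to overestimate the effect to clear the significance bar.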
---------------------------
Hubbard, R. and Armstrong, J. S. (2006).
Why we don't really know what statistical significance means:
Implications for educators.
Journal of Marketing Education, 28(2):114-120.
Posted on 2009-09-12 03:22:24.