Sample size calculations: should the emperor’s clothes be off the peg or made to measure?
An example

Imagine that you decide to do a study to see if control of primary hypertension is improved by home monitoring. You visit your local statistician for a sample size calculation, as the ethics board insists. The following are some key questions that he or she will ask and some tentative answers.

What is the distribution of blood pressure in the population you intend to study?

One study design might be to randomise people to treatment and control groups, put one group on monitors for a few months, then measure their blood pressure. We would then compare blood pressure in the two groups. To compute sample size, we need to know the standard deviation of systolic (or diastolic) blood pressure in the group we are studying. One recent meta-analysis of interventions to control hypertension gave values of 15-17 mm Hg.³

How much do you think your treatment will affect systolic blood pressure?

The most reasonable answer is, “How do I know? That’s why I’m doing the study.” Regrettably, you have to know to calculate sample size. Fortunately, a recent Cochrane review of 12 randomised trials with over 1200 patients per group provides a guide.³ The mean difference in systolic pressure was 2.53 mm Hg. Individual study results ranged from a drop of 26.0 mm Hg to a gain of 5.0 mm Hg. If we eliminate the two studies with very small samples of 9 and 18 and use the three largest observed differences, we end up with a mean drop of 6.9 mm Hg from studies with samples of 48, 55, and 76. Conversely, the studies with the three smallest treatment effects (n=123, 326, and 72) showed an average benefit of 1 mm Hg.

What α and β levels do you want?

That’s easy. Convention dictates that α (the level of statistical significance) is 0.05 and β (the probability of a type II error: failing to reject the null hypothesis when the alternative hypothesis is true) is 0.20.
(However easy and universal these may be, the choice of a power of 0.80 is logically unsupportable, as shown by Bacchetti.² But for the sake of convention, we will proceed.)

We can now do the calculation (box). If we take the extremes, the smallest sample size, based on a reduction of 6.9 mm Hg and an SD of 15, equals 75 per group. The largest, for a 1 mm Hg drop and an SD of 17, equals 4624. The overall average drop of 2.53 corresponds to a sample size of 722. These estimates differ by a factor of 60 even though this was a “best case” situation, in which all studies had reasonable sample size and were viewed as sufficiently homogeneous to be included in a systematic review.

Calculating sample size

For a difference between two groups, sample size = 16 × s²/d², where s is the standard deviation and d is the expected treatment effect. As Lehr has shown,⁴ for α=0.05 and power of 0.80 this is a close approximation to the exact formula. We have deliberately rounded the computed values to avoid the illusion of precision.

Critics might argue that the choice of the three smallest and largest differences was arbitrary and extreme, but we used it to illustrate the point. We could have used alternative strategies, such as weighting by sample size. But the fact is that all the studies were derived from a Cochrane systematic review, all were examining a single question, all were deemed of sufficient quality to be included in the systematic review, and all were used in the final calculation in the review. On that basis, all are equal candidates for inclusion in a sample size calculation.

In more representative situations where data are lacking, there would be even more “wiggle room.” Virtually all statisticians who have been engaged in this activity describe multiple iterations until the computed sample size converges to a desired result.
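
For readers who want to check the arithmetic, the three scenarios above can be reproduced with a short Python sketch of the rule in the box. The function names are ours, not part of the original analysis. Two assumptions are labelled in the comments: the SD of 17 paired with the 2.53 mm Hg mean difference is inferred from the reported figure of 722, and the "exact formula" is taken to be the usual normal-approximation formula for comparing two means, which may not be the precise form Lehr had in mind.

```python
from statistics import NormalDist


def lehr_n_per_group(sd, diff):
    """Rule of thumb from the box: n per group ~= 16 * s^2 / d^2
    (two-sided alpha = 0.05, power = 0.80)."""
    return 16 * sd ** 2 / diff ** 2


def normal_approx_n_per_group(sd, diff, alpha=0.05, power=0.80):
    """Assumed comparison formula (not quoted in the article):
    n per group = 2 * (z_{1-alpha/2} + z_{power})^2 * s^2 / d^2."""
    z = NormalDist()  # standard normal distribution
    multiplier = 2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    return multiplier * sd ** 2 / diff ** 2


# (SD, expected difference) in mm Hg, taken from the text.
# The SD of 17 for the overall mean is our inference from the reported 722.
scenarios = {
    "three largest effects": (15, 6.9),
    "overall mean effect": (17, 2.53),
    "three smallest effects": (17, 1.0),
}

if __name__ == "__main__":
    for label, (sd, diff) in scenarios.items():
        rule = lehr_n_per_group(sd, diff)
        approx = normal_approx_n_per_group(sd, diff)
        print(f"{label}: rule of thumb {rule:.1f}, normal approximation {approx:.1f}")
```

Changing either input shows how sensitive the answer is: because the required sample size scales with 1/d², halving the assumed treatment effect quadruples it.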