STATISTICAL POWER AND SAMPLE SIZE
Study Planning | Adequate Sampling | Effect Detection
Power and Sample Size Relationships
Critical Must-Knows
- Power: Probability of detecting a true effect (1 minus Beta). Conventional target is 80 percent.
- Sample Size Calculation Requires: Effect size (MCID), Alpha (usually 0.05), Power (usually 0.80), Variability (SD)
- Effect Size: The magnitude of difference you want to detect - must be clinically meaningful (MCID), not just statistically significant
- Underpowered Studies: Risk Type II error (false negative) - failing to detect real treatment effect
- Factors Increasing Sample Size: Smaller effect size, higher power, higher variability, lower alpha
Examiner's Pearls
- "Power = 80% means 20% chance of Type II error (missing a true effect)
- "MCID (Minimal Clinically Important Difference) defines what effect size matters to patients
- "Larger sample size increases power but also increases cost and time
- "Pilot studies help estimate variability (SD) for sample size calculations
Critical Power Concepts
What is Power?
Power = 1 minus Beta. Probability of correctly rejecting null hypothesis when alternative is true. Power = 80% means 80% chance of detecting real effect if it exists.
Sample Size Determinants
Four key inputs: (1) Alpha (Type I error, usually 0.05), (2) Power (1-Beta, usually 0.80), (3) Effect Size (MCID), (4) Variability (Standard Deviation).
Underpowered Studies
Risk: Type II error (false negative). Study may fail to detect real treatment benefit. Common in orthopaedic trials with small sample sizes.
Clinical vs Statistical Significance
Statistical Significance: p less than 0.05. Clinical Significance: Difference exceeds MCID. Large studies detect trivial differences; small studies miss important ones.
APES: Sample Size Calculation Inputs
Memory Hook: APES calculate sample size - Alpha, Power, Effect, SD are the four essentials!
SHAPE: Factors that Increase Required Sample Size
Memory Hook: SHAPE your sample size - these five factors determine how many participants you need!
Overview/Introduction
What is Power?
Definition: Statistical power is the probability that a study will detect an effect when there truly is an effect to detect.
Formula: Power = 1 minus Beta (Type II error rate)
Interpretation:
- Power = 80%: 80% chance of detecting true effect, 20% chance of missing it (Type II error)
- Power = 50%: Coin flip - as likely to miss effect as to find it (underpowered)
- Power = 95%: 95% chance of detecting true effect, but requires much larger sample
Power Levels and Interpretation
| Power | Meaning | Adequacy | Typical Sample Size |
|---|---|---|---|
| Greater than 90% | Very high chance of detecting true effect | Excellent but may be excessive | Very large sample needed |
| 80-90% | High chance of detecting true effect | Conventional and adequate | Moderate sample size |
| 50-80% | Moderate chance, meaningful risk of missing effect | Underpowered - risky | Smaller sample |
| Under 50% | More likely to miss effect than find it | Severely underpowered | Very small sample |
Understanding power is essential for designing adequately powered studies.
Principles of Power Analysis
Core Principles
Relationship Between Power and Sample Size:
- Larger sample size increases power
- Doubling sample size does NOT double power (diminishing returns)
- Power increases steeply initially, then plateaus
Trade-offs in Study Design:
- Higher power requires larger sample (more cost, time)
- Smaller effect size (more clinically conservative) requires larger sample
- Lower alpha (more statistically conservative) requires larger sample
Key Relationships:
- Power increases with sample size
- Power increases with effect size (larger differences are easier to detect)
- Power increases with alpha (a less stringent threshold)
- Power decreases with variability (SD)
Understanding these principles allows rational study design decisions.
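The diminishing-returns relationship can be made concrete with a short calculation - a minimal sketch in Python, assuming the statsmodels package is available (its TTestIndPower routine computes power for a two-group t-test):

```python
# Power as a function of sample size for a two-group t-test,
# illustrating diminishing returns (power plateaus near 1.0).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
effect_size = 0.5                      # Cohen's d = MCID / SD (e.g. 10 / 20)

for n_per_group in (20, 40, 80, 160, 320):
    power = analysis.power(effect_size=effect_size,
                           nobs1=n_per_group,
                           alpha=0.05,
                           ratio=1.0)
    print(f"n = {n_per_group:3d} per group -> power = {power:.2f}")
# Power rises steeply at first (roughly 0.34 -> 0.60 -> 0.88),
# then flattens toward 1.0 - doubling n does not double power.
```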
Sample Size Calculation
Four Essential Inputs
Every sample size calculation requires four inputs: alpha, power, effect size (MCID), and variability (SD).
Alpha: Type I Error Rate
Definition: Probability of falsely rejecting null hypothesis (false positive).
Conventional Choice: Alpha = 0.05 (5%)
Meaning: Willing to accept 5% chance of finding difference when none exists.
Trade-off: Lower alpha (e.g., 0.01) reduces false positives but requires larger sample size.
Bonferroni Correction: When testing multiple outcomes, divide alpha by number of tests to maintain overall Type I error rate.
Understanding alpha is critical for interpreting p-values and planning studies.
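The cost of a stricter alpha (for example, a Bonferroni-adjusted threshold when testing five outcomes) can be illustrated with a sketch in Python, again assuming statsmodels is available and a hypothetical effect size of Cohen's d = 0.5:

```python
# Required sample size per group at the conventional alpha versus a
# Bonferroni-adjusted alpha for five outcomes (assumes Cohen's d = 0.5).
from math import ceil
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
d = 0.5                                   # Cohen's d = MCID / SD

for alpha in (0.05, 0.05 / 5):            # unadjusted vs Bonferroni (5 tests)
    n = analysis.solve_power(effect_size=d, alpha=alpha, power=0.80)
    print(f"alpha = {alpha:.3f} -> n = {ceil(n)} per group")
# alpha = 0.050 -> roughly 64 per group
# alpha = 0.010 -> roughly 96 per group (stricter alpha needs more patients)
```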
Performing Sample Size Calculation
Sample Size Formula (Continuous Outcome, Two Groups)
For comparing means between two groups:
n = 2 × (Zα + Zβ)² × SD² / MCID²
Where:
- n = sample size per group
- Zα = Z-score for alpha (1.96 for alpha = 0.05 two-tailed)
- Zβ = Z-score for beta (0.84 for power = 0.80)
- SD = standard deviation
- MCID = effect size (minimal clinically important difference)
Worked Example
Question: How many patients needed per group to detect 10-point improvement in WOMAC score?
Given:
- MCID = 10 points
- SD = 20 points (from literature)
- Alpha = 0.05 (Zα = 1.96)
- Power = 0.80 (Zβ = 0.84)
Calculation:
- n = 2 × (1.96 + 0.84)² × 20² / 10²
- n = 2 × 7.84 × 400 / 100
- n = 2 × 31.36 = 62.72
- Round up: n = 63 patients per group
Accounting for Dropout:
- If expecting 15% dropout: n = 63 / 0.85 ≈ 74 patients per group
- Total enrollment: 148 patients
Understanding sample size calculation ensures adequately powered studies.
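The formula and worked example above translate directly into code. Below is a minimal sketch in Python (scipy assumed for the Z-scores); it reproduces the WOMAC numbers, including the dropout inflation:

```python
# Two-group sample size formula: n = 2 * (Z_alpha + Z_beta)^2 * SD^2 / MCID^2,
# rounded up, with optional inflation for anticipated dropout.
from math import ceil
from scipy.stats import norm

def n_per_group(mcid, sd, alpha=0.05, power=0.80, dropout=0.0):
    z_alpha = norm.ppf(1 - alpha / 2)     # 1.96 for two-tailed alpha = 0.05
    z_beta = norm.ppf(power)              # 0.84 for power = 0.80
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / mcid ** 2
    return ceil(n / (1 - dropout))        # round up; inflate for attrition

print(n_per_group(mcid=10, sd=20))                  # 63 per group
print(n_per_group(mcid=10, sd=20, dropout=0.15))    # ~74 per group to enroll
```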
Types of Power Analysis
A Priori vs Post Hoc Power Analysis
Types of Power Analysis
| Type | When Performed | Purpose | Validity |
|---|---|---|---|
| A Priori (Prospective) | Before study begins | Calculate required sample size | Valid and recommended |
| Post Hoc (Retrospective) | After study completed | Calculate achieved power | Controversial - often misleading |
| Sensitivity Analysis | During planning | Assess power across range of assumptions | Useful for uncertainty |
A Priori Power Analysis (Recommended):
- Calculate sample size BEFORE enrolling patients
- Uses estimated effect size and SD from literature or pilot
- Ensures study designed with adequate power
Post Hoc Power Analysis (Problematic):
- Calculating power AFTER study complete using observed data
- Often done to explain non-significant results
- Mathematically redundant - observed (post hoc) power is determined entirely by the p-value, so it adds no new information
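A sensitivity analysis (the third row of the table above) is straightforward to sketch. The example below uses a hypothetical grid of MCID and SD assumptions, again assuming Python with statsmodels:

```python
# Required sample size across a plausible range of MCID and SD assumptions -
# a simple sensitivity analysis to run at the planning stage.
from math import ceil
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

for mcid in (8, 10, 12):                  # candidate clinically meaningful differences
    for sd in (15, 20, 25):               # plausible range of variability
        d = mcid / sd                     # standardized effect size (Cohen's d)
        n = analysis.solve_power(effect_size=d, alpha=0.05, power=0.80)
        print(f"MCID = {mcid}, SD = {sd} -> n = {ceil(n)} per group")
# Conservative assumptions (small MCID, large SD) increase n several-fold,
# which is why each input to an a priori calculation should be justified.
```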
Clinical Application
Underpowered Studies in Orthopaedics
Common Problem: Many orthopaedic RCTs are underpowered. Small sample sizes fail to detect clinically meaningful differences. Results are inconclusive, not negative.
MCID vs Statistical Significance
Clinical Relevance: A statistically significant finding (p less than 0.05) may not be clinically important. Always check if difference exceeds MCID.
Pilot Studies
Purpose: Estimate variability (SD) and feasibility before full trial. Helps refine sample size calculation. Do NOT use pilot for hypothesis testing.
Multicenter Trials
Solution: When single center cannot recruit adequate sample, multicenter collaboration achieves power. AOANJRR and international registries provide large samples.
Software and Calculation Tools
Common Power Analysis Software
Sample Size Calculation Tools
| Software | Cost | Features | Best For |
|---|---|---|---|
| G*Power | Free | Wide range of tests, user-friendly | Academic researchers, most designs |
| PS (Power and Sample Size) | Free | Simple interface, basic designs | Quick calculations, beginners |
| nQuery | Commercial | Comprehensive, regulatory accepted | Industry trials, complex designs |
| PASS | Commercial | Extensive documentation, FDA submissions | Regulatory submissions |
Online Calculators:
- ClinCalc sample size calculator (free online)
- OpenEpi power calculation (epidemiological studies)
- Sealed Envelope (clinical trial tools)
Manual Calculation Reference
| Statistic | Formula Component | Value (Common) |
|---|---|---|
| Zα (two-tailed, α=0.05) | Z-score for alpha | 1.96 |
| Zα (one-tailed, α=0.05) | Z-score for alpha | 1.645 |
| Zβ (power=0.80) | Z-score for beta | 0.84 |
| Zβ (power=0.90) | Z-score for beta | 1.28 |
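These reference values come directly from the standard normal inverse CDF - a quick check in Python, assuming scipy is available:

```python
# Deriving the reference Z-scores in the table above.
from scipy.stats import norm

print(norm.ppf(1 - 0.05 / 2))   # Z-alpha, two-tailed, alpha = 0.05 -> 1.96
print(norm.ppf(1 - 0.05))       # Z-alpha, one-tailed, alpha = 0.05 -> 1.645
print(norm.ppf(0.80))           # Z-beta, power = 0.80 -> 0.84
print(norm.ppf(0.90))           # Z-beta, power = 0.90 -> 1.28
```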
Addressing Underpowered Studies
Strategies to Increase Power
Methods to Address Low Power
| Strategy | Approach | Advantages | Disadvantages |
|---|---|---|---|
| Increase sample size | Enroll more participants | Direct power increase | More cost, time, resources |
| Multicenter collaboration | Pool recruitment across sites | Achieves larger sample | Heterogeneity, logistics complexity |
| Reduce variability | Stricter inclusion criteria, standardized protocols | Increases precision | Reduces generalizability |
| Use more sensitive outcome | Choose outcome with lower SD | More precise measurement | May not be clinically preferred |
When Power Cannot Be Achieved
- Rare conditions: May need registry-based or multi-national studies
- Ethical constraints: Cannot enroll more for safety reasons
- Resource limitations: Accept lower power with pre-specified disclosure
Alternative Approaches:
- Bayesian analysis (can provide evidence even with small samples)
- Meta-analysis (combine with existing studies)
- Confidence interval interpretation (focus on precision)
Common Pitfalls and Errors
Errors in Sample Size Calculation
Common Pitfalls in Power Analysis
| Pitfall | Problem | Consequence | Solution |
|---|---|---|---|
| Underestimating dropout | Sample shrinks below powered size | Underpowered final analysis | Inflate by 15-25% for attrition |
| Unrealistic effect size | MCID too large or optimistic | Study underpowered for true effect | Use conservative, validated MCID |
| Wrong SD estimate | Variability higher than expected | Lower power than calculated | Use upper bound of SD estimate |
| Multiple comparisons ignored | Many outcomes without correction | Inflated Type I error | Adjust alpha (Bonferroni) or define primary outcome |
Interpretation Errors
Common Mistakes:
- Concluding treatments are "equivalent" from underpowered negative study
- Using post hoc power to justify non-significant results
- Ignoring confidence intervals when assessing clinical relevance
- Confusing statistical significance with clinical importance
Evidence Base
Power and Sample Size in Orthopaedic Trials
- Review of 215 RCTs in orthopaedic journals found 60% did not report sample size calculation
- Of studies reporting power, 40% were underpowered (power less than 0.80)
- Lack of power reporting makes it difficult to interpret negative results
- Recommendations: Always report sample size calculation and achieved power
MCID for Common Orthopaedic Outcome Measures
- Systematic review of MCID values for common outcome measures
- WOMAC: MCID approximately 10-15% of scale (10-15 points on 100-point scale)
- SF-36: MCID approximately 5 points for physical component
- VAS Pain: MCID approximately 15-20 mm on 100-mm scale
CONSORT Statement on Sample Size
- CONSORT Item 7a: How sample size was determined
- CONSORT Item 7b: When applicable, explanation of interim analyses and stopping guidelines
- Sample size justification should include effect size, power, alpha, and SD
- Facilitates assessment of study adequacy and interpretation of results
Exam Viva Scenarios
Practice these scenarios to excel in your viva examination
Scenario 1: Sample Size Calculation
"You are planning an RCT to compare two surgical approaches for rotator cuff repair. What information do you need to calculate the required sample size?"
Scenario 2: Interpreting Underpowered Study
"You read an RCT comparing two implants for THA. The study found no significant difference (p = 0.15) with 40 patients per group. The power calculation shows the study had 35 percent power. How do you interpret this result?"
MCQ Practice Points
Power Definition
Q: What is statistical power? A: The probability of detecting a true effect when it exists, calculated as 1 minus Beta (Type II error rate). Power = 80% means 80% chance of finding real difference if present, 20% risk of missing it.
Sample Size Determinants
Q: Which factor does NOT increase required sample size? A: Higher alpha (e.g., 0.10 vs 0.05) actually decreases required sample size. Factors that increase sample size: smaller effect size, higher power, higher variability (SD), lower alpha.
MCID Importance
Q: Why is MCID important for sample size calculation? A: MCID defines the clinically meaningful effect size - the smallest difference that matters to patients. Using MCID ensures study is powered to detect differences that are clinically relevant, not just statistically significant. Without MCID, large studies may detect trivial differences.
STATISTICAL POWER AND SAMPLE SIZE
High-Yield Exam Summary
Core Concepts
- Power = 1 minus Beta = Probability of detecting true effect
- Conventional power = 80% (20% risk of Type II error)
- Sample size needs 4 inputs: Alpha, Power, Effect Size (MCID), SD
- Underpowered study = High risk of missing real effect (Type II error)
Sample Size Calculation Inputs
- Alpha = Type I error (usually 0.05) - false positive rate
- Power = 1 minus Beta (usually 0.80) - true positive rate
- Effect Size = MCID (clinically meaningful difference)
- SD = Variability (from literature or pilot study)
- Inflate by 15-20% for anticipated dropout
Factors Increasing Sample Size
- Smaller effect size (harder to detect)
- Higher power (90% vs 80%)
- Lower alpha (0.01 vs 0.05)
- Higher variability (larger SD)
- Expected dropout or loss to follow-up
Interpreting Power
- Power greater than 90% = Excellent, may be excessive
- Power 80-90% = Adequate and conventional
- Power 50-80% = Underpowered, risky
- Power under 50% = Severely underpowered, likely to fail
- Negative result from underpowered study = Inconclusive
Clinical Application
- MCID defines clinical relevance, not just statistical significance
- Many orthopaedic RCTs are underpowered (power under 80%)
- Pilot studies estimate SD and feasibility, NOT for hypothesis testing
- Absence of evidence is NOT evidence of absence (underpowered studies)
- Wide confidence intervals indicate insufficient precision