Common Statistical Tests

Comprehensive guide to common statistical tests in orthopaedic research including t-tests, ANOVA, chi-square, and regression, with guidance on test selection.

Updated: 2025-12-24
High Yield Overview

COMMON STATISTICAL TESTS

Test Selection | Parametric vs Non-Parametric | Interpretation

  • t-test: Compare 2 groups (continuous)
  • ANOVA: Compare 3+ groups (continuous)
  • Chi-square: Compare proportions (categorical)
  • Regression: Model relationships (multiple variables)

Test Selection by Data Type

Continuous outcome, 2 groups
  • Test: Independent samples t-test or Mann-Whitney U
  • Selection: Parametric if normal, non-parametric if not

Continuous outcome, 3+ groups
  • Test: ANOVA or Kruskal-Wallis
  • Selection: Parametric if normal, non-parametric if not

Categorical outcome, 2+ groups
  • Test: Chi-square or Fisher exact
  • Selection: Chi-square if expected counts over 5

Association between variables
  • Test: Correlation (Pearson or Spearman) or regression
  • Selection: Depends on data type and distribution

Critical Must-Knows

  • T-test: Compares means between 2 groups. Assumes normality, equal variance, independence.
  • ANOVA: Compares means across 3 or more groups. Post-hoc tests needed to identify which groups differ.
  • Chi-square: Tests association between categorical variables. Expected count should be over 5 in each cell.
  • Regression: Models relationship between outcome and predictor(s). Linear for continuous outcomes, logistic for binary.
  • Parametric vs Non-Parametric: Parametric assumes normal distribution (t-test, ANOVA). Non-parametric does not (Mann-Whitney, Kruskal-Wallis).

Examiner's Pearls

  • "
    Use paired t-test for before-after comparisons, independent t-test for separate groups
  • "
    ANOVA tells you IF groups differ, not WHICH groups - need post-hoc tests (Tukey, Bonferroni)
  • "
    Fisher exact test preferred over chi-square when expected counts under 5
  • "
    Correlation does NOT imply causation - confounders may explain association

Critical Test Selection Concepts

Data Type Determines Test

Continuous outcome: t-test, ANOVA, regression. Categorical outcome: Chi-square, Fisher exact, logistic regression. Always match test to data type.

Parametric Assumptions

Requirements: Normal distribution, equal variance, independence. Check normality: Histogram, Q-Q plot, Shapiro-Wilk test. If violated: Use non-parametric alternative.

Paired vs Independent

Paired: Same subjects measured twice (before-after). Use paired t-test. Independent: Different subjects in each group. Use independent t-test. Test choice depends on design.

Multiple Comparisons

Problem: Testing many groups inflates Type I error. Solution: Use ANOVA first (omnibus test), then post-hoc with correction (Tukey, Bonferroni). Do NOT run multiple t-tests.

At a Glance

Statistical test selection depends on data type and study design: t-test compares means between 2 groups (paired for before-after, independent for separate groups), ANOVA compares 3+ groups (requires post-hoc tests like Tukey/Bonferroni to identify which differ), chi-square tests associations between categorical variables (use Fisher exact when expected counts under 5), and regression models relationships between outcomes and predictors. Parametric tests (t-test, ANOVA) assume normal distribution—if violated, use non-parametric alternatives (Mann-Whitney U, Kruskal-Wallis). Key pitfall: running multiple t-tests inflates Type I error; use ANOVA first as an omnibus test. Correlation does not imply causation—confounders may explain observed associations.

Mnemonic

DINGO: Choosing the Right Test

  • D = Data type: Continuous, categorical, or ordinal?
  • I = Independent or paired: Same subjects or different subjects?
  • N = Number of groups: 2 groups or 3 or more groups?
  • G = Group distribution: Normal (parametric) or not (non-parametric)?
  • O = Outcome relationship: Simple comparison or modelling predictors?

Memory Hook: Follow the DINGO trail to find the right statistical test for your data!

Mnemonic

NINE: Parametric Test Assumptions

  • N = Normality: Data follow a normal distribution (check with histogram, Q-Q plot)
  • I = Independence: Observations are independent (not clustered or repeated)
  • N = No outliers: Extreme values can violate assumptions
  • E = Equal variance: Homoscedasticity - variance similar across groups (Levene test)

Memory Hook: Check the NINE assumptions before using parametric tests - or use non-parametric alternatives!

Overview and Introduction

Statistical tests are the foundation of evidence-based orthopaedics. Understanding when to use each test and how to interpret results is essential for critically appraising literature and conducting research. This topic covers the most common statistical tests used in orthopaedic research.

Concepts and Mechanisms

Fundamental Statistical Concepts

Hypothesis Testing Framework

  • Null hypothesis (H0): Assumes no difference or no effect between groups
  • Alternative hypothesis (H1): States there IS a difference or effect
  • Type I error (α): Rejecting H0 when it's true (false positive) - typically set at 0.05
  • Type II error (β): Failing to reject H0 when it's false (false negative)
  • Power (1-β): Probability of detecting a true effect - aim for over 80%

Central Limit Theorem: As sample size increases, the sampling distribution of the mean approaches a normal distribution, regardless of the population distribution. This is why parametric tests perform well with large samples even when the data are skewed.

Parametric vs Non-Parametric Tests

Aspect | Parametric | Non-Parametric
Assumptions | Normality, equal variance | No distribution assumptions
Power | Higher when assumptions met | Lower but more robust
Example | t-test | Mann-Whitney U

Effect Size vs Statistical Significance

  • p-value: Probability of observing a result at least as extreme as that seen, if the null hypothesis is true
  • Effect size: Magnitude of the difference (Cohen's d, odds ratio)
  • Clinical significance: Whether the effect matters clinically
  • A statistically significant result may not be clinically meaningful!

Tests for Continuous Outcomes

Comparing Two Groups

Independent Samples t-test

Use When:

  • Comparing means between 2 independent groups
  • Continuous outcome variable
  • Data approximately normally distributed
  • Equal variance between groups

Example: Comparing WOMAC scores between cemented vs uncemented THA groups.

Null Hypothesis: Mean outcome is equal in both groups.

Assumptions:

  • Normal distribution in each group
  • Independence of observations
  • Equal variance (homoscedasticity)

Interpretation: p less than 0.05 indicates significant difference in means.

Alternatives if Assumptions Violated:

  • Non-normal distribution: Mann-Whitney U test (non-parametric)
  • Unequal variance: Welch t-test (does not assume equal variance)

The independent t-test is the most common test in orthopaedic research.
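
The short sketch below shows how this comparison might look in practice. Python with scipy is used purely for illustration (the topic lists SPSS, R, and Stata as common software), and the score values are invented; the same logic applies in any package.

```python
# Minimal sketch: comparing a continuous outcome between two independent groups.
# The WOMAC-style scores below are invented illustration data.
from scipy import stats

cemented   = [62, 71, 58, 66, 74, 69, 63, 70, 60, 68]
uncemented = [55, 64, 59, 61, 52, 66, 57, 60, 63, 58]

# Check normality informally (Shapiro-Wilk); with real data also inspect a Q-Q plot.
print(stats.shapiro(cemented), stats.shapiro(uncemented))

# Standard independent t-test (assumes equal variance).
t, p = stats.ttest_ind(cemented, uncemented)

# Welch t-test if variances are unequal.
t_w, p_w = stats.ttest_ind(cemented, uncemented, equal_var=False)

# Non-parametric alternative if normality is doubtful.
u, p_u = stats.mannwhitneyu(cemented, uncemented, alternative="two-sided")

print(f"t-test p={p:.3f}, Welch p={p_w:.3f}, Mann-Whitney p={p_u:.3f}")
```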

Paired Samples t-test

Use When:

  • Comparing means for same subjects at 2 time points
  • Before-after design, crossover design
  • Continuous outcome
  • Differences approximately normally distributed

Example: Comparing pain scores before vs after surgery in same patients.

Null Hypothesis: Mean difference is zero.

Advantage: Controls for individual variation - more powerful than independent t-test.

Assumptions:

  • Normal distribution of differences (not raw values)
  • Independence of pairs
  • No order effect (for crossover designs)

Alternative if Violated: Wilcoxon signed-rank test (non-parametric paired test).

Paired tests increase power by accounting for within-subject correlation.
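
A minimal sketch of the paired analysis, again using Python/scipy for illustration with invented pain scores; the key point is that the two lists must be the same patients in the same order.

```python
# Minimal sketch: before-vs-after comparison in the same patients.
# Pain scores below are invented illustration data (same patients, two time points).
from scipy import stats

pain_pre  = [8, 7, 9, 6, 8, 7, 9, 8, 7, 6]
pain_post = [3, 4, 5, 2, 4, 3, 6, 4, 3, 2]

# Paired t-test: tests whether the mean within-patient difference is zero.
t, p = stats.ttest_rel(pain_pre, pain_post)

# Non-parametric alternative if the differences are clearly non-normal.
w, p_w = stats.wilcoxon(pain_pre, pain_post)

print(f"paired t-test p={p:.4f}, Wilcoxon signed-rank p={p_w:.4f}")
```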

Comparing Three or More Groups

One-Way ANOVA

Use When:

  • Comparing means across 3 or more independent groups
  • Continuous outcome
  • Data approximately normally distributed
  • Equal variance across groups

Example: Comparing functional scores across 3 surgical approaches.

Null Hypothesis: All group means are equal.

Key Point: ANOVA tells you IF any groups differ, NOT which specific groups differ.

Post-Hoc Tests (if ANOVA significant):

  • Tukey HSD: Compares all pairwise combinations, controls family-wise error
  • Bonferroni: Conservative, divides alpha by number of comparisons
  • Dunnett: Compares all groups to control group only

Assumptions:

  • Normal distribution in each group
  • Independence of observations
  • Equal variance (homoscedasticity)

Alternative if Violated: Kruskal-Wallis test (non-parametric ANOVA).

Never run multiple independent t-tests instead of ANOVA - inflates Type I error.
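
The sketch below illustrates the two-step workflow (omnibus ANOVA, then post-hoc) using Python's scipy and statsmodels for illustration; the functional scores are invented data.

```python
# Minimal sketch: one-way ANOVA across 3 groups, then Tukey post-hoc.
# Functional scores are invented; statsmodels is used for the post-hoc step.
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

approach_a = [78, 82, 75, 80, 79, 84, 77]
approach_b = [70, 74, 69, 72, 75, 71, 73]
approach_c = [80, 85, 83, 79, 86, 82, 84]

# Omnibus test: tells us IF at least one group mean differs.
f, p = stats.f_oneway(approach_a, approach_b, approach_c)
print(f"ANOVA F={f:.2f}, p={p:.4f}")

# Post-hoc: tells us WHICH groups differ, controlling family-wise error.
scores = approach_a + approach_b + approach_c
groups = ["A"] * 7 + ["B"] * 7 + ["C"] * 7
print(pairwise_tukeyhsd(scores, groups))

# Non-parametric alternative if assumptions are violated.
h, p_kw = stats.kruskal(approach_a, approach_b, approach_c)
print(f"Kruskal-Wallis H={h:.2f}, p={p_kw:.4f}")
```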

Repeated Measures ANOVA

Use When:

  • Comparing means at 3 or more time points
  • Same subjects measured repeatedly
  • Continuous outcome
  • Longitudinal data

Example: Comparing functional scores at baseline, 3 months, 6 months, 12 months post-op.

Advantage: Accounts for within-subject correlation, more powerful.

Assumption: Sphericity - variance of differences between all pairs of time points is equal (Mauchly test).

If Sphericity Violated: Apply Greenhouse-Geisser or Huynh-Feldt correction.

Alternative: Mixed-effects model (more flexible, handles missing data better).

Repeated measures ANOVA is essential for longitudinal orthopaedic outcome studies.

Tests for Categorical Outcomes

Chi-Square Test

Use When:

  • Comparing proportions between 2 or more groups
  • Categorical outcome
  • Independent observations

Example: Comparing complication rates (yes/no) across 3 surgical techniques.

Null Hypothesis: No association between variables (proportions are equal across groups).

Requirement: Expected count greater than 5 in each cell of contingency table.

  • If violated: Use Fisher exact test (exact p-value, no expected count requirement).

Chi-Square Interpretation

Chi-Square vs Fisher Exact

Test | When to Use | Advantage | Limitation
Chi-square | Expected count greater than 5 in all cells | Faster, widely available | Inaccurate with small samples or low expected counts
Fisher exact | Any sample size, especially expected count under 5 | Exact p-value, no assumptions about expected counts | Computationally intensive for large tables

Clinical Example: Comparing infection rates (categorical outcome) between smokers and non-smokers.

Understanding when to use chi-square vs Fisher exact prevents incorrect p-values.
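
A minimal sketch of the decision in code, using Python/scipy for illustration with invented counts: compute the expected counts first, and switch to Fisher exact if any fall below 5.

```python
# Minimal sketch: 2x2 comparison of infection rates in smokers vs non-smokers.
# Counts are invented illustration data.
from scipy import stats

#              infection  no infection
table = [[9,  41],   # smokers
         [3,  97]]   # non-smokers

chi2, p, dof, expected = stats.chi2_contingency(table)
print("expected counts:\n", expected)   # check the >5-per-cell rule here

if (expected < 5).any():
    # Fisher exact gives an exact p-value when expected counts are small (2x2 table).
    odds_ratio, p = stats.fisher_exact(table)

print(f"p = {p:.4f}")
```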

Tests for Associations and Relationships

Correlation

Use When: Assessing strength and direction of relationship between 2 continuous variables.

Pearson Correlation (r)

Use When:

  • Both variables continuous
  • Linear relationship
  • Bivariate normal distribution

Range: r = -1 to +1

  • r = +1: Perfect positive correlation
  • r = 0: No correlation
  • r = -1: Perfect negative correlation

Interpretation:

  • r = 0.0 to 0.3: Weak correlation
  • r = 0.3 to 0.7: Moderate correlation
  • r = 0.7 to 1.0: Strong correlation

Example: Correlation between age and functional score after THA.

Key Point: Correlation does NOT imply causation - confounders may explain association.

Pearson correlation is the most common for linear relationships.

Spearman Correlation (rho)

Use When:

  • Ordinal data (ranked)
  • Non-normal distribution
  • Non-linear but monotonic relationship

Advantage: Non-parametric - no normality assumption.

Example: Correlation between pain scores (ordinal 1-10) and function scores.

Interpretation: Same as Pearson (range -1 to +1).

Use Spearman when Pearson assumptions are violated.
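
A minimal sketch computing both coefficients, using Python/scipy for illustration with invented age and score values.

```python
# Minimal sketch: correlation between two continuous variables.
# Age and functional score values are invented illustration data.
from scipy import stats

age   = [55, 60, 62, 67, 70, 72, 75, 78, 81, 84]
score = [90, 88, 85, 84, 80, 78, 76, 72, 70, 65]

r, p_r = stats.pearsonr(age, score)        # linear relationship, roughly normal data
rho, p_rho = stats.spearmanr(age, score)   # rank-based: ordinal, non-normal, or monotonic data

print(f"Pearson r={r:.2f} (p={p_r:.4f}); Spearman rho={rho:.2f} (p={p_rho:.4f})")
```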

Regression

Linear Regression

Use When:

  • Modeling relationship between continuous outcome and predictor(s)
  • Predicting outcome value based on predictors

Simple Linear Regression: 1 predictor

  • Equation: Y = a + b×X
  • b (slope): Change in Y for 1-unit increase in X

Multiple Linear Regression: 2 or more predictors

  • Equation: Y = a + b₁×X₁ + b₂×X₂ + ...
  • Adjusts for confounders: Each coefficient is adjusted for other variables

Example: Predicting functional score based on age, BMI, comorbidities.

Assumptions:

  • Linear relationship
  • Normal distribution of residuals
  • Homoscedasticity (constant variance of residuals)
  • Independence of observations

Interpretation: Coefficient represents change in outcome per unit change in predictor.

Multiple regression allows adjustment for confounders in observational studies.
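
A minimal sketch of a multiple linear regression, using Python's statsmodels and pandas for illustration; the data frame and variable names (womac, age, bmi) are invented for this example.

```python
# Minimal sketch: multiple linear regression with statsmodels.
# The data frame and variable names (womac, age, bmi) are invented for illustration.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "womac": [70, 65, 60, 72, 55, 68, 58, 62, 75, 50],
    "age":   [60, 65, 70, 58, 75, 62, 72, 68, 55, 78],
    "bmi":   [25, 28, 31, 24, 34, 27, 33, 30, 23, 36],
})

# Each coefficient is the change in WOMAC per unit change in that predictor,
# adjusted for the other variables in the model.
model = smf.ols("womac ~ age + bmi", data=df).fit()
print(model.summary())   # coefficients, 95% CIs, R-squared
```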

Logistic Regression

Use When:

  • Binary outcome (yes/no, success/failure)
  • Modeling probability of outcome based on predictors

Example: Predicting probability of complication (yes/no) based on age, smoking, diabetes.

Output: Odds Ratio (OR)

  • OR greater than 1: Increased odds of outcome with predictor increase
  • OR less than 1: Decreased odds of outcome
  • OR = 1: No association

Interpretation: OR = 2.5 means the odds of the outcome are 2.5 times greater for each unit increase in the predictor.

Advantage: Handles multiple predictors, adjusts for confounders, models non-linear relationships with binary outcomes.

Logistic regression is essential for modeling complication risk in orthopaedics.
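
A minimal sketch of a logistic regression with odds ratios, using Python's statsmodels for illustration; the variable names (ssi, diabetes, op_hours) and all values are invented.

```python
# Minimal sketch: logistic regression for a binary outcome with statsmodels.
# Variable names (ssi, diabetes, op_hours) and values are invented for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "ssi":      [0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1],
    "diabetes": [0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1],
    "op_hours": [1.5, 2.0, 3.5, 1.8, 4.0, 2.2, 2.8, 3.9, 1.6, 2.0,
                 3.4, 2.5, 3.8, 1.9, 2.0, 3.6],
})

model = smf.logit("ssi ~ diabetes + op_hours", data=df).fit(disp=0)

# Exponentiate coefficients to obtain odds ratios (and their 95% CIs).
print(np.exp(model.params))
print(np.exp(model.conf_int()))
```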

Test Selection Guide

Choosing Statistical Tests

Outcome Type | Number of Groups | Paired or Independent | Test
Continuous (normal) | 2 groups | Independent | Independent t-test
Continuous (normal) | 2 groups | Paired | Paired t-test
Continuous (non-normal) | 2 groups | Independent | Mann-Whitney U test
Continuous (non-normal) | 2 groups | Paired | Wilcoxon signed-rank test
Continuous (normal) | 3+ groups | Independent | One-way ANOVA
Continuous (normal) | 3+ groups | Repeated measures | Repeated measures ANOVA
Continuous (non-normal) | 3+ groups | Independent | Kruskal-Wallis test
Categorical | 2+ groups | Independent | Chi-square or Fisher exact

Anatomy of Statistical Tests

Components of a Statistical Test

Test Statistic

The calculated value from your data:

  • t-statistic (t-tests)
  • F-statistic (ANOVA)
  • Chi-square statistic (χ²)
  • Z-score (large samples)

Interpretation:

  • Larger absolute values = more extreme result
  • Compared against critical value or used to calculate p-value

Degrees of Freedom (df)

Number of independent values:

  • t-test: df = n₁ + n₂ - 2
  • Paired t-test: df = n - 1
  • Chi-square: df = (rows-1) × (columns-1)
  • ANOVA: df between groups, df within groups

Impact:

  • Affects critical value threshold
  • More df = narrower confidence intervals

P-value

Probability of obtaining a result at least this extreme if the null hypothesis is true:

  • p less than 0.05: conventionally "significant"
  • p less than 0.01: highly significant
  • p less than 0.001: very highly significant

Common misinterpretations:

  • NOT probability that null is true
  • NOT probability that result is due to chance

Confidence Interval

Range containing true population parameter:

  • 95% CI: 95% confidence true value is within range
  • If 95% CI excludes null value → significant at p less than 0.05
  • Width indicates precision

More informative than p-value alone:

  • Shows magnitude and precision
  • Aids clinical interpretation

Understanding these components allows proper interpretation of statistical test results.

Mathematical Foundations

T-test Formula

Independent samples t-test:

t = (x̄₁ - x̄₂) / SE(difference)

Where SE = √[(s₁²/n₁) + (s₂²/n₂)]

Paired t-test:

t = d̄ / (SD_d / √n)

Where d̄ = mean of differences

Chi-square Formula

Pearson chi-square:

χ² = Σ [(O - E)² / E]

Where:

  • O = Observed frequency
  • E = Expected frequency

Expected frequency: E = (row total × column total) / grand total
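
The short sketch below reproduces the chi-square formulas above by hand and checks the result against scipy; Python is used for illustration and the observed counts are invented.

```python
# Minimal sketch: chi-square statistic computed from the formulas above,
# checked against scipy. Observed counts are invented illustration data.
import numpy as np
from scipy import stats

observed = np.array([[12, 38],
                     [ 5, 45]])

row_tot = observed.sum(axis=1, keepdims=True)
col_tot = observed.sum(axis=0, keepdims=True)
grand   = observed.sum()

expected = row_tot * col_tot / grand                   # E = (row total x column total) / grand total
chi2 = ((observed - expected) ** 2 / expected).sum()   # chi2 = sum of (O - E)^2 / E
print(f"hand-calculated chi2 = {chi2:.3f}")

# scipy result (correction=False turns off Yates continuity correction for comparison)
chi2_sp, p, dof, exp_sp = stats.chi2_contingency(observed, correction=False)
print(f"scipy chi2 = {chi2_sp:.3f}, p = {p:.4f}")
```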

Effect Size Metrics

Common Effect Size Measures

Measure | Use | Interpretation | Formula
Cohen's d | Continuous outcomes (t-test) | 0.2 small, 0.5 medium, 0.8 large | (Mean₁ - Mean₂) / SD pooled
Odds Ratio | Binary outcomes (logistic) | 1 = no effect, over 1 = increased odds | (a/b) / (c/d) in 2×2 table
Risk Ratio | Binary outcomes (cohort) | 1 = no effect, over 1 = increased risk | (a/(a+b)) / (c/(c+d))
Eta-squared (η²) | ANOVA effect size | 0.01 small, 0.06 medium, 0.14 large | SS_between / SS_total
r (correlation) | Linear relationship | 0.1 small, 0.3 medium, 0.5 large | Covariance / (SD_x × SD_y)

Effect Size Importance

Statistical significance ≠ Clinical significance:

  • With large n, tiny differences can be "significant"
  • Report effect size alongside p-values
  • MCID (Minimal Clinically Important Difference) is more relevant than p-value
  • Example: WOMAC improvement of 2 points may be significant (p less than 0.05) but not clinically meaningful (MCID = 8-12 points)

Understanding the mathematics behind tests enables better interpretation and troubleshooting.
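
As a worked illustration of the effect-size table above, the sketch below computes Cohen's d from two groups using the pooled-SD formula; Python is used for illustration and the group values are invented.

```python
# Minimal sketch: Cohen's d from two groups using the pooled-SD formula above.
# Group values are invented illustration data.
import numpy as np

group1 = np.array([78.0, 82, 75, 80, 79, 84, 77, 81])
group2 = np.array([70.0, 74, 69, 72, 75, 71, 73, 76])

n1, n2 = len(group1), len(group2)
s1, s2 = group1.std(ddof=1), group2.std(ddof=1)

# Pooled standard deviation, then d = (mean1 - mean2) / SD_pooled
sd_pooled = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
d = (group1.mean() - group2.mean()) / sd_pooled
print(f"Cohen's d = {d:.2f}")   # ~0.2 small, 0.5 medium, 0.8 large
```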

Classification

Classification of Statistical Tests

Test Selection by Data Type and Design

Outcome Type | 2 Groups (Independent) | 2 Groups (Paired) | 3+ Groups
Continuous (normal) | Independent t-test | Paired t-test | One-way ANOVA
Continuous (non-normal) | Mann-Whitney U | Wilcoxon signed-rank | Kruskal-Wallis
Categorical (2×2) | Chi-square or Fisher | McNemar test | Chi-square
Ordinal | Mann-Whitney U | Wilcoxon signed-rank | Kruskal-Wallis
Time-to-event | Log-rank test | N/A | Log-rank test

Parametric vs Non-Parametric Classification

Parametric Tests

Assume underlying distribution (usually normal):

  • Independent samples t-test
  • Paired t-test
  • One-way ANOVA
  • Two-way ANOVA
  • Pearson correlation
  • Linear regression

When to use:

  • Continuous data
  • Normal distribution (or large n)
  • Equal variance across groups

Non-Parametric Tests

No distribution assumptions:

  • Mann-Whitney U (rank-sum)
  • Wilcoxon signed-rank
  • Kruskal-Wallis (H test)
  • Friedman test
  • Spearman correlation

When to use:

  • Ordinal data
  • Small sample sizes
  • Skewed distributions
  • Outliers present

Proper test classification ensures appropriate test selection for your research question.

Advanced Test Classification

Multivariable Analysis Methods

Outcome Type | Method | Use Case | Key Output
Continuous | Linear regression | Multiple predictors of continuous outcome | β coefficients, R²
Binary | Logistic regression | Predictors of yes/no outcome | Odds ratios, AUC
Count | Poisson regression | Predictors of count data | Rate ratios
Time-to-event | Cox regression | Survival analysis with predictors | Hazard ratios
Ordinal | Ordinal logistic | Ordered categorical outcome | Cumulative OR

Repeated Measures Classification

Repeated Measures ANOVA

For multiple time points (parametric):

  • Same subjects measured repeatedly
  • Accounts for within-subject correlation
  • Sphericity assumption (Mauchly test)
  • If sphericity violated: Greenhouse-Geisser correction

Example: ROM at 6 weeks, 3 months, 6 months, 1 year

Mixed Models

Flexible approach (modern):

  • Handles missing data better
  • No sphericity assumption
  • Can include time-varying covariates
  • Accounts for clustering (e.g., surgeon, hospital)

Preferred for complex longitudinal designs

Special Test Classifications

Agreement and Reliability Tests

Purpose | Test | Scale Type | Interpretation
Inter-rater reliability | Cohen's kappa | Categorical | 0.6-0.8 substantial, over 0.8 excellent
Intra-rater reliability | ICC (intraclass correlation) | Continuous | 0.75-0.9 good, over 0.9 excellent
Agreement (continuous) | Bland-Altman | Continuous | 95% limits of agreement
Internal consistency | Cronbach's alpha | Scale items | Over 0.7 acceptable

Test Selection Algorithm

Step-by-step approach:

  1. What is your outcome variable type? (continuous/categorical/time-to-event)
  2. How many groups are you comparing? (2 or 3+)
  3. Are observations independent or paired/repeated?
  4. Is data normally distributed? (parametric or non-parametric)
  5. Do you need to adjust for confounders? (multivariable analysis)
  6. Is there clustering in your data? (mixed models)

Advanced classification enables selection of optimal statistical methods for complex study designs.

Clinical Applications

Understanding statistical tests allows clinicians to critically appraise orthopaedic literature and make evidence-based decisions. Key applications include:

  • Evaluating treatment outcomes: Comparing surgical vs conservative management
  • Assessing prognostic factors: Identifying predictors of complications
  • Quality improvement: Analyzing registry data for benchmarking
  • Research design: Selecting appropriate tests for study protocols

Diagnostic Test Statistics

Evaluating Diagnostic Tests

Sensitivity

True positive rate:

  • Proportion of diseased correctly identified
  • Formula: TP / (TP + FN)
  • High sensitivity = few false negatives
  • "Rules OUT disease when negative" (SnNOut)

Example: If sensitivity = 95%, 5% of cases will be missed

Specificity

True negative rate:

  • Proportion of non-diseased correctly identified
  • Formula: TN / (TN + FP)
  • High specificity = few false positives
  • "Rules IN disease when positive" (SpPIn)

Example: If specificity = 90%, 10% will be false alarms

Positive Predictive Value

If test positive, probability of disease:

  • Formula: TP / (TP + FP)
  • Depends on disease prevalence
  • Higher PPV with higher prevalence

Clinical meaning: "My patient tested positive - how likely are they to actually have it?"

Negative Predictive Value

If test negative, probability of no disease:

  • Formula: TN / (TN + FN)
  • Also depends on prevalence
  • Higher NPV with lower prevalence

Clinical meaning: "My patient tested negative - how confident am I they're disease-free?"

2×2 Contingency Table

              | Disease Present     | Disease Absent      |
Test Positive | True Positive (TP)  | False Positive (FP) | PPV = TP/(TP+FP)
Test Negative | False Negative (FN) | True Negative (TN)  | NPV = TN/(TN+FN)
              | Sens = TP/(TP+FN)   | Spec = TN/(TN+FP)   |

Diagnostic test statistics are essential for evaluating imaging studies and clinical tests.
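
A minimal sketch that turns the 2×2 table above into the four metrics; Python is used for illustration and the counts (tp, fp, fn, tn) are invented.

```python
# Minimal sketch: sensitivity, specificity, PPV and NPV from a 2x2 table.
# The counts (tp, fp, fn, tn) are invented illustration data.
def diagnostic_stats(tp: int, fp: int, fn: int, tn: int) -> dict:
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv":         tp / (tp + fp),   # if test positive, probability of disease
        "npv":         tn / (tn + fn),   # if test negative, probability of no disease
    }

# e.g. an imaging test against a reference standard - made-up counts
print(diagnostic_stats(tp=90, fp=15, fn=10, tn=85))
```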

Advanced Diagnostic Metrics

Likelihood Ratios

Positive LR (LR+):

  • Formula: Sensitivity / (1 - Specificity)
  • LR+ greater than 10: Strong evidence for disease
  • LR+ 5-10: Moderate evidence
  • LR+ 2-5: Weak evidence

Negative LR (LR-):

  • Formula: (1 - Sensitivity) / Specificity
  • LR- less than 0.1: Strong evidence against disease
  • LR- 0.1-0.2: Moderate evidence

ROC Curves and AUC

Receiver Operating Characteristic:

  • Plots sensitivity vs (1-specificity) at all thresholds
  • AUC = Area Under Curve (0.5 to 1.0)
  • AUC 0.9-1.0: Excellent discrimination
  • AUC 0.8-0.9: Good
  • AUC 0.7-0.8: Fair
  • AUC less than 0.7: Poor

Use: Comparing diagnostic tests, choosing optimal cut-off

Pre-test and Post-test Probability

Bayesian Approach to Diagnosis:

  • Pre-test probability: Prior likelihood based on clinical suspicion
  • Post-test probability: Updated probability after test result
  • Formula: Post-test odds = Pre-test odds × Likelihood Ratio
  • Clinical judgment + test result = better decision making

Example: If pre-test probability is 50% and LR+ = 10:

  • Pre-test odds = 1:1
  • Post-test odds = 10:1
  • Post-test probability = 91%
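
The short sketch below reproduces this worked example in code (probability to odds, multiply by the likelihood ratio, back to probability); Python is used for illustration.

```python
# Minimal sketch of the Bayesian update above:
# probability -> odds -> multiply by LR -> back to probability.
def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Reproduces the worked example: pre-test 50%, LR+ = 10 -> ~91%
print(round(post_test_probability(0.50, 10), 2))   # 0.91
```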

SnNOut and SpPIn

Sensitivity and Specificity Mnemonics:

  • SnNOut: Highly Sensitive test, Negative result rules OUT disease (low FN rate means negatives are reliable)
  • SpPIn: Highly Specific test, Positive result rules IN disease (low FP rate means positives are reliable)

Clinical application:

  • Use sensitive test for screening (don't miss cases)
  • Use specific test for confirmation (don't label incorrectly)

Advanced diagnostic statistics enable evidence-based interpretation of clinical tests and imaging findings.

Performing Statistical Analysis

Step-by-Step Analysis Workflow

Step 1: Define Research Question
  • Formulate null and alternative hypotheses
  • Identify outcome variable(s)
  • Identify predictor/exposure variable(s)
  • Determine comparison type (difference, association, prediction)
Step 2: Examine Your Data
  • Check data type (continuous, categorical, ordinal)
  • Assess distribution (histogram, Q-Q plot)
  • Identify outliers and missing data
  • Check for data entry errors
Step 3: Select Appropriate Test
  • Use DINGO mnemonic
  • Match test to data type and design
  • Choose parametric or non-parametric
  • Consider confounders (multivariable analysis)
Step 4: Check Assumptions
  • Normality (Shapiro-Wilk, Q-Q plot)
  • Equal variance (Levene test)
  • Independence of observations
  • Sample size adequacy
Step 5: Perform and Report
  • Run analysis in software (SPSS, R, Stata)
  • Report test statistic, df, p-value
  • Include effect size and confidence interval
  • Present results clearly (tables, figures)

Software Options

Common Statistical Software

Software | Cost | Learning Curve | Best For
SPSS | Expensive | Easy | Beginners, basic analyses
R | Free | Steep | Advanced users, custom analyses
Stata | Moderate | Moderate | Epidemiology, panel data
Excel | Commonly available | Easy | Simple calculations only
SAS | Expensive | Steep | Clinical trials, pharma

Following a systematic approach ensures rigorous and reproducible statistical analysis.

Assumption Checking in Detail

Testing Normality

Visual methods:

  • Histogram: Should be bell-shaped
  • Q-Q plot: Points follow diagonal line

Statistical tests:

  • Shapiro-Wilk (n less than 50): Best power
  • Kolmogorov-Smirnov (n greater than 50)
  • Note: With large n, minor deviations can be "significant"

Rule of thumb: Visual inspection often more informative than tests

Handling Violations

If normality violated:

  • Use non-parametric alternative
  • Transform data (log, square root)
  • Bootstrap confidence intervals

If equal variance violated:

  • Use Welch t-test (does not assume equal variance)
  • Report robust standard errors
  • Consider non-parametric tests
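
A minimal sketch of the assumption checks described above, using Python/scipy for illustration with invented group values.

```python
# Minimal sketch: checking normality and equal variance before a parametric test.
# Group values are invented illustration data.
from scipy import stats

group1 = [62, 71, 58, 66, 74, 69, 63, 70]
group2 = [55, 64, 59, 61, 52, 66, 57, 60]

# Shapiro-Wilk: p < 0.05 suggests departure from normality (also inspect a Q-Q plot).
print(stats.shapiro(group1), stats.shapiro(group2))

# Levene test: p < 0.05 suggests unequal variances.
print(stats.levene(group1, group2))

# Typical decisions:
#   non-normal       -> Mann-Whitney U, or transform the data
#   unequal variance -> Welch t-test: stats.ttest_ind(group1, group2, equal_var=False)
```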

Power and Sample Size

Power Analysis

Components of power calculation:

  • Effect size (expected difference)
  • Alpha level (usually 0.05)
  • Power (usually 0.80)
  • Sample size

Know 3 to calculate the 4th:

  • A priori: Calculate sample size needed
  • Post-hoc: Calculate achieved power

Common Power Scenarios

Rule of thumb estimates:

  • t-test (medium effect d=0.5): n=64 per group
  • Chi-square 2×2 (medium effect w=0.3): n=88 total
  • Correlation (medium effect r=0.3): n=84 total

Underpowered studies:

  • Miss true effects (Type II error)
  • Overestimate effect sizes if significant
  • Consider in sample size planning
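
As a check on the rule-of-thumb figures above, the sketch below runs an a priori sample-size calculation for an independent t-test; Python's statsmodels is used for illustration.

```python
# Minimal sketch: a priori sample-size calculation for an independent t-test
# (medium effect d = 0.5, alpha = 0.05, power = 0.80).
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(round(n_per_group))   # ~64 per group, matching the rule of thumb above
```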

Common Analysis Errors to Avoid:

  • Running multiple t-tests instead of ANOVA
  • Not checking assumptions before parametric tests
  • Ignoring clustering in data (e.g., multiple joints per patient)
  • P-hacking: Testing multiple outcomes until one is significant
  • Dichotomizing continuous variables (loses information)
  • Confusing statistical and clinical significance

Reporting Statistics

CONSORT/STROBE reporting standards:

  • Report exact p-values (not just "p less than 0.05")
  • Include 95% confidence intervals for estimates
  • Report effect sizes (Cohen's d, OR, HR)
  • State statistical software and version
  • Describe handling of missing data
  • Pre-specify primary outcome and analysis plan

Rigorous methodology and transparent reporting are essential for high-quality research.

Practical Examples in Orthopaedics

Common Orthopaedic Research Scenarios

Comparing Two Treatment Groups

Scenario: Cemented vs uncemented THA outcomes

Outcome: Harris Hip Score (continuous, 0-100)
Test: Independent samples t-test
If non-normal: Mann-Whitney U test

Example result: "Mean HHS was 85.2 (SD 12.1) in cemented vs 87.4 (SD 11.8) in uncemented group (t=-1.42, df=98, p=0.16)"

Before-After Comparison

Scenario: Knee ROM before vs after TKA

Outcome: ROM in degrees (continuous)
Design: Same patients at 2 time points
Test: Paired t-test
If non-normal: Wilcoxon signed-rank test

Example result: "ROM improved from 92° (SD 18) to 115° (SD 12), mean difference 23° (95% CI: 18-28, p less than 0.001)"

Comparing Multiple Groups

Scenario: Pain scores across 4 fracture types

Outcome: VAS pain score (continuous)
Groups: 4 fracture classifications
Test: One-way ANOVA
Post-hoc: Tukey or Bonferroni correction

Example result: "Significant difference in VAS between groups (F=5.23, df=3,96, p=0.002). Post-hoc: Type D higher than Types A,B (p less than 0.05)"

Association Between Categories

Scenario: Smoking status and nonunion rate

Outcome: Nonunion yes/no (categorical)
Exposure: Smoker/non-smoker (categorical)
Test: Chi-square test (or Fisher exact if expected count less than 5)

Example result: "Nonunion rate was 15% in smokers vs 5% in non-smokers (χ²=6.8, df=1, p=0.009)"

These examples demonstrate common statistical scenarios in orthopaedic research.

Multivariable Analysis Examples

Linear Regression Example

Research question: Predictors of WOMAC score after TKA

Outcome: WOMAC score (continuous)
Predictors: Age, BMI, preop pain, comorbidities

Model interpretation:

  • Each year of age: -0.3 WOMAC points (95% CI: -0.5 to -0.1)
  • Each unit BMI: -1.2 WOMAC points (95% CI: -1.8 to -0.6)
  • R² = 0.35 (model explains 35% of variance)

Conclusion: BMI is strongest predictor of poorer outcome

Logistic Regression Example

Research question: Risk factors for surgical site infection

Outcome: SSI yes/no (binary)
Predictors: Diabetes, smoking, operative time, ASA grade

Model interpretation:

  • Diabetes: OR 2.4 (95% CI: 1.3-4.4, p=0.005)
  • Smoking: OR 1.8 (95% CI: 1.0-3.2, p=0.048)
  • Each hour operative time: OR 1.3 (95% CI: 1.1-1.5, p=0.001)

Conclusion: Diabetes has highest independent risk

Survival Analysis Example

Kaplan-Meier Analysis

Research question: Implant survivorship comparison

Outcome: Time to revision (censored data)
Comparison: Implant A vs Implant B
Test: Log-rank test

Result:

  • 10-year survivorship: A = 92%, B = 88%
  • Log-rank χ² = 4.2, p = 0.041

Conclusion: Significant difference in survival curves
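
A minimal sketch of this kind of analysis, assuming the Python lifelines package is available (shown purely for illustration); follow-up times and revision events are invented data.

```python
# Minimal sketch of a Kaplan-Meier comparison with a log-rank test, assuming the
# lifelines package. Follow-up times (years) and revision events are invented data.
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

time_a  = [10, 9.5, 8, 10, 7, 10, 6.5, 10, 10, 9]
event_a = [0,  1,   1, 0,  1, 0,  1,   0,  0,  1]   # 1 = revised, 0 = censored
time_b  = [10, 8,   5, 9,  4, 10, 6,   7,  10, 3]
event_b = [0,  1,   1, 1,  1, 0,  1,   1,  0,  1]

kmf = KaplanMeierFitter()
kmf.fit(time_a, event_observed=event_a, label="Implant A")
print(kmf.survival_function_.tail(1))   # survivorship at end of follow-up

result = logrank_test(time_a, time_b, event_observed_A=event_a, event_observed_B=event_b)
print(f"log-rank p = {result.p_value:.3f}")
```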

Cox Regression Example

Research question: Predictors of THA revision

Outcome: Time to revision
Predictors: Age, sex, fixation type, diagnosis

Result:

  • Age less than 55: HR 1.8 (95% CI: 1.3-2.5)
  • Inflammatory arthritis: HR 2.1 (95% CI: 1.4-3.2)
  • Uncemented: HR 0.8 (95% CI: 0.6-1.1, NS)

Conclusion: Young age and inflammatory diagnosis increase revision risk

Registry Data Analysis

Analysing AOANJRR data example:

  • Survival analysis with Kaplan-Meier curves
  • Comparison with log-rank test
  • Cox regression for adjusted hazard ratios
  • Account for competing risks (death before revision)
  • Report cumulative percent revision with 95% CI

Key consideration: Large sample sizes mean tiny differences can be "significant" - focus on clinical importance

Practical examples help translate statistical concepts into research application.

Common Errors and Pitfalls

Statistical Errors to Avoid

Type I Error (False Positive)

Definition: Rejecting null hypothesis when it is true

Causes:

  • Multiple comparisons without correction
  • P-hacking (testing until p less than 0.05)
  • Selective outcome reporting

Prevention:

  • Pre-specify primary outcome
  • Use Bonferroni or FDR correction for multiple tests
  • Register study protocol before data collection

Type II Error (False Negative)

Definition: Failing to reject null when it is false

Causes:

  • Underpowered study (sample too small)
  • High variability in data
  • Small true effect size

Prevention:

  • Conduct a priori power calculation
  • Aim for power greater than 80%
  • Use sensitive outcome measures

Wrong Test Selection

Common mistakes:

  • Using t-test when ANOVA needed (multiple comparisons)
  • Using parametric test with skewed data
  • Using independent test when data is paired
  • Using chi-square when expected counts less than 5

Prevention:

  • Follow decision tree (DINGO mnemonic)
  • Check assumptions before analysis
  • Consult statistician if unsure

Assumption Violations

Ignoring assumptions leads to:

  • Biased p-values
  • Invalid confidence intervals
  • Unreliable conclusions

Must check:

  • Normality (Q-Q plot, Shapiro-Wilk)
  • Equal variance (Levene test)
  • Independence (study design)
  • Adequate sample size

Awareness of these errors helps avoid common statistical mistakes.

Advanced Pitfalls

Multiple Comparisons Problem

The problem:

  • Testing 20 outcomes at α=0.05 → expect 1 false positive
  • "P-hacking" inflates false positive rate

Correction methods:

  • Bonferroni: α' = α/n (conservative)
  • Holm: Step-down procedure (less conservative)
  • FDR: Controls false discovery rate (for many tests)

Best approach: Pre-specify single primary outcome
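
The sketch below shows how the correction methods above adjust a set of p-values; Python's statsmodels is used for illustration and the p-values are invented.

```python
# Minimal sketch: applying Bonferroni, Holm and FDR corrections to a set of p-values.
# The p-values are invented illustration data.
from statsmodels.stats.multitest import multipletests

p_values = [0.01, 0.04, 0.03, 0.20, 0.001]

for method in ["bonferroni", "holm", "fdr_bh"]:
    reject, p_adj, _, _ = multipletests(p_values, alpha=0.05, method=method)
    print(method, [round(p, 3) for p in p_adj], reject)
```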

Confounding and Bias

Confounding: Third variable explains observed association
Selection bias: Non-random sampling affects results

Solutions:

  • Randomization (RCT)
  • Multivariable adjustment (regression)
  • Propensity score matching
  • Stratification

Key concept: Association ≠ Causation

Interpretation Errors

Common Misinterpretations

Mistake | What People Think | What It Actually Means
p = 0.04 means... | 4% chance the null is true | 4% chance of a result this extreme if the null is true
95% CI means... | 95% chance true value in range | 95% of such intervals contain the true value
Significant result... | Effect is large/important | Effect unlikely due to chance alone
Non-significant... | No effect exists | Insufficient evidence to reject the null
p = 0.001 vs p = 0.04 | First effect is larger | Only tells us about strength of evidence, not magnitude

Red Flags in Published Research:

  • Multiple outcomes with only one "significant"
  • Post-hoc subgroup analyses driving conclusions
  • No sample size justification
  • Failure to report effect sizes
  • Incomplete reporting of non-significant results
  • Mismatch between study aims and primary outcome

Critical Appraisal Checklist

When reading orthopaedic papers, check:

  1. Was sample size justified with power calculation?
  2. Was the primary outcome pre-specified?
  3. Were appropriate tests used for data type?
  4. Were assumptions checked and reported?
  5. Are effect sizes and CIs reported (not just p-values)?
  6. Is clinical significance discussed?
  7. Are limitations acknowledged?

Critical statistical thinking is essential for evidence-based practice.

Reporting and Publishing Results

Reporting Statistical Results

CONSORT Guidelines

Randomized Controlled Trials:

  • Report participant flow diagram
  • State sample size calculation
  • Report all outcomes - primary and secondary
  • Include confidence intervals for main results
  • Report actual p-values (not just p less than 0.05)

Key requirements:

  • Intention-to-treat analysis
  • Report losses to follow-up
  • Baseline characteristics table

STROBE Guidelines

Observational Studies:

  • Clear statement of study design
  • Describe setting, dates, eligibility
  • Report numbers at each stage
  • Report outcome data with denominators
  • Address confounding

Required elements:

  • Case-control, cohort, cross-sectional clearly stated
  • Bias assessment
  • Sensitivity analyses

Presenting Statistical Results

Result Type | Reporting Format | Example
Continuous outcomes | Mean (SD) or median (IQR) | Pain score: 3.2 (SD 1.4)
Proportions | n (%) with denominator | 23/50 (46%) achieved union
Risk comparison | RR or OR with 95% CI | RR 0.65 (95% CI 0.48-0.88)
Time-to-event | HR with 95% CI, survival curve | HR 0.72 (95% CI 0.55-0.94)
P-values | Exact value to 2-3 decimal places | p = 0.034 (not p less than 0.05)

Exam Pearl

FRACS Viva Point: "What must be reported alongside any p-value?" Answer: The effect size (difference between groups) and 95% confidence interval - p-values alone do not indicate clinical importance or precision of the estimate.

Proper statistical reporting enables readers to evaluate findings and enables future meta-analyses.

Journal-Specific Requirements

JBJS Requirements

Statistical reporting checklist:

  • Sample size justification required
  • Primary outcome pre-specified
  • Multiplicity adjustments for multiple comparisons
  • Missing data handling explained
  • Effect size with confidence intervals mandatory

Level of Evidence assignment:

  • Study design determines initial level
  • Quality assessment can modify level

JAAOS/BJJ Standards

Similar requirements:

  • CONSORT/STROBE checklist submission
  • Statistical analysis plan recommended
  • Reporting of all pre-specified outcomes
  • Negative results published appropriately

Common rejection reasons:

  • Inadequate sample size justification
  • Missing confidence intervals
  • Multiple testing without correction

Advanced Reporting Considerations

Issue | Problem | Correct Approach
Subgroup analyses | Multiplicity inflation, false positives | Pre-specify, limit number, report interaction tests
Per-protocol vs ITT | Selection bias if only per-protocol | Report both; ITT is primary analysis
Missing data | Can bias results in either direction | Report missing data rate, use multiple imputation
Interim analyses | Increased Type I error risk | Alpha spending functions, clear stopping rules
Sensitivity analyses | Demonstrate robustness | Report how results change with different assumptions

Reproducibility and Open Science

Data Sharing

Current expectations:

  • Many journals require data availability statement
  • Individual patient data (IPD) increasingly requested
  • Code and analysis scripts sharing encouraged

Practical considerations:

  • De-identification requirements
  • Institutional review board considerations
  • Repository options (Dryad, Figshare)

Pre-registration

Trial registration:

  • ANZCTR for Australian trials
  • ClinicalTrials.gov for international
  • Pre-specify primary outcome and analysis plan

Benefits:

  • Prevents selective outcome reporting
  • Addresses publication bias
  • Required by ICMJE journals

Transparent reporting enables replication, meta-analysis, and advancement of orthopaedic evidence.

Interpreting Research Outcomes

Statistical vs Clinical Significance

Statistical Significance

Definition: P-value less than chosen alpha (usually 0.05)

What it tells you:

  • The difference is unlikely due to chance alone
  • Nothing about magnitude or clinical importance
  • Large samples can detect trivial differences

Common misinterpretation:

  • "Statistically significant" ≠ "important"
  • p = 0.04 is not much different from p = 0.06

Clinical Significance

Definition: Difference is large enough to change practice

Key concept - MCID:

  • Minimal Clinically Important Difference
  • Patient-centered threshold
  • Varies by outcome measure

Examples in orthopaedics:

  • VAS pain: 2 points (or 30% change)
  • WOMAC: 15 points
  • SF-36 Physical: 5 points

Key Outcome Measures

Measure | Definition | Interpretation
Relative Risk (RR) | Risk in exposed / Risk in unexposed | RR = 2.0 means double the risk
Odds Ratio (OR) | Odds in cases / Odds in controls | Approximates RR when outcome rare (less than 10%)
Absolute Risk Reduction (ARR) | Control rate - Treatment rate | Actual percentage-point reduction
Number Needed to Treat (NNT) | 1 / ARR | Patients to treat to prevent 1 event
Hazard Ratio (HR) | Instantaneous risk ratio over time | HR = 0.7 means 30% reduction in hazard

Exam Pearl

FRACS Viva Question: "A drug reduces DVT risk from 4% to 2%. What is the NNT?" Answer: ARR = 4% - 2% = 2% = 0.02. NNT = 1/0.02 = 50. You need to treat 50 patients to prevent one DVT.

Understanding both statistical and clinical significance is essential for evidence-based practice.

Confidence Interval Interpretation

What 95% CI Means

Correct interpretation:

  • Range where true population value likely lies
  • 95% of similarly-constructed intervals would contain the true value
  • Reflects precision of the estimate

Width determined by:

  • Sample size (larger = narrower)
  • Variability in data
  • Confidence level chosen

Clinical Application

Using CI for decision making:

  • If CI excludes clinically meaningful threshold → confident result
  • If CI is wide → need more data
  • If CI spans both benefit and harm → inconclusive

Example:

  • Mean difference: 5 points
  • 95% CI: 2 to 8 points
  • MCID: 5 points
  • CI includes values below MCID → may not be clinically important

Advanced Outcome Concepts

Concept | Description | Application
Effect size (Cohen's d) | Standardized difference in means | 0.2 small, 0.5 medium, 0.8 large
Power | Probability of detecting a true effect | 80% standard; higher for important outcomes
Precision medicine | Individual patient prediction | Forest plots for subgroup effects
Fragility index | Number of events needed to change significance | Low index = fragile result
Responder analysis | Proportion exceeding MCID | More meaningful than mean differences

Interpreting Negative Studies

True Negative

Evidence of no effect:

  • Adequate sample size (powered for MCID)
  • Confidence interval excludes clinically important difference
  • Can conclude treatments equivalent

Example:

  • 95% CI: -3 to +2 points
  • MCID: 5 points
  • Conclusion: No clinically important difference

Inconclusive

Cannot conclude equivalence:

  • Underpowered study
  • Wide confidence interval spanning MCID
  • "Absence of evidence ≠ Evidence of absence"

Example:

  • 95% CI: -8 to +12 points
  • MCID: 5 points
  • Conclusion: Cannot rule out important difference

Meta-Analysis Outcomes

Pooled Estimates

Interpreting meta-analysis results:

  • Pooled effect size (weighted average)
  • Heterogeneity (I² statistic)
  • I² greater than 50% = substantial heterogeneity
  • I² greater than 75% = considerable heterogeneity

When heterogeneous:

  • Pooled estimate may be misleading
  • Look for subgroup differences
  • Consider random effects model

GRADE Quality

Quality of evidence assessment:

  • High: Further research unlikely to change confidence
  • Moderate: May change estimate
  • Low: Likely to change estimate
  • Very low: Very uncertain

Downgrade factors:

  • Risk of bias, inconsistency, indirectness
  • Imprecision, publication bias

Sophisticated outcome interpretation enables nuanced clinical decision-making beyond simple p-values.

Evidence Base

Statistical Tests in Orthopaedic Research

Bhandari M, Montori VM, Devereaux PJ, et al • JBJS Am (2003)
Key Findings:
  • Survey of statistical methods in orthopaedic journals
  • t-tests and ANOVA most common for continuous outcomes
  • Chi-square most common for categorical outcomes
  • Many studies did not report checking parametric assumptions
  • Recommendation: Report tests used, assumptions checked, and justification
Clinical Implication: Surgeons should ensure appropriate test selection and report assumption checking to ensure valid results.
Limitation: Study from 2003 - statistical reporting has improved somewhat with CONSORT adoption.

Parametric vs Non-Parametric Tests

Vickers AJ • Journal of Clinical Epidemiology (2005)
Key Findings:
  • Non-parametric tests generally less powerful than parametric if assumptions met
  • Power loss is usually modest (5-10%) for non-parametric tests
  • When in doubt, use non-parametric - more robust to violations
  • For small samples (n under 30), normality testing unreliable - use non-parametric
  • For large samples (n greater than 100), parametric tests robust even if non-normal
Clinical Implication: Use non-parametric tests when normality uncertain, especially for small samples. Power loss is acceptable trade-off for robustness.
Limitation: Some non-parametric tests (e.g., Mann-Whitney) test different hypotheses than parametric equivalents.

Regression in Observational Studies

Groenwold RH, Klungel OH, Grobbee DE, Hoes AW • Journal of Clinical Epidemiology (2013)
Key Findings:
  • Regression allows adjustment for confounders in observational studies
  • Linear regression for continuous outcomes, logistic for binary
  • Cannot adjust for unmeasured confounders - residual confounding remains
  • Overfitting occurs when too many predictors relative to sample size
  • Rule of thumb: Minimum 10 events per variable in logistic regression
Clinical Implication: Regression is essential tool for observational research but cannot fully replace randomization for causal inference.
Limitation: Regression assumes correct model specification - misspecification leads to biased estimates.

Exam Viva Scenarios

Practice these scenarios to excel in your viva examination

VIVA SCENARIO: Standard

Scenario 1: Test Selection

EXAMINER

"You are comparing functional scores between 3 different surgical approaches for rotator cuff repair. What statistical test would you use and why?"

EXCEPTIONAL ANSWER
For comparing functional scores - a continuous outcome - across 3 groups, I would use a one-way ANOVA, assuming the data meet parametric assumptions. The functional score is a continuous variable, and I am comparing 3 independent groups, so ANOVA is the appropriate omnibus test. Before running ANOVA, I would check the assumptions: First, normality - I would examine histograms and Q-Q plots for each group, possibly using Shapiro-Wilk test if sample size is small. Second, homoscedasticity or equal variance - I would use Levene test to check if variance is similar across the 3 groups. Third, independence - the patients in each surgical group should be independent. If ANOVA shows a significant result (p less than 0.05), this tells me that at least one group differs from the others, but it does NOT tell me which specific groups differ. I would then perform post-hoc pairwise comparisons using Tukey HSD test or Bonferroni correction to identify which groups differ while controlling for multiple comparisons. If the normality or equal variance assumptions are violated, I would use the non-parametric alternative: Kruskal-Wallis test, followed by Dunn test for post-hoc comparisons. I would NOT run multiple independent t-tests comparing each pair of groups because this would inflate the Type I error rate.
KEY POINTS TO SCORE
ANOVA for continuous outcome across 3+ groups
Check assumptions: normality, equal variance, independence
ANOVA is omnibus test - tells IF groups differ, not WHICH
Post-hoc tests (Tukey, Bonferroni) identify specific differences
Never run multiple t-tests - inflates Type I error
COMMON TRAPS
✗ Suggesting t-tests for 3 groups (wrong - inflates Type I error)
✗ Not mentioning assumption checking
✗ Not explaining need for post-hoc tests after ANOVA
✗ Not mentioning non-parametric alternative if assumptions violated
LIKELY FOLLOW-UPS
"What are the assumptions of ANOVA?"
"What is the difference between Tukey and Bonferroni post-hoc tests?"
"What would you do if the data are not normally distributed?"
VIVA SCENARIO: Challenging

Scenario 2: Regression Interpretation

EXAMINER

"A study used logistic regression to identify predictors of nonunion after tibial fracture. Age had an odds ratio of 1.5 (95% CI 1.2 to 1.9, p = 0.001). What does this mean?"

EXCEPTIONAL ANSWER
This logistic regression output tells me that age is a significant independent predictor of nonunion risk. Let me interpret each component. First, the odds ratio of 1.5 means that for every 1-year increase in age, the odds of developing nonunion are 1.5 times as high - a 50 percent increase in odds - after adjusting for the other variables in the model. Second, the 95% confidence interval of 1.2 to 1.9 gives the plausible range for the true odds ratio. The entire CI is above 1.0, which indicates a statistically significant increase in risk, consistent with p = 0.001. The lower bound of 1.2 suggests at minimum a 20 percent increase in odds, and the upper bound of 1.9 suggests up to a 90 percent increase. Third, p = 0.001 indicates the relationship is highly statistically significant: only a 0.1 percent probability of observing an association at least this strong by chance if age truly had no effect. Clinically, this means older patients are at higher risk of nonunion, and age should be considered when counselling patients and potentially when deciding on treatment strategies such as augmentation with bone graft or biologics. Because this comes from a logistic regression model with multiple predictors, the OR is adjusted for other variables like smoking, diabetes, and fracture type, meaning the age effect is independent of these confounders. However, I would still be cautious about unmeasured confounding and would not assume causation from this observational study. To confirm causation, we would need experimental evidence or very strong observational data with careful confounder control.
KEY POINTS TO SCORE
OR = 1.5 means 50% increased odds per 1-year age increase
95% CI (1.2-1.9) above 1.0 indicates significant increased risk
p = 0.001 highly statistically significant
OR is adjusted for other predictors in the model
Caution: Observational data cannot prove causation
COMMON TRAPS
✗ Confusing odds ratio with relative risk
✗ Not explaining that OR is adjusted for other variables
✗ Not interpreting confidence interval bounds
✗ Claiming this proves causation (observational study)
✗ Not mentioning clinical implications
LIKELY FOLLOW-UPS
"What is the difference between odds ratio and relative risk?"
"How would you interpret an odds ratio of 0.5?"
"What assumptions does logistic regression make?"

MCQ Practice Points

ANOVA vs Multiple t-tests

Q: Why should you NOT run multiple independent t-tests when comparing 3 or more groups? A: Inflates Type I error rate. Each t-test has 5% false positive risk. Three t-tests (Group 1 vs 2, 1 vs 3, 2 vs 3) inflate family-wise error to approximately 14%. ANOVA controls overall Type I error at 5%, then post-hoc tests with correction identify specific differences.

Paired vs Independent t-test

Q: When would you use a paired t-test instead of an independent t-test? A: When comparing same subjects at two time points (e.g., before vs after surgery). Paired t-test accounts for within-subject correlation and is more powerful. Independent t-test is for comparing two separate groups of different subjects.

Chi-square Expected Count Rule

Q: When should you use Fisher exact test instead of chi-square? A: When expected count is under 5 in any cell of the contingency table. Chi-square approximation is inaccurate with small expected counts. Fisher exact provides exact p-value for any sample size.

Correlation Coefficient Interpretation

Q: How do you interpret the Pearson correlation coefficient r = 0.7? A: Strong positive linear relationship. r = 0.7 means 49% of variance in one variable is explained by the other (r² = 0.49). Clinical significance depends on context. Interpretation: r under 0.3 = weak, 0.3-0.7 = moderate, over 0.7 = strong. Note: correlation does not imply causation.

Non-parametric Test Selection

Q: How do you decide between parametric and non-parametric tests? A: Use non-parametric tests when: (1) data violate normality assumption (check with Shapiro-Wilk test), (2) ordinal data (e.g., Likert scales), (3) small sample size where normality cannot be verified, (4) extreme outliers present. Non-parametric tests are more robust but less powerful.

Australian Context

Australian Research Framework

NHMRC Guidelines

National Health and Medical Research Council:

  • National Statement on Ethical Conduct in Human Research
  • Mandatory ethics approval for human research
  • Australian Code for Responsible Conduct of Research

Key requirements:

  • Human Research Ethics Committee (HREC) approval
  • Informed consent documentation
  • Data management plans
  • Reporting adverse events

ANZCTR Registration

Australian New Zealand Clinical Trials Registry:

  • www.anzctr.org.au
  • Mandatory for clinical trials
  • Required before participant enrollment
  • ICMJE requirement for publication

Registration includes:

  • Primary and secondary outcomes
  • Sample size calculation
  • Statistical analysis plan

Australian Orthopaedic Data Sources

Resource | Type | Application
AOANJRR | Registry | Joint replacement outcomes nationally
ACSQHC | Quality standards | Clinical care standards, indicators
AIHW | Health statistics | National injury and disease data
Medicare/PBS data | Administrative | Procedure rates, medication use
State trauma registries | Registry | Victoria, NSW trauma outcomes

Exam Pearl

FRACS Viva Point: "What is the level of evidence of AOANJRR data?" Answer: Level III (retrospective cohort) - but with very high validity due to near-complete capture (greater than 98%) and validated data linkage for revision endpoints.

Australian trainees should be familiar with national research infrastructure and ethics requirements.

Australian Statistical Support

Biostatistics Support

University biostatistics units:

  • Most universities offer consulting services
  • NHMRC grants often require biostatistician involvement
  • Early involvement improves study design

Key centres:

  • NHMRC Clinical Trials Centre (Sydney)
  • Murdoch Children's Research Institute (Melbourne)
  • QIMR Berghofer (Brisbane)
  • University biostatistics departments

Funding Bodies

Australian research funding:

  • NHMRC: Project Grants, Ideas Grants
  • MRFF (Medical Research Future Fund)
  • AOA Research Foundation
  • State health department grants

Statistical requirements:

  • Sample size justification mandatory
  • Analysis plan required for competitive grants
  • DSMB for larger trials

AOANJRR Statistical Methods

Kaplan-Meier Estimates

Primary analysis method:

  • Cumulative percentage revision at time points
  • 95% confidence intervals reported
  • Stratified by patient, implant, hospital factors

Interpreting registry reports:

  • Revision rate per 100 observed component years
  • Hazard ratios for implant comparisons
  • Funnel plots for hospital variation

Risk Adjustment

Controlling confounders:

  • Propensity score matching for implant comparisons
  • Cox proportional hazards regression
  • Competing risk analysis (death vs revision)

Registry limitations:

  • No patient-reported outcomes (PROMs)
  • Limited clinical detail
  • Cannot determine causation

Australian Publication Requirements

Journal | Statistical Requirements | Notes
ANZ Journal of Surgery | CONSORT/STROBE checklist | Welcomes Australian registry studies
JOA (JBJS APAC) | Full statistical analysis plan | Regional orthopaedic focus
MJA | Strong methodology standards | Requires biostatistician review
International journals | Same global standards | AOANJRR studies highly regarded

FRACS Research Requirements

Training Requirements

Research pathway:

  • Research methods course (basic statistics)
  • Published paper or thesis for completion
  • Understanding of levels of evidence

Exam knowledge:

  • Interpret study designs and statistics
  • Critically appraise literature
  • Apply evidence to clinical scenarios

AOA Research Support

Available resources:

  • AOA Research Foundation grants
  • Access to AOANJRR data for approved projects
  • Statistical support through collaborative networks

Application process:

  • Research proposal review
  • Ethics requirements
  • Data access agreements for registry studies

Australian orthopaedic research benefits from strong registry infrastructure and growing statistical expertise.

COMMON STATISTICAL TESTS

High-Yield Exam Summary

Tests for Continuous Outcomes

  • 2 groups, independent, normal = Independent t-test
  • 2 groups, paired (before-after), normal = Paired t-test
  • 2 groups, independent, non-normal = Mann-Whitney U
  • 2 groups, paired, non-normal = Wilcoxon signed-rank
  • 3+ groups, independent, normal = One-way ANOVA + post-hoc
  • 3+ groups, repeated measures, normal = Repeated measures ANOVA
  • 3+ groups, independent, non-normal = Kruskal-Wallis

Tests for Categorical Outcomes

  • Comparing proportions, expected count over 5 = Chi-square
  • Comparing proportions, expected count under 5 = Fisher exact
  • Binary outcome with predictors = Logistic regression
  • Multiple categorical outcomes = Chi-square or multinomial regression

Tests for Associations

  • Correlation between 2 continuous, normal = Pearson correlation (r)
  • Correlation between 2 variables, non-normal or ordinal = Spearman correlation (rho)
  • Predicting continuous outcome from predictors = Linear regression
  • Predicting binary outcome from predictors = Logistic regression (OR)
  • Correlation does NOT imply causation - confounders may explain

Critical Test Selection Rules

  • Match test to outcome type (continuous vs categorical)
  • Check normality before parametric tests (histogram, Q-Q plot, Shapiro-Wilk)
  • Use paired tests for before-after, independent for separate groups
  • ANOVA first for 3+ groups, then post-hoc (never multiple t-tests)
  • Fisher exact when expected count under 5 (not chi-square)

Interpretation Principles

  • ANOVA tells IF groups differ, post-hoc tells WHICH groups
  • Pearson r: 0-0.3 weak, 0.3-0.7 moderate, 0.7-1.0 strong correlation
  • Logistic regression OR greater than 1 = increased odds, OR less than 1 = decreased odds
  • Regression coefficients are adjusted for other variables in the model
  • Non-parametric tests less powerful but more robust to violations

Common Mistakes

  • Multiple t-tests instead of ANOVA (inflates Type I error)
  • Independent t-test for paired data (loses power)
  • Chi-square with expected count under 5 (inaccurate p-value)
  • Not checking parametric assumptions before using t-test or ANOVA
  • Confusing correlation with causation (observational data)