OUTCOME MEASURES AND PROMs
Patient-Reported Outcomes | Measurement Properties | Clinical Application
Outcome Measure Types
Critical Must-Knows
- PROM: Patient-Reported Outcome Measure - patient completes without clinician interpretation. Captures patient perspective.
- MCID: Smallest change in score that patients perceive as meaningful benefit. Essential for clinical interpretation.
- Validity: Does the measure assess what it claims to assess? (content, construct, criterion validity)
- Reliability: Does the measure give consistent results? (test-retest, inter-rater, internal consistency)
- Responsiveness: Can the measure detect clinically meaningful change over time? (limited by ceiling/floor effects)
Examiner's Pearls
- SF-36 has 8 subscales (each scored 0-100, higher is better) summarized into 2 components: Physical (PCS) and Mental (MCS)
- WOMAC assesses 3 domains: Pain, Stiffness, Function - scored 0-96, lower is better (or normalized 0-100)
- DASH measures upper extremity disability - 0-100 scale, 0 = no disability
- Floor/ceiling effects in over 15% of patients indicate the measure may not detect worsening (floor) or improvement (ceiling)
Critical PROM Concepts
Why PROMs Matter
Patient-Centered Care: Surgeon assessment may not match patient experience. PROMs capture what matters to patients - pain, function, quality of life. Required for value-based care.
MCID is Essential
Clinical Significance: A statistically significant change (p less than 0.05) may not matter to patients. MCID defines meaningful improvement. Compare observed change to MCID, not just p-value.
Generic vs Specific
Trade-off: Generic PROMs (SF-36) allow comparison across conditions but are less sensitive. Specific PROMs (WOMAC) are highly sensitive to joint pathology but cannot be compared across different joints or conditions.
Measurement Properties
Quality Assessment: Valid (measures what it claims), Reliable (consistent results), Responsive (detects change). Poor properties = unreliable conclusions.
At a Glance
Patient-Reported Outcome Measures (PROMs) capture the patient's perspective on pain, function, and quality of life without clinician interpretation. The MCID (Minimal Clinically Important Difference) defines the smallest change that patients perceive as meaningful; compare observed change to the MCID, not just p-values. PROMs are classified as generic (SF-36, EQ-5D: allow comparison across conditions), region-specific (DASH for upper limb, LEFS for lower limb), or joint/disease-specific (WOMAC for hip/knee, ODI for spine: most sensitive to pathology). Key measurement properties are validity (measures what it claims), reliability (consistent results), and responsiveness (detects change over time). Floor and ceiling effects greater than 15% indicate the measure cannot detect deterioration or improvement respectively.
VRR: Measurement Properties (PROM Quality)
Memory Hook: VRR your PROMs - Validity, Reliability, Responsiveness ensure high-quality outcome measurement!
SWANK: Common Orthopaedic PROMs by Region
Memory Hook: SWANK PROMs cover all major orthopaedic regions - memorize these for exams!
Overview and Introduction
What are PROMs?
Patient-Reported Outcome Measures (PROMs) are standardized, validated questionnaires that patients complete without clinician interpretation. They capture the patient perspective on health status, symptoms, function, and quality of life.
Why PROMs Matter:
- Patient-Centered Care: Surgeon assessment may not match patient experience
- Quantifies Subjective Outcomes: Pain, function, satisfaction cannot be objectively measured
- Value-Based Care: Payers increasingly link reimbursement to patient-reported outcomes
- Quality Improvement: Registries (AOANJRR) use PROMs to benchmark performance
- Research: Essential for clinical trials to demonstrate treatment efficacy
PROM vs Clinician-Reported Outcomes:
- PROMs capture what matters to patients (pain, daily activities, quality of life)
- Clinician measures (ROM, strength) are important but may not correlate with patient satisfaction
- Best practice: Use both PROMs and objective measures
Principles of Outcome Measurement
Types of Outcome Measures
Generic PROMs
Purpose: Assess overall health status across any condition. Allow comparison between different diseases and populations.
SF-36 (Short Form-36 Health Survey)
Description: 36-item generic health status measure.
Domains (8 subscales):
- Physical Functioning
- Role Physical (work/activities due to physical health)
- Bodily Pain
- General Health
- Vitality (energy/fatigue)
- Social Functioning
- Role Emotional (work/activities due to emotional problems)
- Mental Health
Scoring:
- Each subscale: 0-100 (higher = better health)
- Physical Component Summary (PCS): Aggregate of physical domains
- Mental Component Summary (MCS): Aggregate of mental domains
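Each 0-100 subscale score comes from a linear rescaling of the subscale's raw score. A minimal Python sketch, assuming items have already been recoded so that a higher raw score means better health and summed into a raw subscale score; the function name is hypothetical, and the norm-based PCS/MCS summary algorithms are not reproduced here.

```python
def transform_to_0_100(raw_score: float, lowest_possible: float, highest_possible: float) -> float:
    """Rescale a raw subscale score so the lowest possible raw score maps to 0
    and the highest possible raw score maps to 100 (higher = better health)."""
    if highest_possible <= lowest_possible:
        raise ValueError("highest_possible must be greater than lowest_possible")
    if not lowest_possible <= raw_score <= highest_possible:
        raise ValueError("raw_score lies outside the possible range")
    return (raw_score - lowest_possible) / (highest_possible - lowest_possible) * 100


# Hypothetical subscale: 10 items, each scored 1-3, so the raw range is 10-30
print(transform_to_0_100(25, 10, 30))  # 75.0
```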
MCID: Approximately 5 points for PCS and MCS.
Advantages: Population norms available, allows cross-disease comparison.
Limitations: Less sensitive to specific musculoskeletal pathology than joint-specific measures.
SF-36 is the most widely used generic PROM in orthopaedic research.
Joint-Specific PROMs
WOMAC (Western Ontario and McMaster Universities Arthritis Index)
Description: Most widely used PROM for hip and knee osteoarthritis.
Domains (24 items):
- Pain (5 items): Pain with various activities
- Stiffness (2 items): Morning and later-day stiffness
- Physical Function (17 items): Difficulty with daily activities
Scoring Options:
- Likert Scale: 0-4 per item, total 0-96 (lower = better)
- VAS: 0-100mm per item
- Often normalized to a 0-100 scale; depending on the version, higher = better or lower = better
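The Likert scoring and 0-100 normalization can be sketched as follows. This is an illustrative Python sketch, assuming complete responses (the official WOMAC user guide has its own rules for missing items, not reproduced here); the function and constant names are hypothetical.

```python
from typing import Dict, List

# Number of items per subscale in the 24-item WOMAC
WOMAC_ITEMS = {"pain": 5, "stiffness": 2, "function": 17}
MAX_ITEM = 4  # Likert version: each item scored 0 (none) to 4 (extreme)
MAX_TOTAL = MAX_ITEM * sum(WOMAC_ITEMS.values())  # 4 x 24 = 96


def womac_total(responses: Dict[str, List[int]]) -> int:
    """Sum all Likert items: 0 = no symptoms, 96 = worst possible."""
    total = 0
    for subscale, n_items in WOMAC_ITEMS.items():
        items = responses[subscale]
        if len(items) != n_items:
            raise ValueError(f"{subscale}: expected {n_items} items, got {len(items)}")
        if any(not 0 <= x <= MAX_ITEM for x in items):
            raise ValueError(f"{subscale}: items must be scored 0-{MAX_ITEM}")
        total += sum(items)
    return total


def womac_normalized(total: int, higher_is_better: bool = False) -> float:
    """Rescale the 0-96 total to 0-100; optionally flip so that 100 = best."""
    score = total / MAX_TOTAL * 100
    return 100 - score if higher_is_better else score


# Hypothetical patient scoring 1 ("mild") on every item
patient = {"pain": [1] * 5, "stiffness": [1] * 2, "function": [1] * 17}
total = womac_total(patient)
print(total, womac_normalized(total), womac_normalized(total, higher_is_better=True))
# 24 25.0 75.0
```

The same patient therefore reads as 25 on a lower-is-better 0-100 version and 75 on a higher-is-better one, which is why the scoring direction must be stated whenever WOMAC results are compared.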
MCID: Approximately 10-15 points (on 100-point scale).
Advantages: Excellent validity and reliability for hip/knee OA, widely used in arthroplasty research.
Limitations: Designed for arthritis - less applicable to ligament injuries, fractures.
WOMAC is the gold standard for hip and knee arthroplasty outcome assessment.
Measurement Properties
Validity
Definition: Does the measure assess what it claims to assess?
Types of Validity
| Type | Definition | How to Assess | Example |
|---|---|---|---|
| Content Validity | Covers all relevant aspects of construct | Expert panel review, patient input | WOMAC includes pain, stiffness, function for OA |
| Construct Validity | Correlates with related measures, discriminates from unrelated | Correlation with similar PROMs (convergent), lack of correlation with dissimilar (discriminant) | WOMAC correlates with knee ROM (convergent) but not with mental health scores (discriminant) |
| Criterion Validity | Correlates with gold standard | Compare to established measure | New knee score correlates with WOMAC |
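Convergent and criterion validity are usually quantified with a correlation coefficient between the new measure and the comparison measure. A minimal Python sketch of Pearson's r, assuming paired scores from the same patients; the function name and data are hypothetical, and the text does not specify a correlation threshold.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between paired scores from the same patients."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least three paired scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one measure has no variation")
    return cov / sqrt(var_x * var_y)


# Hypothetical: new knee score (higher = better) vs WOMAC (lower = better)
new_score = [30, 45, 55, 70, 85]
womac = [70, 58, 40, 32, 12]
# Strongly negative here, as expected when the two scales run in opposite directions
print(round(pearson_r(new_score, womac), 2))
```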
Reliability
Definition: Does the measure give consistent results when condition is stable?
Types of Reliability
| Type | Definition | How to Assess | Target |
|---|---|---|---|
| Test-Retest | Same result when repeated in stable patients | Intraclass Correlation Coefficient (ICC) | ICC greater than 0.70 |
| Inter-Rater | Different raters get same result | ICC for clinician-administered measures | ICC greater than 0.70 |
| Internal Consistency | Items within scale measure same construct | Cronbach alpha | Alpha 0.70 to 0.95 (too high suggests redundancy) |
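Internal consistency is typically reported as Cronbach's alpha, computed from item and total-score variances: alpha = k/(k - 1) x (1 - sum of item variances / variance of total scores). A minimal Python sketch, assuming complete responses with one row per respondent and one column per item; the function name and data are hypothetical.

```python
from statistics import variance
from typing import List


def cronbach_alpha(item_scores: List[List[float]]) -> float:
    """item_scores[i][j] is respondent i's score on item j."""
    if len(item_scores) < 2:
        raise ValueError("need at least two respondents")
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(row) != k for row in item_scores):
        raise ValueError("every respondent must answer every item")
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Sample variance of respondents' total scale scores
    total_var = variance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("alpha is undefined when all total scores are equal")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 3-item scale answered by 5 patients
responses = [[1, 1, 2], [2, 2, 2], [3, 3, 4], [4, 3, 4], [4, 4, 3]]
alpha = cronbach_alpha(responses)
print(round(alpha, 2), "within 0.70-0.95" if 0.70 <= alpha <= 0.95 else "outside 0.70-0.95")
```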
Responsiveness
Definition: Can the measure detect clinically meaningful change over time?
Floor Effect: A high proportion of patients (over 15%) score at the minimum (worst possible score).
- Problem: Cannot detect worsening in these patients.
Ceiling Effect: A high proportion of patients (over 15%) score at the maximum (best possible score).
- Problem: Cannot detect improvement in these patients.
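Floor and ceiling effects are simply the proportions of patients scoring at the worst and best possible values. A minimal Python sketch, assuming the caller states which end of the scale is worst and which is best (so it works for both higher-is-better and lower-is-better instruments) and using the 15% threshold; the function name and data are hypothetical.

```python
from typing import Dict, Sequence, Union


def floor_ceiling(scores: Sequence[float], worst: float, best: float,
                  threshold: float = 0.15) -> Dict[str, Union[float, bool]]:
    """Proportion of patients at the worst (floor) and best (ceiling) possible score."""
    if not scores:
        raise ValueError("no scores supplied")
    n = len(scores)
    floor = sum(1 for s in scores if s == worst) / n
    ceiling = sum(1 for s in scores if s == best) / n
    return {
        "floor": floor,
        "ceiling": ceiling,
        "floor_problem": floor > threshold,      # cannot detect worsening
        "ceiling_problem": ceiling > threshold,  # cannot detect improvement
    }


# Hypothetical post-operative scores on a 0-100 scale where 100 = best
scores = [100, 100, 100, 95, 80, 60, 100, 70, 40, 0]
print(floor_ceiling(scores, worst=0, best=100))
# {'floor': 0.1, 'ceiling': 0.4, 'floor_problem': False, 'ceiling_problem': True}
```

For a lower-is-better instrument such as the 0-96 WOMAC, the same function is called with `worst=96, best=0`.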
Responsiveness Index: Standardized Response Mean (SRM = mean change / SD of change) or Effect Size (mean change / SD of baseline scores).
- SRM greater than 0.8: Large responsiveness (good)
- SRM 0.5 to 0.8: Moderate responsiveness
- SRM less than 0.5: Small responsiveness (may miss change)
Understanding responsiveness prevents choosing measures that cannot detect improvement.
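Both responsiveness indices are ratios of mean change to a standard deviation. A minimal Python sketch, assuming paired baseline and follow-up scores from the same patients on a scale where higher = better (for lower-is-better scales such as WOMAC the sign flips, so the magnitude is classified); the function names and data are hypothetical, and the thresholds are those quoted for SRM.

```python
from statistics import mean, stdev
from typing import List, Sequence


def _changes(baseline: Sequence[float], follow_up: Sequence[float]) -> List[float]:
    """Per-patient change scores (follow-up minus baseline)."""
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need at least two paired scores")
    return [f - b for b, f in zip(baseline, follow_up)]


def standardized_response_mean(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """SRM = mean change / SD of change."""
    changes = _changes(baseline, follow_up)
    return mean(changes) / stdev(changes)


def effect_size(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """Effect size = mean change / SD of baseline scores."""
    return mean(_changes(baseline, follow_up)) / stdev(baseline)


def classify(index: float) -> str:
    """Thresholds as quoted: >0.8 large, 0.5-0.8 moderate, <0.5 small."""
    magnitude = abs(index)
    if magnitude > 0.8:
        return "large"
    if magnitude >= 0.5:
        return "moderate"
    return "small"


# Hypothetical scores (higher = better) before and after surgery
before = [40, 50, 60, 45, 55]
after = [60, 65, 85, 60, 80]
srm = standardized_response_mean(before, after)
print(srm, classify(srm))  # 4.0 large
```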
Minimal Clinically Important Difference (MCID)
What is MCID?
Definition: The smallest change in PROM score that patients perceive as beneficial and would mandate a change in management.
Purpose: Distinguish statistically significant from clinically meaningful change.
How MCID is Determined
Methods:
- Anchor-Based: Compare PROM change to an external anchor (patient global assessment)
  - "Compared to before surgery, how would you rate your improvement: Much better, Better, Same, Worse?"
  - Calculate MCID as the mean change in the "Better" group.
- Distribution-Based: Use statistical thresholds (0.5 SD, Standard Error of Measurement)
  - MCID = 0.5 × standard deviation
  - Less clinically intuitive than anchor-based.
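The two approaches can be sketched side by side. A minimal Python sketch, assuming per-patient change scores paired with each patient's answer to the global anchor question; the function names, the choice of baseline SD for the distribution-based estimate, and the data are hypothetical. The Standard Error of Measurement is computed with the standard formula SD x sqrt(1 - reliability), where reliability is, for example, a test-retest ICC.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


def anchor_based_mcid(changes: Sequence[float], anchor: Sequence[str],
                      category: str = "Better") -> float:
    """Mean PROM change among patients who chose `category` on the anchor question."""
    if len(changes) != len(anchor):
        raise ValueError("each change score needs an anchor response")
    group = [c for c, a in zip(changes, anchor) if a == category]
    if not group:
        raise ValueError(f"no patients answered {category!r}")
    return mean(group)


def half_sd_mcid(baseline_scores: Sequence[float]) -> float:
    """Distribution-based MCID = 0.5 x SD (here, SD of baseline scores)."""
    return 0.5 * stdev(baseline_scores)


def standard_error_of_measurement(baseline_scores: Sequence[float], reliability: float) -> float:
    """SEM = SD x sqrt(1 - reliability)."""
    return stdev(baseline_scores) * sqrt(1 - reliability)


# Hypothetical cohort of six patients
changes = [30, 12, 8, 10, 0, -5]
anchor = ["Much better", "Better", "Better", "Better", "Same", "Worse"]
baseline = [35, 45, 50, 50, 55, 65]
print(anchor_based_mcid(changes, anchor))  # 10
print(half_sd_mcid(baseline))  # 5.0
```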
Clinical Application:
- If mean improvement = 8 points and MCID = 10 points → Improvement may be statistically significant but is NOT clinically meaningful.
- If 95% CI = 12 to 18 points and MCID = 10 → Entire CI exceeds MCID → Clinically meaningful improvement.
Always compare treatment effects to MCID, not just p-values.
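The decision rule in the examples above (where the whole 95% CI sits relative to the MCID) can be written as a small function. A minimal Python sketch, assuming the treatment effect is expressed so that positive values mean improvement; the function name and labels are hypothetical, and a CI bound falling exactly on the MCID is treated here as uncertain.

```python
def interpret_vs_mcid(ci_lower: float, ci_upper: float, mcid: float) -> str:
    """Classify a treatment effect's 95% CI relative to the MCID."""
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    if ci_lower > mcid:
        return "clinically meaningful (entire CI above MCID)"
    if ci_upper < mcid:
        return "not clinically meaningful (entire CI below MCID)"
    return "uncertain (CI crosses MCID)"


print(interpret_vs_mcid(12, 18, 10))  # clinically meaningful (entire CI above MCID)
print(interpret_vs_mcid(5, 11, 10))   # uncertain (CI crosses MCID)
```

A p-value plays no part in this rule: a CI of 5 to 11 points excludes zero (statistically significant) yet still straddles an MCID of 10.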
Clinical Application and Relevance
Choosing the Right PROM
Joint-specific for sensitivity (WOMAC for THA). Generic for cross-disease comparison and population norms (SF-36). Use both when possible to capture joint-specific and overall health.
Interpreting PROM Data
Compare change to MCID, not just statistical significance. Check floor/ceiling effects - over 15% suggests measure may not detect change. Report mean change AND proportion exceeding MCID.
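Reporting the proportion exceeding the MCID alongside the mean change is a simple count. A minimal Python sketch, assuming per-patient change scores where positive = improvement and counting a change equal to the MCID as achieving it; the function name and data are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence, Union


def responder_summary(changes: Sequence[float], mcid: float) -> Dict[str, Union[float, int]]:
    """Mean change plus the proportion of patients whose change reaches the MCID."""
    if not changes:
        raise ValueError("no change scores supplied")
    responders = sum(1 for c in changes if c >= mcid)
    return {"mean_change": mean(changes),
            "proportion_achieving_mcid": responders / len(changes)}


# Hypothetical change scores, MCID = 10 points
changes = [25, 22, 18, 2, 1, 0, -4, 0]
print(responder_summary(changes, mcid=10))
# {'mean_change': 8, 'proportion_achieving_mcid': 0.375}
```

Here the mean change (8 points) is below the MCID, yet 37.5% of patients individually reach it; the mean alone would hide that subgroup.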
Registry Requirements
AOANJRR and many registries require PROMs. Pre-operative baseline and post-operative follow-up (1 year, 5 year). Allows benchmarking and quality improvement.
Value-Based Care
Payers increasingly link reimbursement to PROMs. Demonstrating patient-reported improvement justifies procedures. PROMs essential for value-based contracts.
Evidence Base
WOMAC Measurement Properties
- WOMAC developed and validated for hip and knee osteoarthritis
- 24-item questionnaire assessing pain, stiffness, physical function
- Excellent test-retest reliability (ICC greater than 0.90)
- Good construct validity (correlates with other arthritis measures)
- Responsive to change after arthroplasty
MCID for Common Orthopaedic PROMs
- Systematic review of MCID values for musculoskeletal PROMs
- SF-36 PCS: MCID approximately 5 points
- WOMAC: MCID 10-15% of scale (10-15 points on 100-point scale)
- VAS Pain: MCID 15-20mm on 100mm scale
- DASH: MCID 10-15 points
Floor and Ceiling Effects in PROMs
- Floor or ceiling effects over 15% considered problematic
- Indicates measure cannot detect worsening (floor) or improvement (ceiling)
- Reduces responsiveness and statistical power
- Should report floor/ceiling effects when validating PROMs
- Consider alternative measure if effects exceed 15%
Exam Viva Scenarios
Practice these scenarios to excel in your viva examination
Scenario 1: PROM Selection
"You are planning an RCT comparing cemented vs uncemented THA. What outcome measures would you use and why?"
Scenario 2: MCID Interpretation
"An RCT of 200 patients found that new rehab protocol improved WOMAC score by mean 8 points (95% CI 5 to 11 points, p = 0.001) compared to standard protocol. The MCID for WOMAC is 10 points. How do you interpret this result?"
MCQ Practice Points
PROM Types
Q: What is the difference between a generic PROM (SF-36) and a joint-specific PROM (WOMAC)? A: Generic PROMs assess overall health status across any condition, allow comparison between diseases and to population norms, but are less sensitive to specific joint pathology. Joint-specific PROMs are highly sensitive to pathology in a single joint but cannot compare across different joints or to general population.
MCID Importance
Q: Why is MCID important when interpreting PROM changes? A: MCID defines clinically meaningful change - the smallest improvement that patients perceive as beneficial. Statistically significant changes (p less than 0.05) may not exceed MCID and thus not be clinically important. Always compare observed change to MCID, not just p-value.
Floor and Ceiling Effects
Q: What is a ceiling effect and why does it matter? A: Ceiling effect occurs when high proportion (over 15%) of patients score at maximum (best possible score). This prevents the measure from detecting improvement in these patients and reduces responsiveness. Choose a different measure or add a more challenging domain if ceiling effects are problematic.
Validity vs Reliability
Q: What is the difference between validity and reliability? A: Validity = Does the measure assess what it claims to assess? (accuracy). Reliability = Does the measure give consistent results when repeated in stable patients? (precision). A measure can be reliable but not valid (consistently wrong), but cannot be valid without being reliable.
Test-Retest Reliability
Q: What ICC value indicates good test-retest reliability? A: ICC greater than 0.70 indicates acceptable reliability. ICC (Intraclass Correlation Coefficient) ranges 0-1. ICC greater than 0.90 is excellent, 0.70-0.90 is good, less than 0.70 is poor. This measures consistency when same patient completes PROM twice with stable condition.
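Test-retest ICC can be computed from a two-way ANOVA decomposition. A minimal Python sketch of ICC(2,1) (two-way random effects, absolute agreement, single measurement, in Shrout and Fleiss notation); choosing this ICC form is an assumption, since the text does not say which ICC model is meant, and the function name and data are hypothetical.

```python
from typing import List


def icc_2_1(scores: List[List[float]]) -> float:
    """scores[i][j] = score of patient i on occasion j (e.g. test and retest)."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two patients")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("every patient needs the same number (>= 2) of occasions")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Sums of squares: between patients (rows), between occasions (columns), residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)


# Hypothetical PROM scores from 5 stable patients completing the PROM twice
test_retest = [[42, 45], [60, 58], [75, 77], [30, 33], [55, 54]]
icc = icc_2_1(test_retest)
print(round(icc, 2), "acceptable" if icc > 0.70 else "poor")
```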
Responsiveness Measures
Q: How is responsiveness quantified? A: Standardized Response Mean (SRM) or Effect Size. SRM = mean change / SD of change. SRM greater than 0.8 = large responsiveness (good), 0.5-0.8 = moderate, less than 0.5 = small (may miss clinically important change). Responsiveness is essential for detecting treatment effects.
Australian Context
The Australian Orthopaedic Association National Joint Replacement Registry (AOANJRR) systematically collects PROMs for hip and knee arthroplasty, providing benchmarking data across Australian hospitals. The registry uses the Oxford Hip Score (OHS) and Oxford Knee Score (OKS) as primary joint-specific outcome measures, with EQ-5D for health utility assessment.
The ACSQHC (Australian Commission on Safety and Quality in Health Care) Clinical Care Standards increasingly incorporate PROM collection as quality indicators. Pre-operative and post-operative PROM collection at standardized intervals (baseline, 6 months, 1 year, 5 years) allows meaningful comparison across institutions and surgeons.
Australian validation studies have confirmed the measurement properties (validity, reliability, responsiveness) of commonly used PROMs in the Australian population, supporting their use in clinical practice and research. AOANJRR PROM data show that contemporary arthroplasty procedures achieve improvements exceeding the MCID in the majority of patients.
High-Yield Exam Summary
Common Orthopaedic PROMs
- Generic: SF-36 (PCS/MCS, 0-100, higher better), EQ-5D (utility index: 1 = full health, 0 = dead; negative values possible)
- Hip/Knee: WOMAC (pain/stiffness/function, 0-96 or 0-100, lower or higher better depending on version)
- Upper Extremity: DASH (0-100, 0 = no disability), QuickDASH (11 items)
- Shoulder: ASES (0-100, higher better), Constant score
- Spine: ODI (Oswestry 0-100%, lower better), NDI (Neck Disability)
MCID Values
- SF-36 PCS/MCS: MCID approximately 5 points
- WOMAC: MCID 10-15 points (on 100-point scale)
- DASH: MCID 10-15 points
- VAS Pain: MCID 15-20mm (on 100mm scale)
- Always compare treatment effect to MCID for clinical significance
Measurement Properties
- Validity = Does it measure what it claims? (content, construct, criterion)
- Reliability = Consistent results? (test-retest ICC greater than 0.70, Cronbach alpha 0.70-0.95)
- Responsiveness = Detects change? (SRM greater than 0.8 = large, less than 15% floor/ceiling effects)
- Floor effect = Too many at minimum (cannot detect worsening)
- Ceiling effect = Too many at maximum (cannot detect improvement)
PROM Selection
- Joint-specific for sensitivity (WOMAC for THA trial)
- Generic for cross-disease comparison and population norms (SF-36)
- Utility measure for cost-effectiveness (EQ-5D)
- Use combination: Joint-specific (primary) + Generic (secondary)
- Check floor/ceiling effects (over 15% problematic)
Interpreting PROM Data
- Compare mean change to MCID, not just p-value
- Check whether the entire 95% CI lies above the MCID threshold
- Report proportion of patients achieving MCID
- Wide CI crossing MCID = uncertain clinical significance
- Large sample with trivial effect (below MCID) = not clinically important
Clinical Application
- AOANJRR and registries require PROMs (baseline and follow-up)
- Value-based care links reimbursement to PROM improvement
- Statistical significance ≠ Clinical significance
- Generic vs Specific trade-off: Comparison vs Sensitivity
- Pre-specify primary PROM and timing in study protocol