By Stef De Reys, PhD, expert trainer of the Practical Statistics Course for Medical Affairs: Interpretation & Application
In Medical Affairs, we constantly review trials comparing a new therapy (“NEW”) to the current standard (“STANDARD”). And we all know the temptation: p < 0.05 looks like a green light.
But a p-value is not a stamp of truth. It only tells you how surprising the data would be if there were actually no real difference between NEW and STANDARD. It does not tell you whether the effect is big, clinically meaningful, or reliable.
Misinterpreting p-values can lead to flawed clinical decisions, regulatory issues, and misleading communications. Here’s how to avoid the most common mistakes and ensure your interpretation is compliant and meaningful.
A p-value tells you how likely it is to see the results you observed (or more extreme ones) if there were actually no real effect. In other words, it measures how surprising your data would be if the treatment or intervention made no difference. A very small p-value means the result would be a rare event if there were no effect. Therefore, look at a small p-value as evidence against the idea that the treatment made no difference.
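To make that concrete, here is a minimal sketch in Python with simulated, hypothetical outcome data for NEW and STANDARD. The arm sizes, means, and the choice of a two-sample t-test are illustrative assumptions, not the analysis of any real trial.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
standard = rng.normal(loc=10.0, scale=3.0, size=60)  # simulated STANDARD outcomes
new = rng.normal(loc=11.5, scale=3.0, size=60)       # simulated NEW outcomes

# Two-sample t-test: how surprising is this difference if the arms truly do not differ?
t_stat, p_value = stats.ttest_ind(new, standard)
print(f"Observed difference in means: {new.mean() - standard.mean():.2f}")
print(f"p-value: {p_value:.4f}")  # small p-value = data would be rare under "no difference"
```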
Many teams treat p < 0.05 as a green light and p ≥ 0.05 as a red light. That’s tempting because it’s simple, but reality is more nuanced.
So instead of using the p-value as a verdict, use it as one clue about whether the observed difference is likely to be more than random variation.
To make an informed decision, we have to go a step further than that.
Medical Affairs professionals often rely on the “holy trinity” of data interpretation: the p-value, the sample size (N), and the effect size (ES).
Together, these three elements come closer to telling the whole story: whether an effect is statistically significant, how robust the study is, and whether the effect is clinically meaningful.
However, to make truly informed decisions and properly weigh the remaining uncertainty, you need the confidence interval (CI) as well.
The CI is the lens that integrates effect size and sample size and frames the uncertainty around your estimate.
A narrow CI signals precision, while a wide CI warns of uncertainty, even if the p-value looks impressive.
In short, think of CIs as the context that transforms numbers into actionable insight.
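To see what the CI adds in practice, here is a minimal sketch (again with simulated, hypothetical data) that computes a Welch-style 95% CI for the difference in means. The same underlying effect gives a wide interval with 20 patients per arm and a narrow one with 500; the numbers and the diff_ci helper are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def diff_ci(a, b, alpha=0.05):
    """Point estimate and 95% CI for the difference in means (Welch approximation)."""
    diff = a.mean() - b.mean()
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    # Welch-Satterthwaite degrees of freedom
    df = se**4 / ((a.var(ddof=1) / len(a))**2 / (len(a) - 1)
                  + (b.var(ddof=1) / len(b))**2 / (len(b) - 1))
    margin = stats.t.ppf(1 - alpha / 2, df) * se
    return diff, diff - margin, diff + margin

rng = np.random.default_rng(7)
for n in (20, 500):  # small vs large trial arms, same true effect
    new = rng.normal(11.5, 3.0, n)
    standard = rng.normal(10.0, 3.0, n)
    diff, lo, hi = diff_ci(new, standard)
    print(f"N per arm = {n:>3}: difference = {diff:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```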
Does p < 0.05 mean the effect is important? Not necessarily. p-values depend on sample size (N) and effect size (ES), so a small p-value does not guarantee clinical importance.
✅ Better: Report p, N, ES, and confidence intervals (CI). Then ask: Is it clinically meaningful?
Avoid black-and-white thinking. p = 0.06 is barely different from p = 0.05 and may still reflect a real benefit or harm, especially with small samples.
✅ Better: Treat “not significant” as “inconclusive.” Use confidence intervals for a fuller picture.
Increasing N can make tiny effects look “highly significant”. The effect size hasn’t changed, only the power.
✅ Better: With very small p and very large N, check if the effect size is clinically relevant.
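Here is a minimal sketch of that trap with simulated data: a 0.1-point difference on an outcome with a standard deviation of 3 is almost certainly negligible clinically, yet with 50,000 patients per arm it comes out “highly significant”. The numbers are hypothetical; the point is that Cohen's d stays tiny while the p-value collapses.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 50_000
standard = rng.normal(10.0, 3.0, n)
new = rng.normal(10.1, 3.0, n)  # a 0.1-point difference on a 3-SD scale

t_stat, p = stats.ttest_ind(new, standard)
pooled_sd = np.sqrt((standard.var(ddof=1) + new.var(ddof=1)) / 2)
cohens_d = (new.mean() - standard.mean()) / pooled_sd

print(f"p-value: {p:.2e}")           # "highly significant"
print(f"Cohen's d: {cohens_d:.3f}")  # ~0.03, far below even a "small" effect (0.2)
```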
A high p-value doesn’t confirm equivalence or non-inferiority. Your study may be underpowered.
✅ Better: Define the equivalence margin, ES, and sample size upfront. Only then can a non-significant result support a claim of equivalence.
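If equivalence is genuinely the question, test it directly rather than reading it off a non-significant p-value. Below is a minimal sketch using the two one-sided tests (TOST) procedure, here via statsmodels' ttost_ind; the ±1.0 margin and the simulated data are illustrative assumptions, and in practice the margin must be prespecified and clinically justified.

```python
import numpy as np
from statsmodels.stats.weightstats import ttost_ind

rng = np.random.default_rng(3)
new = rng.normal(10.2, 3.0, 40)       # simulated NEW outcomes
standard = rng.normal(10.0, 3.0, 40)  # simulated STANDARD outcomes

margin = 1.0  # prespecified equivalence margin (illustrative)
p_tost, lower_test, upper_test = ttost_ind(new, standard, -margin, margin)
print(f"TOST p-value: {p_tost:.3f}")  # small p-value = evidence the arms are equivalent within the margin
```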
Running many tests inflates false positives. If you run 100 tests at a 0.05 significance threshold, expect about 5 “significant” results just by chance.
✅ Better: Predefine hypotheses and number of tests. Apply corrections (e.g. Bonferroni).
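Here is a minimal sketch of why this matters, using simulated data in which no effect exists at all: roughly 5 of 100 raw p-values still fall below 0.05, and after a Bonferroni correction (via statsmodels' multipletests) typically none survive. The simulation settings are illustrative assumptions.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
# 100 comparisons where the null is true in every single one
p_values = [
    stats.ttest_ind(rng.normal(0, 1, 30), rng.normal(0, 1, 30)).pvalue
    for _ in range(100)
]

print("raw 'significant' results:", sum(p < 0.05 for p in p_values))
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="bonferroni")
print("after Bonferroni correction:", reject.sum())
```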
p-values depend on the right statistical test for your data and design. Wrong test = wrong conclusion.
✅ Better: Check the data distribution and missing values, then apply the correct test.
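As a minimal (and deliberately simplified) sketch of that check: screen each arm for normality with a Shapiro-Wilk test and fall back to a rank-based Mann-Whitney U test for skewed data. Real test selection also depends on the design, paired vs independent samples, and variance assumptions; the simulated data and the 0.05 screening threshold are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
new = rng.lognormal(mean=1.0, sigma=0.8, size=40)       # skewed outcome data
standard = rng.lognormal(mean=0.8, sigma=0.8, size=40)

# Screen both arms for approximate normality before picking the test
normal_enough = all(stats.shapiro(arm).pvalue > 0.05 for arm in (new, standard))
if normal_enough:
    test_name, result = "t-test", stats.ttest_ind(new, standard)
else:
    test_name, result = "Mann-Whitney U", stats.mannwhitneyu(new, standard)

print(f"{test_name} p-value: {result.pvalue:.4f}")
```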
p-values are not the be-all and end-all. Combine them with sample size, effect size, confidence intervals, and study design for fair, compliant, and meaningful communication.