Making the right decisions from p-values – avoiding common pitfalls


By Stef De Reys, PhD, expert trainer of the Practical Statistics Course for Medical Affairs: Interpretation & Application

 

In Medical Affairs, we constantly review trials comparing a new therapy (“NEW”) to the current standard (“STANDARD”). And we all know the temptation: p < 0.05 looks like a green light.

But a p-value is not a stamp of truth. It only tells you how surprising the data would be if there were actually no real difference between NEW and STANDARD. It does not tell you whether the effect is big, clinically meaningful, or reliable.

Misinterpreting p-values can lead to flawed clinical decisions, regulatory issues, and misleading communications. Here’s how to avoid the most common mistakes and ensure your interpretation is compliant and meaningful.
 

Why p-values matter


A p-value tells you how likely it is to see the results you observed (or more extreme ones) if there were actually no real effect. In other words, it measures how surprising your data would be if the treatment or intervention made no difference. A very small p-value means the result would be a rare event if there were no effect. Therefore, look at a small p-value as evidence against the idea that the treatment made no difference.
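To make this concrete, here is a minimal sketch in Python (using scipy on simulated data; the outcome values, sample sizes, and the choice of a two-sample t-test are illustrative assumptions, not taken from any real trial):

    # A minimal sketch: where a p-value comes from when comparing NEW vs STANDARD.
    # All numbers are simulated for illustration only.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    standard = rng.normal(loc=50.0, scale=10.0, size=60)   # outcome under STANDARD
    new = rng.normal(loc=54.0, scale=10.0, size=60)        # outcome under NEW (true difference = 4)

    # Two-sample t-test: how surprising is the observed difference
    # if NEW and STANDARD truly had the same mean?
    t_stat, p_value = stats.ttest_ind(new, standard)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

The smaller that p-value, the more surprising the observed difference would be under the assumption of no real effect, and that is all it tells you.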
 

Why p-values can mislead when used as a stamp


Many teams treat p < 0.05 as a green light and p ≥ 0.05 as a red light. That’s tempting because it’s simple, but reality is more nuanced.

  • A p-value is continuous: p = 0.049 and p = 0.051 are practically the same strength of evidence, yet the “stamp” flips.
  • A p-value depends on sample size. Bigger studies can make tiny differences look “significant.” Smaller studies can miss real differences (see the sketch below).

So instead of using the p-value as a verdict, use it as one clue about whether the observed difference is likely to be more than random variation.
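Here is a minimal sketch of that sample-size effect (Python with scipy; the 3-point difference, the SD of 10, and the sample sizes are illustrative assumptions). The same observed difference flips from “not significant” to “highly significant” purely because more patients were enrolled:

    # Same observed difference in means (3 points, SD = 10 in both arms),
    # evaluated at three different sample sizes.
    from scipy import stats

    for n in (20, 80, 500):
        res = stats.ttest_ind_from_stats(mean1=53.0, std1=10.0, nobs1=n,
                                         mean2=50.0, std2=10.0, nobs2=n)
        print(f"n per arm = {n:4d}  ->  p = {res.pvalue:.4f}")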

To make an informed decision, we have to go a step further than that.
 

The holy trinity


Medical Affairs professionals often rely on the “holy trinity” of data interpretation: 

  • p-value
  • sample size, N
  • effect size, ES

Together, these three elements come much closer to telling the whole story: whether an effect is statistically significant, how robust the study is, and whether the effect is clinically meaningful.

However, to make truly informed decisions and quantify the remaining uncertainty, you need the confidence interval (CI) as well.
 

Beyond the holy trinity: adding the confidence lens


From your data, add the confidence interval (CI) — the lens that integrates effect size and sample size, and frames the uncertainty around your estimate. 

A narrow CI signals precision, while a wide CI warns of uncertainty, even if the p-value looks impressive.

In short, think of CIs as the context that transforms numbers into actionable insight.
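As a rough illustration, here is a minimal sketch of how the CI narrows as the study grows (Python with scipy; the diff_ci helper, the 3-point difference, and the SD of 10 are illustrative assumptions):

    import numpy as np
    from scipy import stats

    def diff_ci(diff, sd, n_per_arm, level=0.95):
        """Two-sided CI for a difference in means, equal SD and n in both arms."""
        se = sd * np.sqrt(2.0 / n_per_arm)          # standard error of the difference
        df = 2 * n_per_arm - 2
        t_crit = stats.t.ppf(0.5 + level / 2, df)   # two-sided critical t value
        return diff - t_crit * se, diff + t_crit * se

    for n in (20, 500):
        lo, hi = diff_ci(diff=3.0, sd=10.0, n_per_arm=n)
        print(f"n per arm = {n:4d}  ->  95% CI for the difference: ({lo:.1f}, {hi:.1f})")

With 20 patients per arm the interval spans both harm and benefit; with 500 it pins the difference down tightly, even though the point estimate is identical.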
 

Wrapping it up: 6 pitfalls to avoid


1. “p < 0.05 proves statistical significance, so it must be clinically relevant.”

Not necessarily. p-values depend on sample size (N) and effect size (ES). A small p-value does not guarantee clinical importance.

✅ Better: Report p, N, ES, and confidence intervals (CI). Then ask: Is it clinically meaningful?
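A minimal sketch of that reporting habit (Python with numpy/scipy on simulated arms; Cohen's d as the effect size is one common choice, not the only one):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    standard = rng.normal(50.0, 10.0, 200)
    new = rng.normal(52.0, 10.0, 200)

    t_stat, p_value = stats.ttest_ind(new, standard)
    diff = new.mean() - standard.mean()
    pooled_sd = np.sqrt((new.var(ddof=1) + standard.var(ddof=1)) / 2)
    cohens_d = diff / pooled_sd                     # standardised effect size
    se = pooled_sd * np.sqrt(1 / len(new) + 1 / len(standard))
    t_crit = stats.t.ppf(0.975, len(new) + len(standard) - 2)
    ci = (diff - t_crit * se, diff + t_crit * se)

    print(f"N = {len(new) + len(standard)}, p = {p_value:.4f}, "
          f"d = {cohens_d:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")

Reporting all four numbers together is what lets the reader judge clinical relevance, not just statistical significance.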
 

2. “p = 0.06 means there’s no effect.”

Avoid black-and-white thinking. p = 0.06 is barely above 0.05 and may still be compatible with a real benefit or harm, especially with small samples.

✅ Better: Treat “not significant” as “inconclusive.” Use confidence intervals for a fuller picture.
 

3. “Smaller p means bigger effect.”

Increasing N can make tiny effects look “highly significant”. The effect size hasn’t changed, only the power.

✅ Better: With very small p and very large N, check if the effect size is clinically relevant.
 

4. “No significant difference proves equivalence.”

A high p-value doesn’t confirm equivalence or non-inferiority. Your study may be underpowered.

✅ Better: Define the equivalence margin (ES) and required sample size upfront. Only in a study designed and powered for equivalence or non-inferiority can a non-significant difference support such a claim.
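If equivalence really is the question, the analysis has to be built for it. Here is a minimal sketch using a two one-sided tests (TOST) procedure, assuming statsmodels is available; the ±3-point margin and the simulated data are illustrative assumptions:

    import numpy as np
    from statsmodels.stats.weightstats import ttost_ind

    rng = np.random.default_rng(3)
    standard = rng.normal(50.0, 10.0, 150)
    new = rng.normal(50.5, 10.0, 150)

    margin = 3.0                                    # pre-specified equivalence margin
    p_tost, _, _ = ttost_ind(new, standard, low=-margin, upp=margin)
    print(f"TOST p = {p_tost:.4f} for equivalence within ±{margin:g}")

Note that the logic runs the other way round here: equivalence within the margin is supported when the TOST p-value is small, not when an ordinary p-value is large.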
 

5. Ignoring multiplicity.

Running many tests inflates the risk of false positives. If you run 100 tests at a 0.05 significance level, you can expect about 5 “significant” results by chance alone.

✅ Better: Predefine hypotheses and number of tests. Apply corrections (e.g. Bonferroni).
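A minimal sketch of such a correction (Python with statsmodels; the five raw p-values are made up for illustration):

    from statsmodels.stats.multitest import multipletests

    raw_p = [0.004, 0.021, 0.048, 0.130, 0.650]
    reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method='bonferroni')

    for p, pa, r in zip(raw_p, adj_p, reject):
        print(f"raw p = {p:.3f}  adjusted p = {pa:.3f}  significant after correction: {r}")

Notice how the borderline raw p-values no longer clear the bar once they are adjusted for the number of tests.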
 

6. Using the wrong test.

p-values depend on the right statistical test for your data and design. Wrong test = wrong conclusion.

✅ Better: Check the data distribution and missing values, then apply the test that matches your data and design.
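One minimal sketch of such a check (Python with scipy; the skewed simulated data, the Shapiro-Wilk cut-off, and the fallback to Mann-Whitney U are illustrative choices, not a universal recipe):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(11)
    standard = rng.exponential(scale=5.0, size=80)   # skewed outcome, e.g. a biomarker
    new = rng.exponential(scale=4.0, size=80)

    # Rough normality check; inspecting the distributions is just as important.
    if min(stats.shapiro(standard).pvalue, stats.shapiro(new).pvalue) < 0.05:
        stat, p = stats.mannwhitneyu(new, standard, alternative='two-sided')
        print(f"Mann-Whitney U: p = {p:.4f}")
    else:
        stat, p = stats.ttest_ind(new, standard)
        print(f"t-test: p = {p:.4f}")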
 

Your 15-second p-value triage 

 

  • Low p-value: Check effect size & 95% CI for clinical meaning.
  • High p-value: Check sample size and desired power.
  • Credibility: Use the right test; quote exact p-values. Only report “p <” when p is extremely small (e.g. p < 0.001).
  • Multiple comparisons: Apply corrections.
  • Avoid binary thinking: p-values indicate strength of evidence, not “go/no go.”
     

Bottom line


p-values are not the be-all and end-all. Combine them with sample size, effect size, confidence intervals, and study design for fair, compliant, and meaningful communication.
 

 

 
