13 August, 2022

Understanding sample size calculation


Everything you always wanted to know about sample size calculation

For clinicians, study directors in the pharmaceutical industry and biotech CEOs alike, the sample size calculation is one of the most dreaded steps in starting a clinical trial.

This step is crucial both methodologically and financially, and not taking it seriously can lead to the ultimate failure of the study.

Here we present the basic knowledge that you should know about sample size in clinical trials.

Why is sample size so important for study preparation?

Sample size should be calculated when preparing the study protocol, as it helps determine whether the study is feasible, ethical and scientifically sound.

With a small sample size, there is an increased risk that observations are due to chance, or even that you will fail to meet the primary endpoint due to low statistical power.

On the other hand, a study with too large a sample size runs the risk of flagging as statistically significant very small effects that are not clinically meaningful or relevant to improving human health.

So what is a good sample size?

The correct sample size is the number of individuals that ensures the study will yield reliable information, regardless of whether the data ultimately suggest a statistically significant difference between the interventions or elements being studied.

There are more than 80 sample size formulas, and you can’t just pick any random formula for your study.

Unfortunately, many researchers end up using a sample size calculator without understanding exactly how it works.

Descriptive or Analytical?

A clinical study can include descriptive as well as analytical objectives.

The difference between the two is that a descriptive objective has no comparison group, so it does not require testing a hypothesis for significance, whereas an analytical objective does.

For example, “The prevalence of a particular disease in South Africa” or “Estimate the diagnostic accuracy of a diagnostic tool” are descriptive objectives, while “Comparison of efficacy between drug and placebo” or “Association between two clinical attributes” are analytical objectives.

Before calculating the sample size, the researcher should be clear about the study design and the study objectives, because the choice of sample size formula depends on both.

The 4 basic statistical terms affecting the sample size calculation

1) Population size (N)

In research terms, the population size is the total number of patients who fit the definition of your study population.

For example, if you want to get information on HIV-positive cases in North America, your population size is the total number of HIV-positive cases in North America.

But don’t worry! Your population size doesn’t always have to be that big!

Smaller population sizes can still give you accurate estimates as long as you know who you’re trying to represent.

2) Margin of error/precision (L)

The margin of error quantifies random sampling error: how far the result observed in the sample is expected to deviate from the true population value.

Let’s take an example:

Suppose the prevalence of hypertension in a study sample is 40% and we set the margin of error at 5%.

This means that the prevalence of hypertension in the population is expected to lie within 40% ± 5%, i.e. between 35% and 45%.
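The arithmetic of this example can be sketched in a few lines of Python. The function name `margin_of_error_range` is only an illustrative choice, and the bounds are clipped to the 0 to 1 range on the assumption that the estimate is a proportion.

```python
def margin_of_error_range(estimate, margin):
    """Return (lower, upper) bounds of estimate ± margin, both as proportions."""
    # Clip to [0, 1] because a proportion cannot fall outside that range.
    lower = max(0.0, estimate - margin)
    upper = min(1.0, estimate + margin)
    return lower, upper


# Hypertension example: 40% observed prevalence, 5% margin of error.
low, high = margin_of_error_range(0.40, 0.05)
print(f"{low:.0%} to {high:.0%}")  # 35% to 45%
```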

3) Z-score and Confidence level

The Z-score is the standard normal value corresponding to a given confidence level (or reliability level); it is a key ingredient in estimating the population's true value.

The confidence level is the percentage of all possible samples whose interval can be expected to include the true value. Researchers usually choose a confidence level of 90%, 95% or 99%.

In other words, if a 95% confidence level is selected, about 95 out of 100 samples are expected to produce an interval, at the precision specified earlier, that contains the “true” population value.

Each confidence level has a specific associated Z-score. You can find the correspondence in Z-score tables.
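Instead of a printed table, the Z-score can also be computed directly. Below is a minimal sketch using Python's standard library (`statistics.NormalDist`, available from Python 3.8). It assumes a two-sided interval, which is the usual convention for this kind of calculation; the helper name `z_score` is illustrative.

```python
from statistics import NormalDist


def z_score(confidence_level):
    """Two-sided Z-score for a confidence level given as a fraction (e.g. 0.95)."""
    alpha = 1.0 - confidence_level
    # Split alpha equally between the two tails of the standard normal curve.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: Z = {z_score(level):.3f}")
# 90%: Z = 1.645
# 95%: Z = 1.960
# 99%: Z = 2.576
```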

4) The Prevalence/Proportion (P)

It is the expected prevalence of a disease or a proportion of outcomes estimated from a previous study or pilot survey.

When estimating the sample size needed for a study, we need to have a reference proportion on which to base our calculation.

Precision-based sample size determination for categorical endpoints

As mentioned earlier, descriptive studies do not compare different groups, so the concepts of power and hypothesis testing are not applicable.

In the case of a categorical endpoint (e.g. binary response, presence/absence of disease), the objective of the study is to measure a proportion or prevalence.

Example:

A researcher prepared a proposal for a cross-sectional study on “Diabetes Mellitus (DM) among HIV–infected individuals in follow-up care at University of Gondar Hospital, Northwest Ethiopia”.

The objective of the study is to assess the prevalence of DM and associated factors among HIV–infected individuals in the Gondar Hospital.

How many individuals should the researcher include in the study, and how should this number be justified?

To calculate the sample size for a cross-sectional study in which the outcome variable is qualitative (measured as a proportion or prevalence), Cochran's formula [1] requires the following information:

  • P – The proportion of DM among the HIV patients reported in previous studies/pilot surveys
  • Q – The value Q = 1- P
  • L – The allowable error or Absolute error
  • Z – The Z-score for a level of confidence of 95% (Z=1.96)

Now, how should the sample size justification be written in the manuscript or study proposal?

Let’s see how the researcher could write the justification in the sample size section of the protocol, based on the information collected:

The sample size of our study, considering a prevalence of DM among the HIV patients of 6.4% (estimated in a previous study), at a level of confidence of 95% and a precision of 3%, is estimated by the formula:

[katex] \LARGE n= \frac{Z^2PQ}{L^2} [/katex]

In our study:

  • Z = 1.96, for a 95% confidence level
  • P = 0.064, corresponding to the 6.4% prevalence of DM in the previous study
  • Q = 1 - P = 0.936
  • L = 0.03, for the requested precision of 3%

So the sample size required for our study will be:

[katex] \LARGE n= \frac{1.96^2 \times 0.064 \times (1-0.064)}{0.03^2} \approx 256 [/katex]

We will need to enroll 256 patients in our study to estimate, with a 95% confidence level and a precision of ± 3%, the proportion of DM patients in the HIV population.
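The calculation above can be reproduced with a short script. This is a minimal sketch of the formula n = Z²PQ/L² as used in this example, taking P = 0.064 for the 6.4% prevalence. The function name `cochran_sample_size` and the choice to round up to the next whole patient are assumptions made for illustration.

```python
import math


def cochran_sample_size(p, precision, z=1.96):
    """Sample size for estimating a proportion: n = Z^2 * P * Q / L^2.

    p         -- expected proportion (e.g. 0.064 for 6.4%)
    precision -- allowable absolute error L (e.g. 0.03 for 3%)
    z         -- Z-score for the chosen confidence level (1.96 for 95%)
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if precision <= 0.0:
        raise ValueError("precision must be positive")
    q = 1.0 - p
    n = (z ** 2) * p * q / precision ** 2
    # Round up: a fraction of a patient cannot be enrolled.
    return math.ceil(n)


# DM among HIV patients: P = 6.4%, precision = 3%, 95% confidence level.
print(cochran_sample_size(p=0.064, precision=0.03))  # 256
```

Note how sensitive the result is to the inputs: entering the prevalence as 0.64 instead of 0.064 would give a very different sample size, so it is worth double-checking that percentages have been converted to proportions correctly.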

Now that you have a better understanding of how to write a sample size justification for a categorical criterion, you need to learn how to explain a sample size for a continuous endpoint, such as biomarker level, clinical characteristics, etc.

This issue will be addressed in a future post.

References

1. Cochran, W.G. (1963) Sampling Techniques. 2nd Edition, John Wiley and Sons Inc., New York.
