Determining Sample Size

A major factor determining the length of a confidence interval is the size of the sample used in the estimation procedure.

Learning Objective

Assess the most appropriate way to choose a sample size in a given situation

Key Points

Sample size determination is the act of choosing the number of observations or replicates to include in a statistical sample.
The sample size is an important feature of any empirical study in which the goal is to make inferences about a population from a sample.
In practice, the sample size used in a study is determined based on the expense of data collection and the need to have sufficient statistical power.
Larger sample sizes generally lead to increased precision when estimating unknown parameters.

Terms

law of large numbers
The tendency of the average result to settle near a fixed value (the expected value) when an experiment is repeated a large number of times.
central limit theorem
The theorem that states: The sum (or mean) of a large number of independent, identically distributed random variables with finite variance is approximately normally distributed.
stratified sampling
A method of sampling that involves dividing members of the population into homogeneous subgroups before sampling.

Full Text

Sample size, such as the number of people taking part in a survey, is a major factor in the length of the estimated confidence interval. Sample size determination is the act of choosing the number of observations or replicates to include in a statistical sample. The sample size is an important feature of any empirical study in which the goal is to make inferences about a population from a sample.

In practice, the sample size used in a study is determined based on the expense of data collection and the need to have sufficient statistical power. In complicated studies there may be several different sample sizes involved. For example, in a survey that uses stratified sampling there would be a different sample size for each stratum (subgroup) of the population. In a census, data are collected on the entire population, hence the sample size is equal to the population size. In experimental design, where a study may be divided into different treatment groups, there may be different sample sizes for each group.

Sample sizes may be chosen in several different ways:

expedience, that is, using the items that are readily available or convenient to collect (a small sample chosen this way, though sometimes necessary, can result in wide confidence intervals or a higher risk of errors in statistical hypothesis testing)
using a target variance for an estimate to be derived from the sample eventually obtained
using a target for the power of a statistical test to be applied once the sample is collected

Larger sample sizes generally lead to increased precision when estimating unknown parameters. For example, if we wish to know the proportion of a certain species of fish that is infected with a pathogen, we would generally have a more precise estimate of this proportion if we sampled and examined 200 rather than 100 fish. Several fundamental facts of mathematical statistics describe this phenomenon, including the law of large numbers and the central limit theorem.
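The fish example can be made concrete with a minimal sketch. It assumes the usual normal-approximation interval for a proportion and a hypothetical observed infection rate of 0.3; the function name `margin_of_error` and the 0.3 figure are illustrative choices, not values from the text.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p_hat, n, confidence=0.95):
    """Half-width of the normal-approximation interval for a proportion."""
    # Critical value z_{alpha/2}, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical infection rate of 0.3 observed in both samples.
moe_100 = margin_of_error(0.3, 100)
moe_200 = margin_of_error(0.3, 200)

# Doubling n shrinks the margin by a factor of sqrt(1/2), not 1/2.
ratio = moe_200 / moe_100

print(f"n=100: +/-{moe_100:.3f}   n=200: +/-{moe_200:.3f}   ratio={ratio:.3f}")
```

Because the margin of error scales with $1/\sqrt{n}$, halving it requires roughly four times as many observations.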

In some situations, the increase in accuracy for larger sample sizes is minimal, or even non-existent. This can result from the presence of systematic errors or strong dependence in the data, or if the data follow a heavy-tailed distribution.

Sample sizes are judged based on the quality of the resulting estimates. For example, if a proportion is being estimated, one may wish to have the 95% confidence interval be less than 0.06 units wide. Alternatively, sample size may be assessed based on the power of a hypothesis test. For example, if we are comparing the support for a certain political candidate among women with the support for that candidate among men, we may wish to have 80% power to detect a difference in the support levels of 0.04 units.
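As a rough sketch of the power-based approach, the following uses a common normal-approximation formula for comparing two independent proportions with equal group sizes. The baseline support levels of 0.50 and 0.54, the two-sided 5% significance level, and the function name `n_per_group_two_proportions` are assumptions made for illustration; only the 80% power and the 0.04 difference come from the example above.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the target power
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    # Round up: a fractional observation cannot be collected.
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2)


# Assumed support: 50% among men, 54% among women (difference of 0.04).
n_each = n_per_group_two_proportions(0.50, 0.54)
print(n_each)
```

Under these assumptions the answer is on the order of a few thousand respondents per group, which illustrates how quickly the required sample grows when the difference to be detected is small.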

Calculating the Sample Size $n$

If researchers desire a specific margin of error, then they can use the error bound formula to calculate the required sample size. The error bound formula for a population proportion is:

$\displaystyle \text{EBP} = z_{\frac{\alpha}{2}}\sqrt{\frac{p'q'}{n}}$

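Here $p'$ is the estimated proportion, $q' = 1 - p'$, and $z_{\frac{\alpha}{2}}$ is the critical value for the chosen confidence level. Solving for $n$ gives an equation for the sample size: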

$n=\frac{\left(z_{\frac{\alpha}{2}}\right)^2p'q'}{\text{EBP}^2}$
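The formula can be applied to the earlier goal of a 95% confidence interval less than 0.06 units wide, which corresponds to an error bound of 0.03. The sketch below is one way to do the arithmetic; the function name `required_sample_size` is hypothetical, and the default $p' = 0.5$ is an assumption that gives the most conservative (largest) sample size, because $p'q'$ is largest at 0.5. The result is rounded up to the next whole observation.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(ebp, p_prime=0.5, confidence=0.95):
    """Smallest n with z^2 * p' * q' / n <= EBP^2 (normal approximation)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    q_prime = 1 - p_prime
    # Always round up so the error bound is not exceeded.
    return ceil(z ** 2 * p_prime * q_prime / ebp ** 2)


# No prior estimate of the proportion: use the conservative p' = 0.5.
n_conservative = required_sample_size(0.03)

# Hypothetical prior estimate p' = 0.2 from an earlier study.
n_with_prior = required_sample_size(0.03, p_prime=0.2)

print(n_conservative, n_with_prior)  # 1068 683
```

A prior estimate of $p'$ far from 0.5 lowers the required sample size, but the conservative choice protects against that estimate being wrong.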
