
03. Confidence Intervals

Updated on January 3, 2025
In the previous lesson we learned how repeated measurements of a sample allow us to calculate a mean value \bar{x} as an approximation of the true mean \mu_{0}. Yet given the uncertainty in these measurements, it is unlikely that \bar{x} will be precisely equal to \mu_{0}. If 100 repeated measurements cannot give us the true answer, then what can? In this lesson we will learn to formulate a range of values, or confidence interval, that is very likely to include the true value \mu_{0}.

Learning Goals #

  • Understand how the sample size affects your ability to draw conclusions from your data.
  • Calculate the confidence interval for your data.
  • Estimate the optimal number of replicates for an experiment. 
  • Argue why it is reasonable to assume that experimental data follows the normal distribution.
  • Outline strategies to increase the degrees of freedom to improve a confidence interval. 
Analytical Separation Science by B.W.J. Pirok and P.J. Schoenmakers
READ SECTIONS 9.2.3 - 9.2.3.2

Confidence Intervals & Central Limit Theorem

1. Sampling distribution of the mean #

We turn back to the table of 100 RPLC measurements of a pesticide from the previous lesson. These are now shown here on this page in Table 1. At this point it is time to acknowledge that the 100 repetitions are far, far away from conventional laboratory practice. Indeed, usually we would conduct a much smaller number of repeats.

Table 1. Representation of a sample of 100 repeated measurements of the concentration of a pesticide by RPLC. Identical to Table 9.2 from the book.
SET | Measurements | \bar{x} | \sigma
#1  | 9.08 9.13 9.11 9.13 9.10 9.13 9.15 9.12 9.14 9.10 | 9.119 | 0.0213
#2  | 9.12 9.15 9.13 9.14 9.11 9.13 9.11 9.13 9.13 9.12 | 9.127 | 0.0125
#3  | 9.11 9.09 9.11 9.14 9.11 9.12 9.15 9.14 9.16 9.14 | 9.127 | 0.0221
#4  | 9.13 9.14 9.16 9.08 9.14 9.10 9.14 9.09 9.12 9.13 | 9.123 | 0.0254
#5  | 9.16 9.12 9.12 9.11 9.15 9.13 9.17 9.12 9.15 9.11 | 9.134 | 0.0217
#6  | 9.12 9.10 9.11 9.13 9.12 9.17 9.11 9.14 9.11 9.12 | 9.123 | 0.0200
#7  | 9.10 9.13 9.14 9.12 9.11 9.13 9.16 9.12 9.13 9.15 | 9.129 | 0.0179
#8  | 9.12 9.14 9.13 9.12 9.14 9.13 9.12 9.13 9.13 9.12 | 9.128 | 0.0079
#9  | 9.13 9.12 9.13 9.13 9.09 9.18 9.13 9.11 9.14 9.11 | 9.127 | 0.0236
#10 | 9.12 9.10 9.15 9.11 9.14 9.12 9.10 9.14 9.13 9.14 | 9.125 | 0.0178

The data in Table 1 can also be regarded as a collection of 10 datasets, one in each row. The final columns in the table display the mean \bar{x} and standard deviation \sigma for the data in each row. We can see that, even though all 100 measurements are in fact repetitions of the same measurement, the mean \bar{x} in each row is slightly different. This suggests that \bar{x} follows a distribution of its own.

Things get much more interesting in Figure 1 where the distribution of the 10 \bar{x} values is plotted in a box and whisker plot (pink) next to those of the 100 original measurements (blue). Strikingly, the plots clearly show that the means are clustered much closer to one another. Can we somehow exploit this?

Figure 1. The distribution of sample means is much narrower than the distribution of the original data pertaining to those sample means.
Well, what we can say is that if we were to collect an infinite number of sample means, we would obtain the sampling distribution of the mean. Its property is that its mean \mu_{\bar{x}} is equal to the mean of the original population \mu. Similarly, its standard deviation \sigma_{\bar{x}} relates to the standard deviation of the original population \sigma through
Equation 9.13: \sigma_{\bar{x}}=\frac{\sigma}{\sqrt{n}}
In other words, when systematic errors are absent, the sampling distribution of the mean gives a good indication about the true mean \mu_{0}.
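Equation 9.13 is easy to check numerically. The sketch below (shown in Python for brevity; the seed, population mean, and number of repeats are arbitrary choices for illustration) draws many samples of size n = 10 from a normal population and compares the spread of the sample means to \sigma/\sqrt{n}:

```python
import random
import statistics

random.seed(1)
mu, sigma, n = 9.126, 0.02, 10   # assumed population parameters and sample size

# Collect a large number of sample means of size n
means = [statistics.mean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(100_000)]

# The standard deviation of the means approaches sigma / sqrt(n)
print(statistics.stdev(means))
print(sigma / n ** 0.5)
```

The two printed values agree closely, confirming that averaging n measurements shrinks the spread by a factor \sqrt{n}.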

2. Central limit theorem #

A second property of the sampling distribution of means is that it follows the normal distribution due to the central limit theorem. The central limit theorem states that the distribution of means obtained from random samples of a population will follow the normal distribution, regardless of the shape of the original distribution. This is depicted in Figure 2.
Figure 2. Histogram of the results of rolling a single die 1000 times.
Figure 2 shows the resulting distribution of rolling a die 1000 times. We can see that sometimes we roll a 1, sometimes a 2, and so on. One could argue that the shape of this distribution is a block shape, with all six results having an equal probability to be encountered.

It gets interesting once we take several dice and roll them simultaneously. Figure 3 shows the means obtained from rolling 10 dice simultaneously 1000 times. In other words, 10 dice were thrown, the mean was recorded, and the process was repeated until 1000 means were obtained.

Figure 3. Histogram of the means \bar{x} obtained from rolling 10 dice simultaneously. This process was repeated 1000 times.

In Figure 3, the distribution no longer looks like a block shape at all, and instead starts to resemble the normal distribution. In fact, if we were to roll an infinite number of dice 1000 times and plot the means of the results, i.e. the sampling distribution of the means, then we would obtain a perfect normal distribution, as shown in Figure 4.
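The dice experiment is simple to reproduce yourself. A minimal sketch (shown in Python; the seed is an arbitrary choice) mirrors Figure 3 by averaging 10 dice and repeating 1000 times:

```python
import random
import statistics

random.seed(0)

# Mean of 10 dice, recorded 1000 times (as in Figure 3)
means = [statistics.mean(random.randint(1, 6) for _ in range(10))
         for _ in range(1000)]

# The means cluster around 3.5 (the mean of a single fair die), with a
# spread close to sqrt(35/12)/sqrt(10) = 0.54, in line with Equation 9.13
print(statistics.mean(means))
print(statistics.stdev(means))
```

Plotting a histogram of `means` reproduces the bell shape of Figure 3, even though a single die follows a flat, block-shaped distribution.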

Figure 4. Histogram of the means \bar{x} obtained from rolling an infinite number of dice. This process was repeated 1000 times.
In practice, we often use data obtained from averaged measurements, and it is therefore reasonable to assume that such data follows the normal distribution.
Analytical Separation Science by B.W.J. Pirok and P.J. Schoenmakers
READ SECTION 9.2.3.3

Confidence Intervals For Large Sample Sizes

3. Confidence intervals #

3.1. Large sample size #

We can use the above concepts to establish a confidence interval. The confidence interval quantifies the range which is likely to include the true mean \mu_{0}. This range depends on the significance level \alpha. For a 99% confidence interval \alpha is 0.01, for a 95% confidence interval \alpha is 0.05, and so on.
For example, for a 95% confidence interval this means that 95% of the sample means will lie between \mu-z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} and \mu+z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}. Here, z_{\frac{\alpha}{2}} is the critical value, or critical z-value, also written as z_{\text{crit}}. This critical value determines the range covered by the confidence interval (Figure 5, blue area).
Figure 5. The standard normal distribution has a mean of 0 and a standard deviation of 1. With these values, the center 95% of the data will be between z of -1.96 and 1.96 (calculated using the ICDF).

At this stage, you may be overwhelmed by the computation of the critical value. However, determining this range is no different from the ICDF exercise in the previous lesson! We are interested in the central 95% of the distribution, which leaves 2.5% on both the left and right ends. We can use the ICDF at p = 0.025 (2.5%) and p = 0.975 (97.5%) to obtain the z-values at these probabilities (-1.96 and 1.96, respectively).
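The same ICDF lookup can be done in one line in most environments; here is one option using only the Python standard library (the book's own examples use MATLAB, Excel, and Julia):

```python
from statistics import NormalDist

z = NormalDist()                   # standard normal: mu = 0, sigma = 1
print(round(z.inv_cdf(0.025), 2))  # -1.96
print(round(z.inv_cdf(0.975), 2))  # 1.96
```

`inv_cdf` is exactly the ICDF from the previous lesson: it maps a probability back to the corresponding z-value.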

In practice we will have a single sample for which we can calculate the mean \bar{x} (and not the entire sampling distribution of the mean). Therefore, we can calculate the confidence interval for the mean \mu as

Equation 9.15: \bar{x}\pm z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}

In reality, \sigma is often unknown. As long as n is above 30 we can reasonably replace it by the sample standard deviation s; we will numerically explore this at the end of this lesson.

Calculate the confidence interval for the 100 RPLC pesticide measurements in Table 1. Fill in the interval (i.e. z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}) in the field below and round it to 4 decimals. Hint: Try to do the exercise with a combination of the tools you have learned thus far in this course. An incorrect answer will give you more hints. Further example code is located further down the page, but you learn more if you first try yourself.

3.2. Small sample size #

It is great that we have been able to determine the confidence interval for our 100 RPLC measurements. However, in practice we often find ourselves with a much smaller number of repetitions. What do we do then?

Analytical Separation Science by B.W.J. Pirok and P.J. Schoenmakers
READ SECTIONS 9.2.3.4-9.2.3.5

Confidence Intervals For Small Sample Sizes

When the sample size is small, we need to account for the additional uncertainty. We can no longer simply replace \sigma by s, and the data no longer follows the standard normal z-distribution (Equation 9.10 from the previous lesson). Instead, we use the t-distribution, also known as Student's t-distribution. Its test statistic, t, is given by

Equation 9.17: t=\frac{\bar{x}-\mu}{s/\sqrt{n}}

The equation for the t-distribution is a bit more complex, and not relevant here (you can find it in the book). What is important is that where the normal distribution depends on \mu and \sigma as input parameters, the t-distribution requires the degrees of freedom, \nu, to be specified. For the confidence interval, the degrees of freedom (DOF), \nu, are given by
Equation 9.18: \nu=n-1

The effect of the degrees of freedom is shown in Figure 6, where the t distribution is plotted for different \nu values. At \nu = \infty the t-distribution is identical to the normal distribution. However, once we lose degrees of freedom, the distribution widens. This becomes significant below n of 30, and is seen in Figure 6 to be extreme for degrees of freedom as low as 1 or 2.
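The widening is easy to quantify. The sketch below (Python; the critical t-values are the standard two-sided values at \alpha = 0.05 from t-tables) compares the critical t-value at several \nu against the normal-distribution limit of 1.96:

```python
from statistics import NormalDist

# Two-sided critical t-values at alpha = 0.05, from standard t-tables
t_crit = {1: 12.706, 2: 4.303, 4: 2.776, 9: 2.262, 29: 2.045}

z_crit = NormalDist().inv_cdf(0.975)   # the nu = infinity limit, 1.96
for nu, t in t_crit.items():
    # ratio shows how much wider the interval is than the normal case
    print(nu, t, round(t / z_crit, 2))
```

At \nu = 1 the critical value is more than six times the normal-distribution value, while by \nu = 29 the two nearly coincide.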

Figure 6. Overlay of the t-distribution plotted for different numbers of degrees of freedom. As the number of degrees of freedom decreases, we become less confident about the extent to which our data represents the true mean (in the absence of systematic errors).

The effect is that the critical values also move away from 0, and as a consequence the confidence intervals widen. This is in agreement with what we expect, because if we have fewer repetitions (i.e. fewer degrees of freedom), then we are also less sure about what we are measuring. If we only have two measurements (i.e. \nu = 1), then we have no clue what the actual variation is based on just two values, and our confidence will be poor. For small sample sizes, the confidence interval is given by

Equation 9.19: \mu=\bar{x} \pm t_{(\frac{\alpha}{2},\nu)} \frac{s}{\sqrt{n}}

Here, t_{(\frac{\alpha}{2},\nu)} is the critical t-value at \nu degrees of freedom. It can be computed with the ICDF in most programming languages and in Excel.

Calculate the confidence interval for the 10 RPLC pesticide measurements from Set 1 in Table 1. Fill in the interval (i.e. t_{(\frac{\alpha}{2},\nu)} \frac{s}{\sqrt{n}}) in the field below and round it to 4 decimals. Hint: Try to do the exercise with a combination of the tools you have learned thus far in this course. An incorrect answer will give you more hints. Further example material to help you is located further down the page, but you learn more if you first try yourself.

Example
				
% Inputs assumed defined beforehand: alpha (significance level), n (sample
% size), x_dof (degrees of freedom), x_mean (sample mean), x_std (sample
% standard deviation)

% Significance Level
p = 1-alpha/2;

% Calculation
t_crit = icdf('T',p,x_dof);       % critical t-value via the ICDF
x_range = t_crit*(x_std/sqrt(n)); % half-width of the interval

% Confidence Interval
x_CI = [x_mean - x_range, x_mean + x_range];
				
			

The T.INV() function can be used to compute the ICDF for the t-distribution for a given probability and number of degrees of freedom.

				
using Distributions, Statistics

# Inputs assumed defined beforehand: alpha, n, x_dof, x_mean, x_std

# Significance level
p           = 1 - alpha / 2

# Define the t-distribution with the degrees of freedom
t_dist      = TDist(x_dof)

# Calculation
t_crit      = quantile(t_dist, p)           # critical t-value via the ICDF
x_range     = t_crit * (x_std / sqrt(n))

# Confidence interval
x_CI        = [x_mean - x_range, x_mean + x_range]
				
			

4. Optimal replicate number #

We saw in Figure 6 how at some point the t-distribution with sufficient degrees of freedom strongly resembles the normal distribution. We mentioned that this point would be around n>30 where we can reasonably assume s to sufficiently represent \sigma. Can we use this to determine what number of repeated measurements we should do?

Well, Equation 9.19 shows that the confidence interval depends on several components: (i) n, (ii) the critical t-value, and (iii) the precision. The latter is a given and depends on the instrument that produces the data. However, the first two we can investigate more closely. This is done in Figure 7, where the critical t-value divided by \sqrt{n} is plotted against the sample size.

At first glance we see from Figure 7 that a small number of measurements yields a very high critical value, which improves as n increases. We can also see that after 20-50 measurements, the improvement in the critical value per additional measurement diminishes. This is the origin of the n>30 rule of thumb (some say n>50). More importantly, going from two to five repetitions yields a 7-fold improvement!

So, how many replicates of a measurement should we do? Definitely not just two; at least three, but five clearly yields a lot more confidence!
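The pay-off can be checked directly from the half-width factor t_{\text{crit}}/\sqrt{n} in Equation 9.19. A small sketch (Python; two-sided critical t-values at \alpha = 0.05 from standard tables, with \nu = n - 1):

```python
import math

# Two-sided critical t-values at alpha = 0.05, keyed by sample size n
# (degrees of freedom nu = n - 1), taken from standard t-tables
t_crit = {2: 12.706, 3: 4.303, 5: 2.776, 10: 2.262, 30: 2.045}

# Half-width factor of the confidence interval, t_crit / sqrt(n)
factor = {n: t / math.sqrt(n) for n, t in t_crit.items()}
print({n: round(f, 2) for n, f in factor.items()})

# Going from 2 to 5 replicates shrinks the interval roughly 7-fold
print(round(factor[2] / factor[5], 1))   # 7.2
```

The factor drops steeply up to about n = 5 and only slowly thereafter, which is exactly the diminishing return seen in Figure 7.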

5. Increasing degrees of freedom #

Obviously, doing more replicates increases the degrees of freedom. But what if we find out later that we want more? 

In this case, an additional set of replicates can be measured. From a statistical perspective, this is a second, separate dataset, and combining it with the first requires some statistical testing. The two datasets may only be combined if their variances are homogeneous, which requires a test for heteroscedasticity. We will learn how to do this in future lessons.

If the outcome is that the variances are homogeneous, then the variances may be pooled

Equation 9.20: s^{2}_{\text{pooled}}=\frac{\nu_1 s^{2}_1 + \nu_2 s^{2}_2 + …}{\nu_1+\nu_2+…}

Here, \nu_1 and \nu_2 are the degrees of freedom of the respective datasets, and s^{2}_1 and s^{2}_2 are their variances.
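Equation 9.20 is straightforward to implement. A minimal sketch (Python), using the standard deviations of Sets 1 and 2 from Table 1 as an example (each set has 10 replicates, so \nu = 9):

```python
def pooled_std(stds, dofs):
    """Pooled standard deviation per Equation 9.20."""
    pooled_var = sum(nu * s ** 2 for s, nu in zip(stds, dofs)) / sum(dofs)
    return pooled_var ** 0.5

# Sets 1 and 2 from Table 1: s = 0.0213 and 0.0125, nu = 9 each
print(round(pooled_std([0.0213, 0.0125], [9, 9]), 4))
```

Note that it is the variances that are averaged (weighted by the degrees of freedom), not the standard deviations themselves.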

If the variances are not homogeneous, the Welch-Satterthwaite equation can be used, which is shown in Equations 9.21 and 9.22 in the book.

Concluding Remarks #

We can conclude that repeated measurements allow us to formulate a range of values that – in the absence of systematic errors – is very likely to include the true mean. This is the confidence interval. We learned how this can be set at different significance levels. Other conclusions are:

  • Averaged measurements follow the normal distribution (central limit theorem); it is therefore reasonable to assume that our data is normally distributed.
  • Going from 2 to 5 repetitions in practice for our laboratory experiments really pays off in terms of confidence intervals. 
  • Above, say, 30 repetitions, we can reasonably assume that our standard deviation of the sample represents the standard deviation of the population.

In the next lesson we will use these concepts to do actual hypothesis tests.

Further help with exercises #

Confidence interval for large sample sizes
				
					% Data
data = readtable('PesticideConcentrations.csv');
vector_data = reshape(data{:,:},[],1);
a = 0.05;

% Gathering Information
p = 1-a/2; % 1-Alpha/2
x_mean = mean(vector_data);
x_std = std(vector_data);
n = length(vector_data);

% Calculation
z_crit = icdf('Normal',p,0,1);
x_range = z_crit*(x_std/sqrt(n));

% Confidence Interval
x_CI = [x_mean - x_range, x_mean + x_range];
				
			

The T.INV() function can be used to compute the ICDF for the t-distribution for a given probability and number of degrees of freedom.

An example can be downloaded here (CS_03_CI_Large, .XLSX).

				
					using Distributions, Statistics, CSV, DataFrames

# Data
data        = CSV.read("PesticideConcentrations.csv", DataFrame)
vector_data = data.Data     # Data <-- should be the name of the column in the csv
a           = 0.05

# Gathering information
p           = 1 - a / 2     # 1-alpha/2
x_mean      = mean(vector_data)
x_std       = std(vector_data)
n           = length(vector_data)

# Calculation
dist        = Normal()      # standard normal distribution (mu = 0, sigma = 1)
z_crit      = quantile(dist, p)
x_range     = z_crit * (x_std / sqrt(n))

# Confidence interval
x_CI        = [x_mean - x_range, x_mean + x_range]
				
			
Confidence interval for small sample sizes
				
					% Data
x = [9.08, 9.13, 9.11, 9.13, 9.10, 9.13, ...
    9.15, 9.12, 9.14, 9.10];
a = 0.05;

% Gathering Information
n = length(x);
x_dof = n-1; % Degrees Of Freedom
x_mean = mean(x);
x_std = std(x);
p = 1-a/2; % 1-Alpha/2

% Calculation
t_crit = icdf('T',p,x_dof);
x_range = t_crit*(x_std/sqrt(n));

% Confidence Interval
x_CI = [x_mean - x_range, x_mean + x_range];
				
			

The T.INV() function can be used to compute the ICDF for the t-distribution for a given probability and number of degrees of freedom.

An example can be downloaded here (CS_03_CI_Small, .XLSX).

				
					using Distributions, Statistics
x           = [9.08, 9.13, 9.11, 9.13, 9.10, 9.13, 9.15, 9.12, 9.14, 9.10]
a           = 0.05

# Gathering information
n           = length(x)
x_dof       = n - 1         # Degrees of freedom
x_mean      = mean(x)
x_std       = std(x)
p           = 1 - a / 2     # 1 - alpha/2

# Calculation
t_dist      = TDist(x_dof)  # t-distribution with nu = n - 1 degrees of freedom
t_crit      = quantile(t_dist, p)
x_range     = t_crit * (x_std / sqrt(n))

# Confidence interval
x_CI        = [x_mean - x_range, x_mean + x_range]
				
			