INFORMATION REPOSITORY

12. Calibration & Model Variance

Updated on March 6, 2026

The model obtained through least-squares regression depends on the data used to fit it. This means that almost every aspect of the model contains variation. This lesson extends least-squares regression by quantifying the uncertainty associated with the fitted model. We introduce the variance-covariance matrix of the regression coefficients, from which the standard errors and confidence intervals of the model coefficients are obtained. These uncertainties propagate into the predicted response, the experimental response, and ultimately into the estimated analyte concentration after model inversion. We derive each form of uncertainty for the straight-line case and illustrate the effect on calibration curves. 

Learning Goals

  • Explain the purpose of the variance–covariance matrix and how it quantifies uncertainty in regression parameters.
  • Compute the standard errors and confidence intervals of the intercept and slope using the variance–covariance matrix.
  • Distinguish between the confidence interval of the predicted response and that of the experimental response, and compute each using the appropriate expressions.
  • Evaluate the uncertainty of a predicted concentration by inverting a calibration line and calculating the confidence interval in the sample concentration.
Analytical Separation Science by B.W.J. Pirok and P.J. Schoenmakers
READ SECTION 9.6.3.2

CONFIDENCE INTERVAL OF THE MODEL PARAMETERS

1. Model parameters

When fitting a regression model, we not only want the best-fitting parameter values, but also an indication of how precise they are. The variance of a parameter reflects the uncertainty in its estimate. Small variances indicate strong support from the data, while large variances suggest the estimate could change noticeably with new measurements. Assessing this uncertainty is crucial for evaluating prediction reliability, constructing confidence intervals, and comparing models.

1.1. Variance-Covariance Matrix

The variance–covariance matrix (often written \textbf{Σ}) is a matrix (hence the non-italic and bold format) that summarises the uncertainty and interdependence of the regression parameters.

Equation 9.80: \textbf{Σ}=\begin{bmatrix} s^2_{b_0} & \text{cov}(b_0,b_1) & \cdots & \text{cov}(b_0,b_m) \\  \text{cov}(b_1,b_0) & s^2_{b_1} & \cdots & \text{cov}(b_1,b_m) \\ \vdots &  \vdots & \ddots & \vdots \\ \text{cov}(b_m,b_0) & \text{cov}(b_m,b_1) & \cdots &  s^2_{b_m} \end{bmatrix}

  • Diagonal elements: the variances of the parameters (conceptually written as \sigma^2_{b_0}, \sigma^2_{b_1}, …). These indicate how precisely each parameter is estimated. Because we compute them from our sample, we use their estimated forms s^2_{b_0}, s^2_{b_1}, and so on.

  • Off-diagonal elements: covariances between pairs of parameters (e.g. \text{cov}(b_0,b_1)). These indicate whether two parameters tend to increase or decrease together during the fitting process, which is important when parameters are not independent (e.g. slope and intercept in straight-line regression).
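The sign of these covariances can be checked numerically. The sketch below is in Python/NumPy (rather than the MATLAB/Julia used elsewhere in this lesson) and uses made-up data: it fits a straight line and confirms that slope and intercept are negatively correlated whenever the mean of the x-values is positive.

```python
import numpy as np

# Straight-line fit y = b0 + b1*x for a small made-up data set
# (illustration only; any data with mean(x) > 0 behaves the same way).
x = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
y = np.array([2.05, 1.81, 1.62, 1.38, 1.21])

X = np.column_stack([np.ones_like(x), x])     # design matrix [1, x]
b, *_ = np.linalg.lstsq(X, y, rcond=None)     # least-squares estimates

resid = y - X @ b
s2_e = resid @ resid / (len(x) - X.shape[1])  # error variance (Eq. 9.82)
Sigma = s2_e * np.linalg.inv(X.T @ X)         # variance-covariance (Eq. 9.81)

print(Sigma[0, 1] < 0)                        # off-diagonal element is negative
```

For the straight line this follows from \text{cov}(b_0,b_1) = -\bar{x}\,s^2_e / \sum_i (x_i-\bar{x})^2, which is negative whenever \bar{x} > 0: pushing the intercept up must be compensated by a shallower slope.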

CASE: FITTING A RETENTION MODEL

We will now continue with the retention data and results from the exercises from the previous lesson.

In this context, the variance-covariance matrix provides the foundation for computing standard errors, confidence intervals, and hypothesis tests for each parameter. It is estimated by

Equation 9.81: \hat{\textbf{Σ}}=s^2_e \left( \textbf{X}^{\text{T}} \cdot \textbf{X} \right)^{-1}

with the estimated error variance, s^2_e, given by

Equation 9.82: \text{MSE} = s^2_e = \frac{\sum_{i=1}^{n} (y_i-\hat{y}_i)^2}{n-(m+1)}

Here, n-(m+1) is the number of degrees of freedom, with n the number of datapoints (rows of the matrix \textbf{X}, as defined in Lesson 11) and m+1 the number of estimated parameters (columns of \textbf{X}). The quantity s^2_e is also known as the mean squared error (\text{MSE}) of the regression: it represents the average squared deviation between observations and model predictions, corrected for the number of fitted parameters.

				
SSE_model = sum((y - y_hat).^2);                  % sum of squared errors
DoF_model = size(X_matrix,1) - size(X_matrix,2);  % n - (m+1)
MSE_model = SSE_model / DoF_model;                % s_e^2 (Equation 9.82)

VC_matrix = MSE_model*pinv(X_matrix'*X_matrix);   % Equation 9.81
				
			

An example file can be downloaded here (CS_08_OneWayANOVA, .XLSX). See below for further instructions.

				
using LinearAlgebra   # provides pinv

SSE_model   = sum((y - y_hat).^2)
DoF_model   = size(X_matrix,1) - size(X_matrix,2)
MSE_model   = SSE_model / DoF_model

VC_matrix   = MSE_model*pinv(X_matrix'*X_matrix)
				
			
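As a sanity check on Equation 9.81, the diagonal element for the slope of a straight line should reproduce the familiar closed-form variance s^2_{b_1} = s^2_e / \sum_i (x_i - \bar{x})^2. A minimal NumPy sketch with synthetic data (not the retention data of the case study):

```python
import numpy as np

# Synthetic straight-line data; names are illustrative.
x = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5])
y = np.array([1.02, 1.21, 1.38, 1.62, 1.81, 2.03])

X = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ b
dof = len(x) - X.shape[1]               # n - (m+1)
s2_e = resid @ resid / dof              # MSE (Eq. 9.82)
Sigma = s2_e * np.linalg.inv(X.T @ X)   # Eq. 9.81

Sxx = np.sum((x - x.mean()) ** 2)
print(np.isclose(Sigma[1, 1], s2_e / Sxx))   # matrix route == closed form
```

The identity is exact: the [1,1] element of (X^T X)^{-1} for the straight-line design matrix equals 1/\sum_i (x_i-\bar{x})^2.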

As Equation 9.80 showed, the variances of the model parameters are found on the diagonal of the variance-covariance matrix. We can extract them by

				
s_b_model = sqrt(diag(VC_matrix));
				
			


				
s_b_model   = sqrt.(diag(VC_matrix))
				
			

Note that we used the square root to turn the variances into standard deviations. We can now calculate the confidence intervals of the model parameters using the t distribution, similar to what we did in Lesson 3, by realizing that t_{\text{obs}}=(b_i-\beta_i)/s_{b_i}, so that:

Equation 9.85: \beta_i=b_i ± t_{\text{crit}} \cdot s_{b_i}

where t_{\text{crit}} is the critical t-value (i.e. t_{\alpha/2,(n-(m+1))}), again using n-(m+1) degrees of freedom.

				
alpha = 0.05;
t_crit = icdf('t',1-alpha/2,DoF_model);
CI_b_model = [b-t_crit*s_b_model b+t_crit*s_b_model];
				
			


				
using Distributions

alpha       = 0.05
t_crit      = quantile(TDist(DoF_model), 1 - alpha/2)
CI_b_model  = [b-t_crit*s_b_model b+t_crit*s_b_model]
				
			
EXERCISE 1: MODEL PARAMETERS

Calculate the 95% confidence interval for the model parameters of your retention model from Exercise 3 of Lesson 11. Report 4 decimals. Note that you are not asked to specify the limits, but the half width of the interval (e.g. 5 ± 1.01).

Analytical Separation Science by B.W.J. Pirok and P.J. Schoenmakers
READ SECTION 9.6.3.3

CONFIDENCE INTERVAL OF THE PREDICTED RESPONSE

2. Predicting data

With our fitted model from the previous lesson

Equation 9.71: \hat{y}=b_0 + b_1x \rightarrow \ln{\hat{k}}={\ln{k_0}} -{S \varphi}

we can predict the value of \ln{k} (i.e. y) for any chosen value of \varphi (i.e. x), through

\hat{y}_{\text{0}}=f(x_0, \textbf{b})={\textbf{x}_0} \cdot {\textbf{b}}=\begin{bmatrix}1 & x_0 \end{bmatrix} \begin{bmatrix}b_0 \\ b_1 \end{bmatrix}
EXERCISE 2: PREDICTION

Predict the value of \ln{k} for \varphi = 0.28. Round to two decimals. Use the equation above.

Solution
				
x0=[1 0.28];  % the prediction point [1, phi0]
y_hat0=x0*b;  % the predicted ln k
				
			


				
x0          = [1 0.28]  # the prediction point [1, phi0]
y_hat0      = x0 * b    # the predicted ln k
				
			

3. Confidence Interval of Prediction

3.1. Jacobian

We previously established that both fitted model parameters have associated variances and covariances (from the variance–covariance matrix). Since our prediction \hat{y}_{\text{0}} depends directly on b_0 and b_1, their uncertainty naturally propagates to the predicted value.

In other words, we do not only want to estimate \hat{y}_{\text{0}}, but also the variance of the prediction at the value x_{\text{0}}, i.e. s^2_{\hat{y}_0}, which can be obtained through

Equation 9.86: s^2_{\hat{y}_0}=\textbf{J}_{\text{0}} \cdot \textbf{Σ} \cdot \textbf{J}^{\text{T}}_{\text{0}}

To use this equation, we need \textbf{J}_{\text{0}}, the Jacobian evaluated at the prediction point. The Jacobian is given by

Equation 9.87: \textbf{J}_{\text{0}} = \begin{bmatrix}{\frac{\partial f}{\partial b_0} \Big\rvert_{x_\text{0}}} & {\frac{\partial f}{\partial b_1} \Big\rvert_{x_\text{0}}} & \cdots & {\frac{\partial f}{\partial b_m} \Big\rvert_{x_\text{0}}} \end{bmatrix}

What is a Jacobian?

The Jacobian is named after Carl Gustav Jacob Jacobi, a 19th-century mathematician who made foundational contributions to the theory of determinants. The Jacobian is a vector (or matrix) containing all partial derivatives of a function with respect to its parameters.

For our straight-line model, the Jacobian tells us how sensitive the predicted value \hat{y} is to changes in each of the model parameters (i.e. b_0 and b_1).

Think of the Jacobian as a change-meter. It measures how much the prediction would shift if each parameter were nudged slightly. Large Jacobian values signify that a model parameter has strong influence.
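The quadratic form of Equation 9.86 can be written out term by term for the straight line: s^2_{\hat{y}_0} = s^2_{b_0} + 2x_0\,\text{cov}(b_0,b_1) + x_0^2\,s^2_{b_1}. The NumPy sketch below checks this expansion against the matrix product, using an illustrative (made-up) variance-covariance matrix:

```python
import numpy as np

# Illustrative variance-covariance matrix (made-up numbers, positive definite).
Sigma = np.array([[0.010, -0.012],
                  [-0.012, 0.040]])
x0 = 0.28
J0 = np.array([1.0, x0])      # Jacobian of b0 + b1*x at x0

s2_yhat0 = J0 @ Sigma @ J0    # Eq. 9.86: J0 . Sigma . J0^T

# Expanding the quadratic form term by term gives the same number:
expanded = Sigma[0, 0] + 2 * x0 * Sigma[0, 1] + x0**2 * Sigma[1, 1]
print(np.isclose(s2_yhat0, expanded))
```

The expansion makes the "change-meter" interpretation explicit: the prediction variance is the intercept variance, plus the slope variance scaled by x_0^2, plus a cross-term that is reduced when slope and intercept are negatively correlated.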

3.2. Confidence interval for prediction

Section 9.6.3.3 shows that, for the straight-line model, the Jacobian is \textbf{J}_{\text{0}} = \begin{bmatrix}1 & x_{\text{0}}\end{bmatrix} = \textbf{x}^{\text{T}}_{\text{0}} (in other words, \textbf{J}_{\text{0}} \equiv \textbf{x}^{\text{T}}_{\text{0}}). Combining this with Equation 9.81 allows us to rewrite Equation 9.86 as

Equation 9.88: s^2_{\hat{y}_0}=\textbf{x}_{\text{0}} \cdot s^2_e \left( \textbf{X}^{\text{T}} \cdot \textbf{X} \right)^{-1} \cdot \textbf{x}^{\text{T}}_{\text{0}}

This allows us to calculate the confidence interval for the predicted value \hat{y}_i through

Equation 9.90: y_i= \hat{y}_i ± t_{\text{crit}} \cdot s_{\hat{y}_i}

				
s_y_hat  = sqrt(x0*VC_matrix*x0');
CI_y_hat = [y_hat0-t_crit*s_y_hat y_hat0+t_crit*s_y_hat];
				
			


				
s_y_hat     = sqrt((x0 * VC_matrix * x0')[1])
CI_y_hat    = [y_hat0 .- t_crit*s_y_hat y_hat0 .+ t_crit*s_y_hat]
				
			

3.3. Confidence interval for experimental data

In addition to the uncertainty of the predicted response, we often consider the uncertainty of the experimental response, that is, how much a repeated measurement at x_{\text{0}} would vary. This variance includes both the uncertainty in the fitted model and the intrinsic measurement noise:

Equation 9.91: s_{y_0}^2 = s_e^2 \ \mathbf{x}_0 (\mathbf{X}^\mathrm{T} \mathbf{X})^{-1} \ \mathbf{x}_0^\mathrm{T} + \frac{s_e^2}{g}

Here, g is the number of repeated measurements at x_0 from which the experimental response is calculated. This results in

Equation 9.92: y_i= \hat{y}_i ± t_{\text{crit}} \cdot s_{y_i}

				
s_y  = sqrt(MSE_model + x0*VC_matrix*x0');   % assumes g = 1 replicate
CI_y = [y_hat0-t_crit*s_y y_hat0+t_crit*s_y];
				
			


				
s_y         = sqrt(MSE_model + (x0 * VC_matrix * x0')[1])   # assumes g = 1 replicate
CI_y        = [y_hat0 .- t_crit*s_y y_hat0 .+ t_crit*s_y]
				
			
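The role of g in Equation 9.91 is easy to see in isolation: averaging g replicates shrinks only the noise term s^2_e/g, while the model contribution is unaffected. A small Python sketch with illustrative (made-up) variance values:

```python
import math

# Eq. 9.91: s_y0^2 = model_var + s2_e / g, where model_var stands for
# s_e^2 * x0 (X'X)^-1 x0'. Both numbers below are made up for illustration.
s2_e = 0.04
model_var = 0.01

def s_y0(g):
    """Standard deviation of the experimental response for g replicates."""
    return math.sqrt(model_var + s2_e / g)

print(s_y0(1), s_y0(4))   # more replicates -> smaller uncertainty
```

As g grows, s_{y_0} approaches the pure model uncertainty \sqrt{model\_var}: replication removes measurement noise but cannot remove the uncertainty of the fitted calibration model itself.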
EXERCISE 3: CONFIDENCE INTERVALS

Calculate the 95% confidence interval of the predicted value \hat{y} (i.e. \ln{\hat{k}}) of 1.82 at x_0 (i.e. \varphi_0) of 0.28 from the previous exercise using Equation 9.90. Round to two decimals. Note that the question is not asking for the confidence limits, but the half-width of the interval (i.e. 1.82 ± 0.11; the answer would then be 0.11).

Now, calculate the 95% confidence interval for the experimental response around the same value. Round to two decimals. Hint: You will need Equation 9.92 now.

4. Calibration

4.1. Inverting the model

Now that we have learned about least-squares regression and the different forms of variation, we will leave the retention model for the remainder of this lesson (we will return to it in the next lesson) and focus on calibration.

Calibration curves are ultimately used to convert a measured signal (e.g. signal response, y) into an estimate of the analyte concentration x. To do so, the calibration model must be inverted. For the straight-line model (Equation 9.71), the estimated concentration of a new sample becomes:

Equation 9.93: \hat{x}_{\text{sample}} = \frac{\bar{y}_{\text{sample}}-b_0}{b_1}

where \bar{y}_{\text{sample}} is the mean of the g replicate measurements of the sample.

4.2. Confidence interval for sample concentration

Just like predictions in y, the estimate of x also carries uncertainty. This uncertainty comes from both variability in the measured signal, and uncertainty in the fitted calibration parameters. The standard error in the estimated concentration is given by:

Equation 9.94: s_{\hat{x}_{\text{sample}}}=\frac{s_e}{b_1}\sqrt{\frac{1}{g}+\frac{1}{n}+\frac{(\bar{y}_{\text{sample}} -\bar{y})^2}{b_1^2 \sum_{i=1}^{n} (x_i - \bar{x})^2}}

This expression shows how uncertainty increases when the sample signal lies far from the centre of the calibration data. The confidence interval of the concentration estimate is then:

Equation 9.95: x=\hat{x}_{\text{sample}}\;\pm\;t_{\alpha/2,\,(n-2)} \, s_{\hat{x}_{\text{sample}}}

This interval reflects the precision of the determined concentration. It depends strongly on the slope b_1, because shallow calibration curves (small |b_1|) yield much larger uncertainty in x.

Equations 9.93 to 9.95 can be implemented as follows.

				
% predicted concentration from inverse calibration (Equation 9.93)
% y0 = mean sample signal; mean_y = mean(y); Sxx = sum((x - mean(x)).^2)
x_hat = (y0 - b0) / b1;

% standard error of the predicted concentration (Equation 9.94);
% the leading 1 corresponds to 1/g for a single measurement (g = 1)
s_e = sqrt(MSE_model);
s_x = (s_e / abs(b1)) * sqrt(1 + 1/n + ((y0 - mean_y)^2) / (b1^2 * Sxx));

% confidence interval for x (Equation 9.95)
CI_x = [x_hat - t_crit*s_x , x_hat + t_crit*s_x];
				
			


				
x_hat = (y0 - b0) / b1                      # Equation 9.93
s_e   = sqrt(MSE_model)
s_x   = (s_e / abs(b1)) * sqrt(1 + 1/n + ((y0 - mean_y)^2) / (b1^2 * Sxx))
CI_x  = [x_hat - t_crit*s_x x_hat + t_crit*s_x]
				
			
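Putting Equations 9.93 to 9.95 together, the full inversion can be sketched end to end. The Python/SciPy example below uses hypothetical calibration data and a hypothetical mean sample signal; only the structure mirrors the equations above:

```python
import numpy as np
from scipy import stats

# Hypothetical straight-line calibration data (x = concentration, y = signal).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.05, 1.02, 2.11, 2.95, 4.08, 4.97])

n = len(x)
X = np.column_stack([np.ones_like(x), x])
(b0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - (b0 + b1 * x)
s_e = np.sqrt(resid @ resid / (n - 2))          # standard error of regression
Sxx = np.sum((x - x.mean()) ** 2)

# A hypothetical sample measured g times; y_bar_sample is its mean signal.
g = 3
y_bar_sample = 2.50
x_hat = (y_bar_sample - b0) / b1                # Eq. 9.93

s_x = (s_e / abs(b1)) * np.sqrt(                # Eq. 9.94
    1/g + 1/n + (y_bar_sample - y.mean())**2 / (b1**2 * Sxx))
t_crit = stats.t.ppf(0.975, n - 2)
CI_x = (x_hat - t_crit * s_x, x_hat + t_crit * s_x)   # Eq. 9.95
print(x_hat, CI_x)
```

Because the sample signal sits near the centre of the calibration range, the (ȳ_sample − ȳ)² term is small here; moving y_bar_sample toward either end of the range visibly widens the interval.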

Concluding remarks

In this lesson we examined how uncertainty arises in calibration models and how it influences the accuracy of quantitative analytical results. A calibration curve is not simply a deterministic relationship between signal and concentration; it is a statistical model whose parameters are estimated from experimental data. Consequently, the predicted concentration of an unknown sample carries uncertainty originating from both measurement noise and the uncertainty in the fitted calibration model.

We have seen that the variance of predicted concentrations depends on several factors, including the scatter of the calibration points, the number of calibration measurements, and the position of the unknown sample relative to the calibration range. Predictions close to the center of the calibration range typically have lower uncertainty than those near the boundaries.

Understanding calibration-model variance is therefore essential for reliable quantitative analysis. It allows analytical chemists to estimate confidence intervals for predicted concentrations, evaluate the reliability of measurements, and design calibration experiments that minimize uncertainty. These concepts form the basis for more advanced topics such as model validation and error propagation, which will be addressed in the following lessons.
