
13. Model Validation

Updated on February 9, 2026

Modelling and calibration are only useful if the model reliably predicts new measurements. In this lesson, we extend the concepts from the previous lesson, least-squares regression and calibration-model variance, to evaluate how well a calibration model performs when applied to new data. We explore the key ideas behind model validation, including residual analysis, lack-of-fit testing, and the assumptions underlying linear regression models. Building on our understanding of predicted and experimental confidence intervals, we learn how to judge whether a model is appropriate for the analytical question at hand. This forms the bridge between constructing a model and trusting it, setting the stage for more advanced topics in method performance and analytical quality.

Learning Goals #

  • Understand the purpose of model validation and why regression models must be validated before being used for quantitative analysis
  • Use residual analysis as a diagnostic tool by interpreting residual plots
  • Assess the goodness-of-fit and overall model performance
  • Connect model validation to confidence intervals
Analytical Separation Science by B.W.J. Pirok and P.J. Schoenmakers
READ SECTIONS 9.6.4-9.6.4.1

VALIDATION & INSPECTION OF RESIDUALS

1. Inspecting the data #

To investigate whether our model is actually a good model, we need to consider the two factors that affect this: the data, and the model itself (i.e. the mathematical equation used for regression). We’ll start with the data.

Case: Retention Model

For this lesson we return to the retention model that we have been fitting and investigating in Lesson 11 and Lesson 12.

1.1. Residuals #

A straightforward way to assess how well a model describes the data is by examining the residuals, defined as e_i=y_i−\hat{y}_i (see also Lesson 11).

This is illustrated in Figure 1, where panel A shows our retention model from the previous lessons and panel B the residuals.

Figure 1. Illustration of a fitted regression model (A) and the corresponding residuals (B). Residual plots provide a simple visual check of model quality: random scatter around zero suggests an adequate fit, while patterns or unusually large points signal possible problems in the model or data.

From the code we developed over the last few lessons, it is easy to plot and investigate the residuals.

				
					plot(x,y-y_hat,'o'); 
				
			

An example file can be downloaded here (CS_08_OneWayANOVA, .XLSX). See below for further instructions.

				
					scatter(x, y .- y_hat)   # residual plot with circular markers
				
			

Residual plots make it easy to detect systematic deviations that may not be immediately obvious from the fitted curve alone. In Figure 2-A1, corresponding to the straight-line fit of Figure 1A, a clear pattern appears in the residuals, suggesting that the linear model fails to capture the true relationship. In contrast, Figure 2-B1 shows a random scatter of residuals without visible structure, which is characteristic of a well-fitting model.

Figure 2. Residual plot of a (A1) straight-line, and (B1) second-order polynomial fitted to the retention data with three repeats for each \varphi instead of one. Panel C1 reflects the same as panel B1, but one datapoint was adjusted to be an outlier (datapoint = 3.3 was changed to 2.2). The dotted line represents a perfect description of the plotted data by the model. Panels A2-C2 reflect the Durbin-Watson test for serial correlation, or autocorrelation.


Analytical Separation Science by B.W.J. Pirok and P.J. Schoenmakers
READ SECTIONS 9.6.4.2-9.6.4.3

AUTOCORRELATION, INFLUENCE & LEVERAGE

1.2. Autocorrelation #

When residuals show a pattern in which neighbouring errors tend to move together, the model may suffer from autocorrelation (also called serial correlation). This means that the error in one datapoint is not independent of the next, which often signals that the model structure is inadequate.

Autocorrelation can be formally assessed using the Durbin–Watson test, which evaluates whether consecutive residuals differ more or less than expected.

Equation 9.96: \text{DW}_{\text{obs}}=\frac{\sum^n_{i=2}(e_i - e_{i-1})^2}{\sum^n_{i=1} e^2_i}

				
					% Outlier Test (Example)
isoutlier(y-y_hat)

% Durbin-Watson Test
dwtest(y-y_hat,X_matrix)
				
			


				
					using Statistics

function isoutlier_iqr(x; k=1.5)
    q1 = quantile(x, 0.25)
    q3 = quantile(x, 0.75)
    iqr = q3 - q1
    lower = q1 - k*iqr
    upper = q3 + k*iqr
    return (x .< lower) .| (x .> upper)
end

function durbin_watson(residuals)
    # Equation 9.96
    dw = sum(diff(residuals).^2) / sum(residuals.^2)

    println("Durbin–Watson test")
    println("-------------------")
    println("H0: no first-order autocorrelation")
    println("DW statistic = ", round(dw, digits=4))

    if isapprox(dw, 2; atol=0.5)
        println("≈ no autocorrelation")
    elseif dw < 2
        println("positive autocorrelation likely")
    else
        println("negative autocorrelation likely")
    end

    return dw
end

res      = y .- y_hat

# Outlier Test (Example)
outliers = isoutlier_iqr(res)

# Durbin-Watson Test
dw_stat  = durbin_watson(res)
				
			

The \text{DW}_{\text{obs}} test statistic ranges from 0 to 4. A value around 2 indicates no autocorrelation, values below 2 suggest positive autocorrelation, and values above 2 indicate negative autocorrelation. As a rule of thumb, values between about 1.5 and 2.5 are acceptable, while values below 1 or above 3 may warrant closer inspection. Most computational implementations also report a corresponding p-value.
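As a quick numerical illustration of these rules of thumb, the statistic of Equation 9.96 can be evaluated on two short synthetic residual series (the numbers below are invented purely for illustration):

```julia
# Sketch: Durbin–Watson statistic (Equation 9.96) on synthetic residuals.
dw(e) = sum(diff(e).^2) / sum(e.^2)

# Slowly varying residuals: neighbours move together (positive autocorrelation).
e_pattern = [1.0, 0.8, 0.5, 0.1, -0.4, -0.8, -0.5, 0.2, 0.7, 1.0]

# Sign-flipping residuals: neighbours oppose each other (negative autocorrelation).
e_flip = [0.3, -0.5, 0.4, -0.2, 0.6, -0.4, 0.1, -0.3, 0.5, -0.6]

dw(e_pattern)  # well below 2
dw(e_flip)     # well above 2
```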

EXERCISE 1: OUTLIERS & AUTOCORRELATION

Conduct outlier tests on the residuals for the straight-line retention model that you created in the previous lessons. Do you find any outliers? Be sure to also plot the outliers yourself and try to set your expectations. Do you actually expect there to be an outlier? Run the test with an \alpha of 0.05.

Now apply the Durbin–Watson test to determine whether the same data and model suffer from autocorrelation. Then specify which of the following statements are correct.

Examples of values are shown in Figure 2-A2 through C2. The Durbin–Watson result in Figure 2-A2 (\text{DW}_{\text{obs}} = 0.66, p < 0.05) confirms the strong autocorrelation already visible in panel A1. For the dataset in panel B, no serial correlation is detected, indicating that the model adequately captures the structure in the data. The test is not reliable in the presence of outliers, so the values shown for panel C2 should be disregarded.

1.3. Influence #

Not all datapoints contribute equally to a regression model. Some points “sit” far from the rest or strongly “pull” the fitted line, meaning that removing them would noticeably change the model.

As unusual points can distort the regression or be mistaken for outliers, it is useful to quantify how much each datapoint affects the model (influence) and how unusual its position is (leverage). These diagnostics help us understand whether an apparent outlier is truly problematic or simply a structurally important point in the dataset.

				
					% Cook's Distance (Example)
mdl = fitlm(x,y);
plotDiagnostics(mdl,'cookd');
				
			


				
					using GLM, StatsModels, DataFrames, Plots

# Fit the straight-line model
df          = DataFrame(x=x, y=y)
mdl         = lm(@formula(y ~ x), df)

# Cook's distance
cd          = cooksdistance(mdl)

# Plot the data
scatter(cd, xlabel = "Observation", ylabel = "Cook's Distance", title = "Cook's Distance", legend = false)

# Add a reference line
n           = length(cd)
hline!([4/n])
				
			

Cook’s distance measures how much a single datapoint alters the regression when removed. It quantifies the influence of datapoint r by comparing the model predictions with and without point r:

Equation 9.97: \text{CD}_r^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \hat{y}_{i,\lnot r})^2}{(m+1)\, s_e^2}

A large value (e.g. \text{CD}^2_r > 1 for small datasets, say n < 10) indicates that point r is influential and may warrant further inspection. Different thresholds exist, such as \text{CD}^2_r > 4/n, but the core idea is simple: a large influence means that the model depends heavily on that datapoint.

Logical NOT Symbol

Equation 9.97 shows \hat{y}_{i,\lnot r}. Although \lnot is normally the logical “NOT” symbol, it is used here simply as a shorthand to mean “without”. We therefore interpret \hat{y}_{i,\lnot r} as “the predicted value \hat{y} at point i using the model that was constructed without point r”. This is then compared to the prediction \hat{y}_{i} from the model that includes point r. Note: in many statistics texts, this is written instead as \hat{y}_{i(-r)}.
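Equation 9.97 can also be evaluated directly by refitting the model n times, leaving one point out each time. The sketch below uses a small set of hypothetical (x, y) values with a deliberately off-trend last point; it is not the retention dataset from the lessons.

```julia
using LinearAlgebra

# Hypothetical example data; the last point is deliberately off-trend.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 1.1, 1.9, 3.2, 8.0]

X = hcat(ones(length(x)), x)         # straight-line design matrix
b = X \ y                            # full-model coefficients
yhat = X * b
n, q = size(X)                       # q = m + 1
s2 = sum((y .- yhat).^2) / (n - q)   # residual variance s_e^2

# Cook's distance (Equation 9.97) by explicit leave-one-out refits
CD = map(1:n) do r
    keep = setdiff(1:n, r)
    b_r  = X[keep, :] \ y[keep]            # model fitted without point r
    sum((yhat .- X * b_r).^2) / (q * s2)   # compare predictions at all n points
end
```

The off-trend fifth point ends up with by far the largest Cook's distance, flagging it for closer inspection.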

1.4. Leverage #

Leverage reflects how far a datapoint lies from the other x-values. It is obtained from the diagonal of the hat matrix

Equation 9.98: \textbf{H}=\textbf{X} \left(\textbf{X}^{\text{T}} \textbf{X}\right)^{-1} \textbf{X}^{\text{T}}

A point meeting the criterion h_{r,r}>2(m+1)/n is considered high-leverage and has more potential to affect the fitted line, even when its residual is small.

				
% Leverage (Example)
mdl = fitlm(x,y);
plotDiagnostics(mdl,'leverage');
				
			


				
					using LinearAlgebra, Plots

# Design matrix for the straight-line model
X = hcat(ones(length(x)), x)

# Hat matrix and its diagonal: the leverages (Equation 9.98)
H = X * inv(X' * X) * X'
h = diag(H)

# Plot the leverages
scatter(h, xlabel = "Observation", ylabel = "Leverage", title = "Leverage", legend = false)

# Add the high-leverage threshold 2(m+1)/n
n = size(X, 1)
q = size(X, 2)          # q = m + 1
hline!([2q / n])
				
			

We also saw this in Figure 3B of Lesson 11, where the datapoint highlighted with the arrow in the upper-right corner showed a leverage of 1 (at n = 57 and m = 1; straight-line model). In contrast, the other datapoints had a leverage of 0.03, well below the threshold of 0.07 for that dataset.

EXERCISE 2: INFLUENCE & LEVERAGE

Which of the following statements is or are correct about leverage and influence?

Analytical Separation Science by B.W.J. Pirok and P.J. Schoenmakers
READ SECTIONS 9.6.4.4-9.6.4.5

Coefficient of Determination & F-Test of Significance

2. Model Evaluation #

In the first part of this lesson we addressed ways to validate the data used to construct the model. However, it is equally useful to evaluate the mathematical equation used for the regression.

2.1. Coefficient of Determination #

A very common and well-known metric for this purpose is the coefficient of determination, or R^2.

Equation 9.100: R^2 = \frac{SS_{\mathrm{reg}}}{SS_{\mathrm{tot}}} = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}}

The R^2 quantifies the fraction of the total variation explained by the model by comparing the squared residuals of the model (Figure 3A) versus the total sum of squares of the datapoints compared to the mean (Figure 3B).

Figure 3. Graphical expression of the \text{SS}_{\text{res}} and the \text{SS}_{\text{tot}} components of the coefficient of determination (R^2).
Linearization

At this stage it is relevant to point out that many retention models are non-linear in their natural form (e.g. \hat{k}=\exp(b_0 + b_1 \varphi)). By taking the natural logarithm, we obtain a linearized model for \ln{\hat{k}}, which allows us to use ordinary least-squares regression. However, a key implication is that the fitted model minimizes SS_{\text{res}}=\sum^n_{i=1}(\ln{k_i}-\ln{\hat{k}_i})^2 rather than the residuals in k-space. In other words, the optimization is performed on the transformed variable, not on the original retention factors. For a discussion of why this matters and possible alternatives, see the textbook Section 9.10.2. For now, the concepts treated in this lesson remain valid.
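As a minimal sketch of this linearization (with invented \varphi and k values, not the lesson's dataset), fitting \ln{k} with ordinary least squares looks like:

```julia
# Sketch: linearizing k̂ = exp(b0 + b1·φ) by fitting ln k with OLS.
# φ and k below are hypothetical example data.
φ = [0.2, 0.3, 0.4, 0.5, 0.6]
k = [20.1, 9.8, 5.2, 2.4, 1.3]

X = hcat(ones(length(φ)), φ)
b = X \ log.(k)          # minimizes Σ(ln k − ln k̂)² in ln-space

k_hat = exp.(X * b)      # back-transformed predictions in k-space
```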

2.2. Adjusted coefficient of determination #

A limitation of R^2 is that it always increases when more parameters are added to the model, even if they do not meaningfully improve the fit. The adjusted coefficient of determination, R^2_{\text{a}}, compensates for this by incorporating the degrees of freedom, effectively penalizing unnecessary parameters and providing a more reliable measure of explained variance.

Equation 9.101: R_a^2 = 1 - \frac{SS_{\mathrm{res}} / \left(n - (m+1)\right)}{SS_{\mathrm{tot}} / (n-1)}

Figure 4. Comparison of polynomial models of increasing complexity fitted to the same dataset. Although R^2 increases as additional terms are added (A to D), this does not necessarily mean the model improves. The adjusted R^2 provides a more balanced assessment by penalizing unnecessary parameters. Panels C and D illustrate how extra polynomial terms may yield only minimal improvement, highlighting the risk of overfitting when model complexity grows without meaningful gain in predictive power.

Figure 4 compares how the coefficient of determination keeps on increasing as more parameters (i.e. degrees of freedom) are added. This is also true for the adjusted coefficient of determination, but much less so.

				
					SSE_regression = sum((y_hat-mean(y)).^2);
SSE_residuals  = sum((y_hat-y).^2);
SSE_total      = sum((y-mean(y)).^2);
R2             = SSE_regression/SSE_total;

MSE_residuals = SSE_residuals / (size(X_matrix,1) - size(X_matrix,2));
MSE_total      = SSE_total/(size(x,1)-1);
R2_a           = 1-MSE_residuals/MSE_total;
				
			


				
					using Statistics

SSE_regression  = sum((y_hat .- mean(y)).^2)
SSE_residuals   = sum((y_hat .- y).^2)
SSE_total       = sum((y .- mean(y)).^2)

MSE_residuals   = SSE_residuals / (size(X_matrix,1) - size(X_matrix,2));

R2          = SSE_regression / SSE_total

MSE_total   = SSE_total / (length(x) - 1)

R2_a        = 1 - MSE_residuals / MSE_total
				
			

Model validation is essential because it checks whether a fitted model truly generalizes beyond the data used to build it. A model may appear to fit the training data extremely well, yet fail to predict new data accurately. This problem, known as overfitting, occurs when the model becomes too complex and starts capturing noise rather than the underlying trend.

Without proper validation, an overfitted model can give a false sense of accuracy, perform poorly on future measurements, and lead to incorrect analytical decisions. Validation helps ensure that the model is reliable, robust, and suitable for real-world use.
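The behaviour shown in Figure 4 is easy to reproduce: for nested polynomial models fitted by least squares, R^2 can only increase with the order m, while R^2_{\text{a}} applies the degrees-of-freedom penalty of Equation 9.101. A sketch with hypothetical, essentially linear data:

```julia
using Statistics

# Hypothetical, essentially linear example data.
x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [1.0, 1.4, 2.1, 2.4, 3.1, 3.4, 4.1]

# R² and adjusted R² for a polynomial of order m (Equations 9.100 and 9.101).
function r2_pair(m)
    X     = hcat((x .^ p for p in 0:m)...)   # polynomial design matrix
    yhat  = X * (X \ y)
    SSres = sum((y .- yhat).^2)
    SStot = sum((y .- mean(y)).^2)
    n, q  = size(X)                          # q = m + 1
    R2    = 1 - SSres / SStot
    R2a   = 1 - (SSres / (n - q)) / (SStot / (n - 1))
    return (R2, R2a)
end

[r2_pair(m) for m in 1:3]   # R² rises monotonically; adjusted R² need not
```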

2.3. F-test of significance #

A more sensitive way to evaluate whether a regression model should include an extra term is the F-test of significance, which uses the same concepts as treated in Lesson 7. In our example, the quadratic model fits the retention data better than the straight line because of the added b_2 x^2 term. The F-test checks whether this extra term is statistically meaningful, with H_0 stating that the added term is not significant and H_1 that it is.

The statistic is calculated as

Equation 9.102: F_{\text{obs}} =\frac{\text{MS}_{\text{difference}}}{\text{MS}_{\text{res, full}}}=\frac{\left(\text{SS}_{\text{res, reduced}} - \text{SS}_{\text{res, full}} \right)/ \left( q_{\text{full}} - q_{\text{reduced}} \right)}{\text{SS}_{\text{res, full}}/ \left( n - q_{\text{full}} \right)}

Here, q refers to the total number of parameters in the \text{full} (i.e. including the extra parameter) or \text{reduced} (i.e. without the extra parameter) model, respectively. Note that q=m+1.

Figure 5. Comparison of first-order (A) and second-order (B) polynomial retention models fitted to the same \ln{k} versus \varphi data. The straight-line model in panel A captures the general downward trend but shows systematic deviations, as indicated by the confidence bands widening toward higher \varphi. The quadratic model in panel B more accurately follows the curvature in the data and yields narrower, more symmetric confidence bands, illustrating the improvement gained by adding a second-order term.
				
					% Two Example Models
X_mat_red  = ones(length(x),1);       % REDUCED: y=b0
X_mat_full = [ones(length(x),1) x];   % FULL:    y=b0+b1x

% Regression
b_red      = pinv(X_mat_red'*X_mat_red)*X_mat_red'*y;
b_full     = pinv(X_mat_full'*X_mat_full)*X_mat_full'*y;
y_hat_red  = X_mat_red * b_red;
y_hat_full = X_mat_full * b_full;

% F-Test Of Significance
DoF_red    = size(X_mat_red,1) - size(X_mat_red,2); 
DoF_full   = size(X_mat_full,1) - size(X_mat_full,2); 
MS_diff    = sum((y_hat_full - y_hat_red).^2)./ ...
                (DoF_red - DoF_full); 
MS_full    = sum((y - y_hat_full).^2)/DoF_full;
F_obs      = MS_diff / MS_full;
p          = 1 - cdf('F',F_obs,DoF_red-DoF_full,DoF_full);
				
			


				
					using LinearAlgebra
using Statistics
using Distributions

# Two Example Models
X_mat_red   = ones(length(x), 1)
X_mat_full  = hcat(ones(length(x)), x)

# Regression via the normal equations
b_red       = pinv(X_mat_red' * X_mat_red) * X_mat_red' * y
b_full      = pinv(X_mat_full' * X_mat_full) * X_mat_full' * y

y_hat_red   = X_mat_red * b_red
y_hat_full  = X_mat_full * b_full

# Degrees of freedom
DoF_red     = size(X_mat_red,1)  - size(X_mat_red,2)
DoF_full    = size(X_mat_full,1) - size(X_mat_full,2)

# Mean squares
MS_diff     = sum((y_hat_full .- y_hat_red).^2) / (DoF_red - DoF_full)

MS_full     = sum((y .- y_hat_full).^2) / DoF_full

F_obs       = MS_diff / MS_full

# F-test p-value
p           = 1 - cdf(FDist(DoF_red - DoF_full, DoF_full), F_obs)
				
			

\text{MS}_{\text{difference}} represents the mean square difference between the reduced and full models, i.e. how much the residual sum of squares changes when the extra term is added.

Equation 9.103: \text{MS}_{\text{difference}} =\frac{\sum_{i=1}^{n} (y_i - \hat{y}_{i,\text{reduced}})^2-\sum_{i=1}^{n} (y_i -\hat{y}_{i,\text{full}})^2}{q_{\text{full}} - q_{\text{reduced}}}

Equation 9.103 quantifies the improvement (or lack thereof) gained by moving from the simpler to the more complex model. See Lesson 8 to read again about the mean squares.

EXERCISE 3: MODEL EVALUATION

Use the F-test of significance and the (adjusted) coefficient of determination to compare a constant (\hat{y}=b_0), and a straight line (\hat{y}=b_0+b_1 x) as models. Use your results to decide which of these answers are correct.

Repeat the calculations but now to compare the straight-line model (\hat{y}=b_0+b_1 x) with a second-order (quadratic) polynomial (\hat{y}=b_0+b_1 x+b_2 x^2). Which of the following statements are correct?

Finally, repeat the calculations once more to compare a second-order (quadratic; \hat{y}=b_0+b_1 x+b_2 x^2) and third-order (cubic) polynomial (\hat{y}=b_0+b_1 x+b_2 x^2 + b_3 x^3). Which of the statements is correct?

3. Figures of Merit #

Once a calibration model has been established, we can extract several key performance characteristics, known as analytical figures of merit, that describe the quality of an analytical method. 

3.1. Sensitivity #

A familiar one is sensitivity, defined as the slope b_1 of a straight-line calibration curve. Sensitivity quantifies how strongly the instrument response changes with concentration, and is unrelated to the statistical sensitivity (1-\beta; see Lesson 6). 

3.2. Detection Limits #

In regulated fields, however, the most critical figures of merit are the decision limit, detection limit, and quantification limit, all of which describe how low an analyte concentration can be reliably detected or quantified.

To define these detection-related limits, we start with the blank, a measurement identical to the sample matrix but without analyte. The blank shows a baseline signal \mu_{\text{blank}} and a noise level \sigma_{\text{blank}}, which after enough repeats follows a normal distribution.

The decision limit y_{\text{C}} (also called L_{\text{crit}} or CC_\alpha) is the lowest signal at which we conclude that analyte is present, with a false-positive rate \alpha. It is defined as

Equation 9.105: y_\text{C} = \mu_\text{blank} + k_{\text{C}} \sigma_\text{blank}

typically using k_{\text{C}} = 3, corresponding to \alpha ≈ 0.0013. Because \mu_{\text{blank}} equals the intercept b_0 of the calibration curve, the corresponding concentration can be expressed as x_{\text{C}}=(k_{\text{C}} \cdot \sigma_{\text{bl}})/b_1, or by a regression-based expression that incorporates the variability in the predicted signal. In terms of concentration, this limit can be written as

Equation 9.106: x_{\text{C}} \approx \frac{t_{\alpha,n-2}\, s_e}{b_1} \sqrt{ \frac{1}{g} + \frac{1}{n} +\frac{\bar{x}^{2}}{\sum_{i=1}^n (x_i - \bar{x})^2}}

Figure 6. Graphical explanation of the decision limit (y_\text{C}), detection limit (y_\text{D}), and their corresponding concentration values x_\text{C} and x_\text{D}. Panel D illustrates the blank signal with baseline \mu_\text{bl} and noise \sigma_\text{bl}. Panels B and C show how the distributions shift when analyte is present: at the decision limit, the false-positive rate \alpha is controlled but the false-negative rate \beta remains high, while the detection limit ensures both \alpha and \beta are small. Panel A maps these limits from signal space onto the calibration curve, demonstrating how sensitivity (slope b_1) governs the achievable detection and quantification limits.

3.3. Detection limit #

While the decision limit controls false positives, it still yields a 50% chance of a false negative (\beta = 0.5; see Lesson 6). To address this, the detection limit y_{\text{D}} is defined as

Equation 9.107: y_{\text{D}} = \mu_{\text{bl}} + (k_{\text{C}} + k_{\text{D}})\sigma_{\text{bl}}

with k_{\text{D}} = 3, giving a total multiplier of 6\sigma_\text{bl}. At this level, both \alpha and \beta are ≈ 0.0013, ensuring a statistical power of 0.9987. The corresponding concentration-based detection limit is obtained analogously to the decision limit, but evaluated at the higher signal level. This defines the minimum concentration that can be detected with high confidence, though not necessarily quantified precisely. The limit can also be defined in terms of concentration:

Equation 9.108: x_{\text{D}} \approx \frac{t_{\beta,n-2}\, s_e}{b_1}\sqrt{\frac{1}{g} + \frac{1}{n} + \frac{(2x_{\text{C}} - \bar{x})^{2}}{\sum_{i=1}^n (x_i - \bar{x})^2}}

3.4. Quantification limit #

The quantification limit represents the lowest concentration that can be measured with acceptable precision. It is defined as

Equation 9.109: y_{\text{Q}} = \mu_\text{bl} + 10\sigma_\text{bl}

leading to

Equation 9.110: x_{\text{Q}} = 10 \frac{\sigma_\text{bl}}{b_1}

The multiplier of 10 is chosen because the relative precision at the detection limit is typically around 16%, which is considered too poor for quantitative reporting. The quantification limit instead corresponds to a relative precision of roughly 10%, making it the practical lower boundary for reliable quantification.
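To make the relationships between these limits concrete, the sketch below evaluates Equations 9.105, 9.107, 9.109 and 9.110 for a hypothetical blank and calibration slope (all numbers are invented for illustration):

```julia
# Hypothetical blank statistics and sensitivity (illustrative values only).
μ_bl = 0.020       # blank baseline signal
σ_bl = 0.004       # blank noise
b1   = 0.50        # calibration slope (sensitivity)
kC, kD = 3, 3

y_C = μ_bl + kC * σ_bl          # decision limit (Equation 9.105)
y_D = μ_bl + (kC + kD) * σ_bl   # detection limit (Equation 9.107)
y_Q = μ_bl + 10 * σ_bl          # quantification limit (Equation 9.109)

x_Q = 10 * σ_bl / b1            # concentration-based limit (Equation 9.110)
```

With these numbers, the signal limits come out as y_C = 0.032, y_D = 0.044 and y_Q = 0.060, with x_Q = 0.08 in concentration units.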

Concluding remarks #

In this lesson we explored the essential tools used to judge the quality of a regression model: examination of residuals, detection of patterns such as autocorrelation, assessment of leverage and influence, evaluation of model complexity through R^2 and R^2_{\text{a}}, and the use of statistical tests such as the F-test of significance. Together, these techniques form the core of model validation, the process of ensuring that a chosen mathematical description truly reflects the underlying analytical relationship.

While our focus here remained on ordinary least-squares (OLS) regression using straight-line and polynomial models, it is important to recognize that OLS is only one member of a much larger family of regression approaches. Many analytical situations require models that go beyond the assumptions of constant variance, linearity, or equal weighting of data points.

In future lessons, we will extend these ideas to more advanced regression strategies, including:

  • Weighted regression, used when measurement errors vary across the calibration range.

  • Non-linear regression, needed when relationships cannot be linearized without distorting the error structure.

  • Iterative optimization-based methods, which refine model parameters when no closed-form solutions exist.

  • Multivariate regression techniques, essential when multiple predictors (e.g., spectral intensities, chromatographic features) jointly determine the response.

These approaches build upon the principles introduced here (residual analysis, influence diagnostics, and statistical hypothesis testing) while offering greater flexibility and robustness for complex analytical problems.

Ultimately, mastering model validation allows you not only to fit models, but to trust them, ensuring that your calibration and quantitative measurements rest upon solid statistical foundations.

Extensive Exercise #

EXTENSIVE EXERCISE: VAN DEEMTER

This is an exam-grade question in the MSc. Chemometrics & Statistics course at the University of Amsterdam. It is worth 20 out of 100 pts and should be completed within 35 minutes.

Chromatographic band broadening can be modelled as a function of the flow rate according to the reduced van Deemter equation:

Equation 1.81: h=a+\frac{b}{\nu}+c\cdot\nu

where a, b and c are constants, \nu is the reduced flow velocity and h is the reduced plate height (both \nu and h are dimensionless). Experiments were carried out in a chromatographic system. Plate heights were measured at different reduced flow velocities. The following data was observed:

\nu     h
0.5     3.60
1.0     2.57
1.5     2.48
2.5     2.16
3.0     2.18
5.0     2.37
7.5     2.46
10.0    3.24
				
					x=[0.5, 1.0, 1.5, 2.5, 3.0, 5.0, 7.5, 10.0];
y=[3.60, 2.57, 2.48, 2.16, 2.18, 2.37, 2.46, 3.24];
				
			


				
x = [0.5, 1.0, 1.5, 2.5, 3.0, 5.0, 7.5, 10.0]
y = [3.60, 2.57, 2.48, 2.16, 2.18, 2.37, 2.46, 3.24]
				
			

Fit the reduced van Deemter equation to the data. Consider that all the error is in the measurement of the plate height. Calculate the fitted a, b, and c parameters. Round to two decimals. (5pts)
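Since all the error is in the measurement of h, note that the reduced van Deemter equation is linear in a, b and c, so the ordinary least-squares machinery from this and the previous lessons applies. A minimal sketch of the design matrix (the fit itself and the numerical answers are left to you):

```julia
# The reduced van Deemter model h = a + b/ν + c·ν is linear in its parameters.
ν = [0.5, 1.0, 1.5, 2.5, 3.0, 5.0, 7.5, 10.0]
h = [3.60, 2.57, 2.48, 2.16, 2.18, 2.37, 2.46, 3.24]

X = hcat(ones(length(ν)), 1 ./ ν, ν)   # columns correspond to a, b, c
coeffs = X \ h                          # least-squares estimates [a, b, c]
```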

Calculate the 95% confidence limits of a, b and c. (4pts)

According to the theory, the reduced linear velocity has an optimal reduced linear velocity at \nu_{\text{min}}=\sqrt{b/c}. Calculate the optimal reduced linear velocity. Round to two decimals. (1pt)

What is the expected reduced plate height at this velocity? (2pts) Round to two decimals.

What are the 95% confidence limits of this reduced plate height? (3pts)

Your colleague is afraid that you may be dealing with autocorrelation. Calculate the Durbin–Watson statistic. Is there autocorrelation? (3pts)

Your colleague is not done criticizing your work yet and comments on the spread in the data, suggesting you are likely to have outliers. Conduct an outlier test and plot the residuals. Specify which of the following statements is/are true. (2pts)
