
10. Robust Statistics

Updated on February 5, 2025
In the lessons thus far, we have worked with statistical methods that assume certain criteria are fulfilled: the data had to be normally distributed, free of outliers, and so on. But what happens when these criteria are not met? In this lesson we will learn about robust alternative methods. We will see that the simplicity that renders these methods so robust comes at a cost.

Learning Goals

  • x
  • y
  • z
Analytical Separation Science by B.W.J. Pirok and P.J. Schoenmakers
READ SECTIONS 9.5.1-9.5.2

Introduction to robust methods

1. What is a robust method?

Robust or non-parametric methods operate by abstracting the data to some degree to distil an answer. This is best understood by revisiting the example from Lesson 2:

				
    x = [1, 2, 3, 4, 5, 6, 42];

Here, the mean (9) was strongly affected by the presence of an outlier (42), yet the median (4) was not. The median simply selects the middle value of the sorted vector, so it does not matter how extreme the outliers are: the median is robust against their presence. Key characteristics of such non-parametric methods are shown in Figure 1.
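The difference can be checked numerically. The following is a minimal sketch in Python using only the standard library; the variable `x_worse` is a hypothetical variation of the example, introduced here to show what happens when the outlier becomes a thousand times larger.

```python
from statistics import mean, median

x = [1, 2, 3, 4, 5, 6, 42]

print(mean(x))    # 9
print(median(x))  # 4

# Hypothetical variation: make the outlier a thousand times larger.
# The mean follows the outlier, the median does not move.
x_worse = x[:-1] + [42000]
print(mean(x_worse))    # 6003
print(median(x_worse))  # 4
```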

Figure 1. Comparison between parametric and non-parametric (robust) tests.

Does the median then have no disadvantage? It does: the median ignores the actual magnitudes of all other values and simply selects the middle one. You could therefore say that it is less representative of the complete dataset. We will see in this lesson that, because of this abstract treatment of the data, robust methods have significantly less statistical power than their parametric counterparts. Table 1 shows an overview of parametric and non-parametric methods.

Table 1. Overview of parametric and non-parametric tests.
| Objective | Parametric | Non-parametric |
|---|---|---|
| Describing a sample | Mean; standard deviation | Median; IQR |
| Comparing mean with value; comparing two means (matched) | t-test | Sign test; Wilcoxon test |
| Comparing two means (non-matched) | t-test | Mann-Whitney U test |
| Comparing variances | F-test (k=2); Bartlett test (k\ge2) | Levene's test (k\ge2) |
| Comparing several means | ANOVA | Kruskal-Wallis |
| Measuring variables relationship | Pearson correlation | Spearman correlation |
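The "Describing a sample" row of Table 1 can be illustrated with the vector from Section 1. The sketch below is in Python with the standard library only; the helper `describe` is a hypothetical name. Note that the IQR depends on the quartile convention, and the default "exclusive" method of `statistics.quantiles` is assumed here, so other software may report slightly different quartiles.

```python
from statistics import mean, median, quantiles, stdev

def describe(sample):
    """Return parametric and robust descriptors of a sample (hypothetical helper)."""
    # Quartiles with the default 'exclusive' convention (an assumption of this sketch).
    q1, _, q3 = quantiles(sample, n=4)
    return {
        "mean": mean(sample),
        "sd": stdev(sample),        # sample standard deviation (n - 1 in the denominator)
        "median": median(sample),
        "iqr": q3 - q1,
    }

clean = [1, 2, 3, 4, 5, 6]
with_outlier = clean + [42]

for name, sample in [("clean", clean), ("with outlier", with_outlier)]:
    print(name, describe(sample))
```

With the outlier added, the standard deviation grows from roughly 1.9 to roughly 14.7, whereas the IQR only changes from 3.5 to 4 under this quartile convention.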

2. Comparison of mean with reference

2.1. Sign test

In Lesson 4, we learned about the comparison of a sample mean with a reference as our first hypothesis test. Its first non-parametric alternative is the sign test.

CASE: RETENTION TIME STABILITY

According to a protocol, the median retention time of an analyte should be \tilde{x}_0 = 3.5 min. The repeated measurements shown below were obtained. Is the median of the population of data 3.5 min? Note that the hypotheses are H_0: \tilde{x}=\tilde{x}_0 and H_1: \tilde{x}\neq\tilde{x}_0.

				
    x = [3.3, 3.2, 3.4, 4.7, 3.7, 3.3, 3.4, 3.2, 3.5];

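The sign test exclusively considers the direction of each deviation from the reference, i.e. it literally considers only the signs. The procedure is as follows:

  1. Subtract the reference from each value in the dataset.
  2. Count the number of positive and the number of negative differences (i.e. literally consider the signs). Differences of exactly zero carry no sign and are conventionally left out of the count.
  3. Take the smaller of the two counts; this is r_{\text{obs}}.
  4. Compare r_{\text{obs}} with a tabulated r_{\text{crit}} value for the number of signed differences.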

Considering the case above, what is the outcome of your hypothesis test?
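The steps of the sign test can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `sign_test` is hypothetical, zero differences are dropped, and Step 4 is replaced by an exact two-sided binomial probability (each sign being equally likely under H_0) instead of a lookup of r_{\text{crit}} in a table.

```python
from math import comb

def sign_test(values, reference):
    """Sign test of H0: median == reference.

    Returns (r_obs, n, p_value), where n is the number of non-zero differences.
    """
    differences = [v - reference for v in values]      # Step 1
    n_pos = sum(d > 0 for d in differences)            # Step 2
    n_neg = sum(d < 0 for d in differences)
    n = n_pos + n_neg                                  # zero differences are dropped
    r_obs = min(n_pos, n_neg)                          # Step 3
    # Step 4 (assumption: exact binomial probability instead of a table lookup).
    # Under H0 the number of minority signs follows Binomial(n, 0.5).
    p_one_tail = sum(comb(n, k) for k in range(r_obs + 1)) / 2**n
    p_value = min(1.0, 2 * p_one_tail)                 # two-sided
    return r_obs, n, p_value

x = [3.3, 3.2, 3.4, 4.7, 3.7, 3.3, 3.4, 3.2, 3.5]
r_obs, n, p_value = sign_test(x, 3.5)
print(r_obs, n, p_value)  # 2 8 0.2890625
```

For the case data, two differences are positive, six are negative and one is zero, so r_{\text{obs}} = 2 with n = 8. The resulting probability of about 0.29 gives no grounds to reject H_0 at the usual 5% level.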

The sign test can also be used for a matched-pairs comparison of two samples: first compute the difference within each pair, and then apply the procedure above to these differences, using zero as the reference.

The sign test reduces the content of the data to the mere signs of the deviations from the reference. It is therefore not surprising that its statistical power is lower than that of a parametric t-test.

2.2. Wilcoxon test

The Wilcoxon test considers the ranks of the differences as well as their signs, and thereby provides an alternative to the sign test that yields more statistical power. The procedure is illustrated by the matched-pairs example below, in which the differences x_{1,i} - x_{2,i} are ranked by absolute value (tied values receive the average of their ranks) and each rank is given the sign of its difference.
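| OBJECT | x_{1,i} | x_{2,i} | SIGNS | RANK |
|---|---|---|---|---|
| 1 | 114 | 116 | -2 | -1 |
| 2 | 49 | 42 | 7 | +7.5 |
| 3 | 100 | 95 | 5 | +4 |
| 4 | 20 | 10 | 10 | +9.5 |
| 5 | 90 | 94 | -4 | -2.5 |
| 6 | 106 | 100 | 6 | +5.5 |
| 7 | 100 | 96 | 4 | +2.5 |
| 8 | 95 | 102 | -7 | -7.5 |
| 9 | 160 | 150 | 10 | +9.5 |
| 10 | 110 | 104 | 6 | +5.5 |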
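The signed ranks in the table can be reproduced with a short Python sketch. The function `signed_ranks` and the names `w_plus` and `w_minus` are hypothetical and introduced only for illustration. The sketch assumes that zero differences are dropped and that tied absolute differences share their average rank. It stops at the rank sums and does not look up a critical value.

```python
def signed_ranks(x1, x2):
    """Signed ranks of the paired differences x1 - x2 (zero differences dropped)."""
    diffs = [a - b for a, b in zip(x1, x2) if a != b]
    ordered = sorted(abs(d) for d in diffs)
    # Average rank for each absolute difference, so that ties share a rank.
    avg_rank = {}
    for value in set(ordered):
        first = ordered.index(value) + 1      # 1-based rank of the first occurrence
        count = ordered.count(value)          # number of tied values
        avg_rank[value] = first + (count - 1) / 2
    # Attach the sign of each difference to its rank.
    return [avg_rank[abs(d)] if d > 0 else -avg_rank[abs(d)] for d in diffs]

x1 = [114, 49, 100, 20, 90, 106, 100, 95, 160, 110]
x2 = [116, 42, 95, 10, 94, 100, 96, 102, 150, 104]

ranks = signed_ranks(x1, x2)
w_plus = sum(r for r in ranks if r > 0)     # sum of positive ranks
w_minus = -sum(r for r in ranks if r < 0)   # sum of negative ranks (as a positive number)

print(ranks)            # [-1.0, 7.5, 4.0, 9.5, -2.5, 5.5, 2.5, -7.5, 9.5, 5.5]
print(w_plus, w_minus)  # 44.0 11.0
```

The two sums add up to n(n+1)/2 = 55 for n = 10 pairs, which is a quick check on the ranking. In the usual form of the test, the smaller of the two sums is then compared with a tabulated critical value.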