Learning Goals #
- Evaluate whether data follows a normal distribution.
- Assess whether there is evidence that datapoints are outliers.
- Test whether the variances of two or more samples are homogeneous.
READ SECTION 9.4.1
Normality testing
1. Normality testing #
We start our data evaluation with testing for normality. Most parametric statistical tests require the data to follow a specific distribution; for the tests covered in this course this is usually the normal distribution. In other words, we want to study whether our data follows the normal distribution, a requirement sometimes referred to as the normality criterion.
1.1. Graphical method #
One approach is to leverage our plotting skills. Figure 1A and 1B show the box-and-whisker plot and histogram, respectively, as taught in Lesson 2. The box-and-whisker plot suggests the data is symmetrically distributed, but this is contradicted by the histogram.
Figure 1. Three different methods of plotting the same data that give different perspectives as to the data meeting the normality criterion.
SEE BOOK FOR A DETAILED GUIDE
Section 9.4.1.1 features a detailed step-by-step example of creating a P-P plot, as well as a Q-Q plot.
In essence, the actual values are converted to z-values through the relation z=(x-\mu)/\sigma, and the CDF is computed for each. These values are then plotted against the CDF the points would have if they ideally followed the normal distribution. If the data is normally distributed, the points will fall along the straight dotted line.
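The calculation behind a P-P plot can be sketched in a few lines of Julia. This is a minimal sketch, not the book's exact recipe: the plotting positions (i − 0.5)/n used for the empirical CDF are one common convention, and the Distributions.jl package is assumed for the normal CDF.

```julia
using Statistics, Distributions

x = [3.18, 3.47, 3.34, 3.18, 3.44, 3.06, 2.96, 3.41, 3.02,
     3.13, 3.58, 3.04, 2.96, 3.63, 2.83, 3.01, 3.70, 3.06]

n = length(x)
z = sort((x .- mean(x)) ./ std(x))    # standardized values, sorted
p_theory = cdf.(Normal(), z)          # CDF under the normal assumption
p_empirical = ((1:n) .- 0.5) ./ n     # empirical CDF (plotting positions)

# For normally distributed data, plotting p_empirical against p_theory
# gives points that fall along the straight line y = x.
```

A Q-Q plot instead compares the theoretical quantiles, `quantile(Normal(), p_empirical)`, against the sorted z-values.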
1.2. Statistical tests #
It is also possible to express the normality criterion as a number. A well-known example is the Kolmogorov-Smirnov test. This test compares the actual CDF with the theoretically ideal CDF, with H_0 reflecting that the data is normally distributed and H_1 that it is not. The test statistic is simply the largest difference between the two CDFs within the dataset. This is shown in Figure 9.26 in the book.
The Kolmogorov-Smirnov test is appropriate for large sample sizes (n > 50). For smaller sample sizes we tend to use Lilliefors' correction [1]. This is essentially the same test, but with stricter critical values.
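To make concrete what "largest difference" means, the Kolmogorov-Smirnov statistic D can be computed by hand. A minimal sketch (assuming Distributions.jl for the normal CDF; the empirical CDF must be checked on both sides of each of its jumps):

```julia
using Statistics, Distributions

x = sort([3.18, 3.47, 3.34, 3.18, 3.44, 3.06, 2.96, 3.41, 3.02,
          3.13, 3.58, 3.04, 2.96, 3.63, 2.83, 3.01, 3.70, 3.06])
n = length(x)

# Theoretical CDF of the fitted normal at every observation
# (estimating mu and sigma from the data is exactly the Lilliefors situation)
F = cdf.(Normal(mean(x), std(x)), x)

# Largest vertical distance between the empirical and theoretical CDF,
# evaluated just before and at each step of the empirical CDF
D = maximum(max.(abs.((1:n) ./ n .- F), abs.(((1:n) .- 1) ./ n .- F)))
```

D is then compared to the appropriate critical value (Kolmogorov-Smirnov or, here, Lilliefors) for the chosen significance level and n.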
Both the Kolmogorov-Smirnov and Lilliefors' tests can be executed in MATLAB using the following functions, where h is the test result, p the p-value and x the data vector. The public MATLAB File Exchange also features several user-contributed functions that conduct the Shapiro-Wilk test [2, 3].
% Kolmogorov-Smirnov Test
% Note: kstest compares x to the standard normal distribution,
% so standardize the data first, e.g. kstest((x-mean(x))/std(x))
[h,p] = kstest(x)
% Lilliefors' Test (mu and sigma are estimated from the data)
[h,p] = lillietest(x)
##### Kolmogorov-Smirnov Test
using Statistics, Distributions, HypothesisTests
# Define the data
x = [3.18, 3.47, 3.34, 3.18, 3.44, 3.06, 2.96, 3.41, 3.02,
3.13, 3.58, 3.04, 2.96, 3.63, 2.83, 3.01, 3.70, 3.06]
# Perform the Kolmogorov-Smirnov test against a normal distribution with
# the sample mean and standard deviation. Note: Normal() without arguments
# is the standard normal, which would be wrong for unstandardized data;
# estimating the parameters from the data is the Lilliefors situation.
ks_test = ExactOneSampleKSTest(x, Normal(mean(x), std(x)))
# Extract results
p = pvalue(ks_test)
h = p < 0.05 # Reject null hypothesis (normality) if p < 0.05
# Print results
println("Kolmogorov-Smirnov Test:")
println("Hypothesis rejected (data not normal): ", h)
println("p-value: ", p)
##### Lilliefors' Test
# The Lilliefors' Test is not featured on the website currently
READ SECTION 9.4.2
Outlier testing
2. Outlier testing #
The Kolmogorov-Smirnov test shows us that outliers can be a serious problem. We will now study some methods to determine whether a datapoint is an outlier, but we must be mindful that – like any statistical test – these methods can make mistakes. To make matters worse, we will see that some tests can contradict each other.
WARNING
The removal of outliers is a sensitive topic that must be approached with great care. Removing a datapoint is inherently subjective and essentially amounts to manipulating the dataset. Regulated environments therefore often do not allow outliers to be removed.
2.1. Critical range #
A relatively simple test is the critical range method, which can be used when \sigma is known. This test assesses H_0, that the data contains no outliers, against H_1, that the data contains at least one outlier. H_0 is rejected if
Equation 9.56: (x_{\text{max}}-x_{\text{min}})>CR_{\alpha,n}\sigma
Here, x_{\text{min}} and x_{\text{max}} are the minimum and maximum values of the dataset. CR_{\alpha,n} is a tabulated (see Table 1) critical value that depends on the significance level and the sample size.
Table 1. Critical values CR_{\alpha,n} for the critical range method.

| n | CR_{0.1,n} | CR_{0.05,n} | CR_{0.025,n} |
|---|---|---|---|
| 2 | 2.3 | 2.8 | 3.2 |
| 3 | 2.9 | 3.3 | 3.7 |
| 4 | 3.2 | 3.6 | 4.0 |
| 5 | 3.5 | 3.9 | 4.2 |
| 6 | 3.7 | 4.0 | 4.4 |
| 7 | 3.8 | 4.2 | 4.5 |
| 8 | 3.9 | 4.3 | 4.6 |
| 9 | 4.0 | 4.4 | 4.7 |
| 10 | 4.1 | 4.5 | 4.8 |
| 11 | 4.2 | 4.5 | 4.9 |
| 12 | 4.3 | 4.6 | 4.9 |
| 13 | 4.3 | 4.7 | 5.0 |
| 14 | 4.4 | 4.8 | 5.1 |
The following repeated measurements were obtained: x = [23.7 23.5 22.4 22.9 26.5 23.9 23.0 22.1]. Is there an outlier if the precision of the method is \sigma = 0.5? Run the test at a significance level of 0.05.
Correct! Remember that any statistical test cannot tell you whether the null hypothesis is true or false. It can only tell you whether there is sufficient evidence to reject the null hypothesis!
This is not correct. Try again! Did you take the correct value from the table (4.3)?
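The quiz above can be worked out in a few lines of Julia. A minimal sketch (CR_{0.05,8} = 4.3 is taken from Table 1):

```julia
# Critical range test (sigma known) at alpha = 0.05
x = [23.7, 23.5, 22.4, 22.9, 26.5, 23.9, 23.0, 22.1]
sigma = 0.5                        # known precision of the method
CR = 4.3                           # CR_{0.05,8} from Table 1

range_x = maximum(x) - minimum(x)  # 26.5 - 22.1 = 4.4
h = range_x > CR * sigma           # 4.4 > 2.15, so H0 is rejected
```

Because the range 4.4 exceeds 4.3 × 0.5 = 2.15, there is sufficient evidence to reject H_0 and suspect an outlier (the value 26.5).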
2.2. Dixon Q and Grubbs tests #
If \sigma is not known, an alternative is the Dixon Q test. The hypotheses are the same as for the critical range method, but the Q statistic is calculated differently depending on (i) the sample size, and (ii) whether an outlier is suspected at the lower or upper end of the sorted data.
EQUATIONS TO CALCULATE Q STATISTIC
See the book for the equations to calculate the Q statistics. They can be found from Equation 9.51a – 9.53b.
Similar to the critical range method, the Q statistic is compared to a critical Q value that is tabulated.
The Dixon test is useful, but particularly sensitive to the presence of two outliers, as is also demonstrated by the example in Figure 3 (see Section 9.4.2.1 of the book for an elaborate calculation).
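The exact equations are in the book, but to illustrate the idea: for the smallest sample sizes (3 ≤ n ≤ 7) the Q statistic is simply the gap between the suspect value and its nearest neighbour, divided by the total range. A minimal sketch with hypothetical data:

```julia
# Dixon's Q for small samples (the r10 ratio): gap / range
x = sort([2.83, 2.96, 3.04, 3.06, 3.18, 3.70])     # hypothetical data, n = 6

Q_low  = (x[2] - x[1]) / (x[end] - x[1])           # suspect at the low end
Q_high = (x[end] - x[end-1]) / (x[end] - x[1])     # suspect at the high end
Q = max(Q_low, Q_high)                             # ≈ 0.598 here
```

Q is then compared to the tabulated critical Q value for the chosen α and n; larger samples use the modified ratios from Equations 9.51a – 9.53b.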
The Dixon test can be executed using the following function, where h is the test result, p the p-value, stats additional test statistics, x the data vector, alpha the significance level and tail indicates whether it is a left-, right- or two-sided test. The function can be downloaded here (Dixon.m, .ZIP). Unpack it in the Current Folder window. See the annotated code for clarification on its use.
[h,p,stats] = Dixon(x,alpha,tail);
using Statistics, Distributions
include("Dixon.jl")
# Example Usage
x = [3.18, 3.47, 3.34, 3.18, 3.44, 3.06, 2.96, 3.41, 3.02, 3.13, 3.58, 3.04, 2.96, 3.63, 2.83, 3.01, 3.70, 3.06]
# Run the Dixon test
h, stats = Dixon(x,0.05,"both")
An alternative to the Dixon test is the Grubbs test that can be used to detect two outliers. It is described in further detail in Section 9.4.2.2.
The Grubbs test can be executed using the following function, where h is the test result, p the p-value, stats additional test statistics, x the data vector, alpha the significance level and tail indicates whether it is a left-, right- or two-sided test. The function can be downloaded here (Grubbs.m, .ZIP). Unpack it in the Current Folder window. See the annotated code for clarification on its use.
An alternative is MATLAB's built-in isoutlier function, which returns for each value a boolean indicating whether that value is an outlier (1) or not (0).
[h,p,stats] = Grubbs(x,alpha,tail);
% Alternative Method
TF = isoutlier(x,'grubbs')
using Statistics, Distributions
include("Grubbs.jl")
# Example Usage
x = [3.18, 3.47, 3.34, 3.18, 3.44, 3.06, 2.96, 3.41, 3.02, 3.13, 3.58, 3.04, 2.96, 3.63, 2.83, 3.01, 3.70, 3.06]
# Run the Grubbs test
h,p,stats = Grubbs(x,0.05,"both")
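Under the hood, the single-outlier Grubbs statistic is simply the largest deviation from the mean, expressed in units of the standard deviation. A minimal sketch using the measurement data from Section 2.1 (the critical value must still be looked up in a table, e.g. in the book):

```julia
using Statistics

# Grubbs statistic: G = max|x_i - mean(x)| / std(x)
x = [23.7, 23.5, 22.4, 22.9, 26.5, 23.9, 23.0, 22.1]
G = maximum(abs.(x .- mean(x))) / std(x)   # ≈ 2.20 here
# Compare G to the tabulated critical value for n = 8 and the chosen alpha;
# if G exceeds it, the most extreme value (26.5) is flagged as an outlier.
```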
2.3. Median Absolute Deviation (MAD) #
An incredibly useful outlier-detection tool is the so-called median absolute deviation (MAD). The MAD is the median of the absolute deviations from the median, \text{MAD}=\text{median}(|x_i-\text{median}(x)|), and a datapoint is flagged as an outlier if its absolute deviation from the median exceeds a threshold of several (typically three) MADs. Because it is based on medians rather than means, this method is far less affected by the outliers themselves.
% Median Absolute Deviation Method
% (default: flags values more than three scaled MADs from the median)
TF = isoutlier(x);
using Statistics
# Flag outliers: values whose absolute deviation from the median exceeds
# `threshold` times the MAD. Note: MATLAB's isoutlier uses a scaled MAD
# (MAD * 1.4826) to make the threshold comparable to standard deviations.
function mad_outliers(x; threshold=3)
    median_x = median(x)
    deviations = abs.(x .- median_x)
    mad_value = median(deviations)
    return deviations .> threshold * mad_value
end
# Example data
x = [1, 2, 3, 4, 100, 6, 7]
# Detect outliers using the MAD method
TF = mad_outliers(x)
println(TF)
3. Heteroscedasticity #
3.1. Bartlett Test #
Another well-known homoscedasticity test is Bartlett's test, which – unlike the F-test from Lesson 7 – allows more than two variances to be compared at once. The test statistic, which follows the \chi^2-distribution, is computed as
Equation 9.59: \chi^2_{\text{obs}}=\frac{\nu_{\text{pool}} \ln{s^2_{\text{pool}}}-\sum^k_{i=1} \nu_i \ln{s^2_i}}{C}
Here, k is the number of groups, \nu_{\text{pool}} the total number of degrees of freedom, and s^2_{\text{pool}} the pooled variance given by \sum^k_{i=1} \nu_i s^2_i / \nu_{\text{pool}}. Finally, C is given by
Equation 9.60: C=1+\frac{(\sum^k_{i=1} \frac{1}{\nu_i})-\frac{1}{\nu_{\text{pool}}}}{3(k-1)}
As always, the observed statistic is compared to the critical value, now at k − 1 degrees of freedom. Bartlett's test is, however, extremely sensitive to deviations from normality. In practice we therefore often use the more robust Levene's test, which is treated in the next lesson.
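Equations 9.59 and 9.60 are easy to implement directly. A minimal sketch with three hypothetical groups (the critical value \chi^2_{0.05,2} \approx 5.991 is hard-coded here to keep the sketch dependency-free):

```julia
using Statistics

# Bartlett's test statistic for k groups (Equations 9.59 and 9.60)
function bartlett_chi2(groups)
    k      = length(groups)
    nu     = [length(g) - 1 for g in groups]   # degrees of freedom per group
    s2     = [var(g) for g in groups]          # sample variances
    nupool = sum(nu)                           # pooled degrees of freedom
    s2pool = sum(nu .* s2) / nupool            # pooled variance
    C      = 1 + (sum(1 ./ nu) - 1 / nupool) / (3 * (k - 1))
    return (nupool * log(s2pool) - sum(nu .* log.(s2))) / C
end

# Hypothetical example: three groups of five measurements
groups = ([1.0, 2, 3, 4, 5], [2.0, 4, 6, 8, 10], [1.0, 1, 2, 2, 3])
chi2_obs = bartlett_chi2(groups)   # ≈ 5.70

# Compare to the critical chi-squared value at k - 1 = 2 degrees of freedom
chi2_crit = 5.991                  # chi-squared at alpha = 0.05, df = 2
h = chi2_obs > chi2_crit           # false: no evidence the variances differ
```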
Concluding remarks #
We have now learned methods to test important prerequisites of statistical tests, such as the normality criterion, the presence of outliers and homoscedasticity. The latter was already partly covered by the F-test from Lesson 7.
Now that we can test these prerequisites, the next question is what happens when one of these tests fails. This is covered in the next lesson.
References #
[1] Lilliefors, H. W. On the Kolmogorov-Smirnov Test for Normality with Mean and Variance Unknown. Journal of the American Statistical Association, 62(318), 1967, 399–402, DOI: https://doi.org/10.1080/01621459.1967.10482916
[2] Shapiro, S.S., Wilk, M.B., An analysis of variance test for normality (complete samples), Biometrika, Volume 52(3-4), 1965, 591–611, DOI: https://doi.org/10.1093/biomet/52.3-4.591
[3] Razali, N; Wah, Y.B. Power comparisons of Shapiro–Wilk, Kolmogorov–Smirnov, Lilliefors and Anderson–Darling tests. J. of Stat. Mod. and Anal. 2011, 2(1), 21–33, LINK: ResearchGate