
04. Hypothesis Testing

Updated on January 3, 2025

We have learned thus far to interact with the data and calculate various descriptive statistics. But how can we use these bits of information to answer our actual analytical questions? This will be the topic of the present lesson. We will see how a general hypothesis test works and learn about the first of several such tests: the comparison of our sample mean with a reference value.

Learning Goals #

  • Understand how hypothesis tests are a cornerstone of analytical chemistry.
  • Translate an analytical problem into hypotheses.
  • Conduct a hypothesis test to compare a sample mean with a reference value.
  • Gain an improved understanding of the significance level.
  • Measure the probability of obtaining the observed results, assuming that the null hypothesis is true, using a p-value.
Analytical Separation Science by B.W.J. Pirok and P.J. Schoenmakers
READ SECTION 9.3.1

Introduction to hypothesis testing

1. Hypothesis tests #

In Lesson 2 we calculated that the mean of the 100 RPLC measurements from the table was \bar{x} = 9.126 ppm with a standard deviation of s = 0.0192 ppm. Suppose now that the actual sample was a standard material certified at \mu_{0} = 9.124 ppm. Can we now conclude that the method worked correctly for our 100 measurements? It certainly seems to be the case, but how much can we rely on our feeling? We need numerical information to support our hypothesis that the method works correctly.

1.1. What is hypothesis testing? #

Hypothesis testing is an inferential statistical method that can be used to determine whether there is sufficient evidence in the measured data to support conclusions about the population. Hypothesis testing is a cornerstone of analytical chemistry, with applications including instrument calibration, regulatory compliance, and policy development.

A hypothesis test yields evidence regarding whether the hypothesis is plausible based on the available data. It does not prove whether the hypothesis is true. This also means that we can make mistakes using a hypothesis test.

1.2. The analytical question #

Before we explore how a hypothesis test works we must first consider what we are actually after. To ensure that our answer is relevant, we need to distill the analytical question from the actual problem. 

Is the factory polluting the river?
Environmental analysis is a classic example of analytical chemistry. The analytical question in this context would be: Is the concentration of pollutant X in the wastewater effluent exceeding the limit?

Is the suspect guilty?
Analysis of DNA, gunshot residue, alcohol levels, fire debris, etc. are each examples of analytical chemistry in forensics. The analytical question could be: does the chemical profile of the sample match the one found at the suspect's house?

Is the buffer affecting retention in RPLC?
We must tailor a multitude of parameters when we develop analytical separation methods. How to test whether a parameter actually affects the results?

Is one of the instruments broken?
Hypothesis testing can also be used to aid the analytical chemist. A reference sample with a certified value can be measured. Is the concentration according to specifications?

In each of these cases we will have to infer information about the population based on repeated measurements of a sample of this population. This is an example of inferential statistics.

1.3. How does a hypothesis test work? #

The mechanism of a hypothesis test will be addressed in detail in the next sections. In brief, once an analytical problem has been identified for testing, a hypothesis test roughly works like this:

  1. Identify the statistical method and its prerequisites.
  2. Formulate the hypotheses.
  3. Decide on the significance level.
  4. Calculate the test statistic.
  5. Conduct the statistical test and determine its outcome.

The success of a hypothesis test heavily relies on the quality of the available data. Obviously, an important step preceding the ones above is that the analytical method provides good data!

Type of hypothesis tests

There are a large number of statistical hypothesis tests. In this course we will cover:

  • Comparison of a mean with a reference value (Lesson 4)
  • Comparison of two means (Lesson 5)
  • Comparison of a variance with a reference value (Lesson 7)
  • Comparison of two variances (Lesson 7)
  • Comparison of multiple means (Lesson 8)
  • Normality test
  • Outlier test

The knowledge from the previous lessons on z- and t-statistics allows us to start with a simple but very useful case: the comparison of a mean with a reference. We will learn about this test and use it to simultaneously better grasp the principles of a general hypothesis test, its strengths and limitations.
Analytical Separation Science by B.W.J. Pirok and P.J. Schoenmakers
READ SECTION 9.3.2

Comparison of mean with reference value

2. Comparison of sample mean with reference (t-test) #

For this type of hypothesis test, the objective is to compare our sample of a limited number of repeated measurements to a reference value. To illustrate this, we will consider the identification of a compound in gas chromatography using retention indices. The retention index of a reference compound should be 1290. Note that this is now our true value, our \mu_{0}. Six repeated measurements are conducted, yielding

x = [1293, 1291, 1285, 1287, 1291, 1283];

The question now is whether the retention index matches. Let’s walk through the steps.

2.1. Step 1: Test requirements #

Each statistical test relies on certain prerequisites or assumptions. This is similar to what we saw in Lesson 2, where the mean was heavily affected by an outlier and the median was not. For the present case, we are about to do a z- or t-test comparison and there are several relevant assumptions. The data must be:

  1. Pertaining to continuous (interval) or ordinal variables.
  2. Normally distributed.
  3. Free of outliers.
  4. Independent: a representative but random sample from the population. 

To focus on the operation of the hypothesis test we will for now assume that these requirements are met. We’ll learn how to test for them in later lessons. 


2.2. Step 2: Hypotheses #

We have so far not focused on the hypotheses themselves. In hypothesis tests, two hypotheses are formulated. The null hypothesis, H_0, always proposes that there is no statistical significance in the data. In analytical chemistry, it typically expresses that the effect under investigation does not exist. For the present case we are asking ourselves whether our mean \mu matches the reference value \mu_{0} = 1290. Our effect in this example would be the factor that causes the retention index to not match. In accordance with the above, our null hypothesis is therefore that there is no significant difference between our mean and the reference value (i.e. the retention index matches):
H_0: \mu=\mu_0

The alternative hypothesis (H_1 or H_a) is the opposite. H_1 proposes that there is a difference and therefore that the effect is significant.

H_1: \mu \not= \mu_0

Note how the two hypotheses are mutually exclusive: exactly one of them is always true.

2.3. Step 3: Determine significance level #

A hypothesis test is used to evaluate whether the data provides sufficient evidence to reject H_0 at a given significance level (\alpha). This level is crucial, as it acts as the threshold for us to reject H_0. This can be seen in Figure 1.
Figure 1. Plot of a t-distribution at five degrees of freedom. The further the observed t value lies away from 0, the more it casts doubt on H_0. The significance level \alpha decides when this doubt is too much and we reject H_0.

Figure 1 shows our probability distribution for the present case, a t-distribution with five degrees of freedom. We can make several observations:

  • This probability distribution assumes that H_0 is true. Given that H_0: \mu=\mu_{0}, it makes sense that – assuming that H_0 is true – its probability density is strongest near t = 0 (see also the next section).
  • For H_0 to be rejected, t_{\text{obs}} needs to deviate sufficiently from 0.
  • At some point, t will exceed the threshold t_{\text{crit}}, which is set by the significance level.
  • The larger the significance level, the smaller t_{\text{crit}} will be, and thus the easier it is to reject H_0.
Analytical Separation Science by B.W.J. Pirok and P.J. Schoenmakers
READ SECTION 9.3.2.3

Significance level

Read it now if you have not already.

So does it matter what the significance level is? The answer is: yes, a lot. The significance level (\alpha) is also referred to as the probability of making a Type-I error or a false positive. Suppose that we set the significance level to 3% (i.e. \alpha = 0.03); then we run a risk of 3% that we wrongly reject H_0 when it should have been accepted. This is shown in the confusion matrix in Figure 2.
Figure 2. The so-called confusion matrix shows the relation between the hypothesis, the two types of errors that can be made and several definitions for the true negative and true positive.

In practice, \alpha is typically set at 5% (i.e. 0.05), but the actual value differs greatly per application field. We will investigate in Lesson 6 how this value and its consequences differ per case.

Note that, because the retention index can deviate in either direction, we are checking for both a positive and a negative deviation. This is called a two-sided test, and this is why the significance is divided over both sides, similar to the confidence intervals in Lesson 3.

2.4. Step 4: Calculate test statistic #

We have set up the entire statistical experiment. It is now time to start calculating the statistic for our sample. Here we have to make a distinction. We concluded in the previous lesson that, if the number of datapoints n is sufficiently large (i.e. n > 30), we can reasonably assume that s represents \sigma and accordingly use the z-test. In the present case, we have six retention indices, so we use the t-test instead. The z-test will be briefly treated in later sections.

We will have to calculate our statistic, t_{\text{obs}}, which is a standardized value that is calculated from sample data during a hypothesis test. The procedure that calculates the test statistic compares our data to what is expected under the H_0.

For the t-test comparison of a mean with a reference, the test statistic, t_{\text{obs}}, is

Equation 9.24: t_{\text{obs}}=\frac{|{\bar{x}-\mu_{0}}|}{s/\sqrt{n}}


Calculate t_{\text{obs}} for the sample of retention index data using Equation 9.24. See the beginning of Section 2 on this page for more information. Note that \mu_{0} = 1290. Round to 3 decimals.

For the calculation we use Equation 9.24, which requires us to take the absolute difference between \bar{x} and \mu_{0}. Here \bar{x} = 1288.333, s = 3.933, and n = 6, as we have six retention index values in our sample, yielding t_{\text{obs}} = 1.038.
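
The calculation can be verified with a short script. Below is a minimal sketch in Python (chosen here for illustration; the worked solutions at the end of this lesson use MATLAB and Julia):

```python
import math
import statistics

# Retention index data and reference value from the example in Section 2
x   = [1293, 1291, 1285, 1287, 1291, 1283]
mu0 = 1290

n      = len(x)
x_mean = statistics.mean(x)   # 1288.333...
x_std  = statistics.stdev(x)  # sample standard deviation (n - 1 in the denominator)

# Equation 9.24: absolute difference scaled by the standard error of the mean
t_obs = abs(x_mean - mu0) / (x_std / math.sqrt(n))
print(round(t_obs, 3))  # 1.038
```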

2.5. Step 5: Statistical test #

Now that we have our test statistic, we can conduct the actual test. There are two methods that each give the same outcome.

USING THE CRITICAL VALUE #

The first strategy requires t_{\text{crit}} to be calculated using the ICDF (Lesson 2). In essence this means calculating t_{\text{crit}} for the \alpha = 0.05 case in Figure 1.

H_0 is accepted if t_{\text{obs}} lies closer to 0 than t_{\text{crit}}. In other words, if t_{\text{crit,+}}>t_{\text{obs}}>t_{\text{crit,-}}.
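
A sketch of this strategy in Python: the standard library has no t-distribution, so the helper functions t_pdf, t_cdf, and t_icdf below are hand-rolled numerical approximations added purely for illustration (in MATLAB one would simply call icdf('T',p,dof), as in the exercise solution at the end of this lesson):

```python
import math

def t_pdf(t, dof):
    """Probability density of the t-distribution with dof degrees of freedom."""
    c = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return c * (1.0 + t * t / dof) ** (-(dof + 1) / 2)

def t_cdf(t, dof, steps=10000):
    """CDF by trapezoidal integration from 0 to |t|, using symmetry about 0."""
    b = abs(t)
    h = b / steps
    area = 0.5 * (t_pdf(0.0, dof) + t_pdf(b, dof))
    for i in range(1, steps):
        area += t_pdf(i * h, dof)
    area *= h
    return 0.5 + area if t >= 0 else 0.5 - area

def t_icdf(p, dof):
    """Quantile (inverse CDF) by bisection on t_cdf."""
    lo, hi = -20.0, 20.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Two-sided test at alpha = 0.05 with five degrees of freedom (n = 6)
alpha  = 0.05
dof    = 5
t_crit = t_icdf(1 - alpha / 2, dof)
print(round(t_crit, 3))  # 2.571

# For the retention index sample, t_obs = 1.038 lies between -t_crit and
# t_crit, so H_0 is accepted
t_obs = 1.038
accept_H0 = -t_crit < t_obs < t_crit
```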

USING THE P-VALUE #

In this lesson we did not focus on it much to this point, but we can remember from Lesson 2 that the area under the curve of the PDF is the probability, also known as the p-value.

This probability allows us to determine how common or rare our t-value is under the assumption that H_0 is true.
We can calculate the p-value for this t-test by evaluating the CDF at t_{\text{obs}}. We must account for both sides, because we are using a two-sided test! H_0 is accepted if the p-value is larger than \alpha.

Calculate the p-value for the retention index case. Round to 3 decimals.
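
The p-value route can be sketched in Python as follows. The t_pdf/t_tail helpers are hand-rolled numerical approximations for illustration only, because the standard library has no t-distribution (with SciPy this would be scipy.stats.t.cdf; in MATLAB, cdf('T',...)):

```python
import math

def t_pdf(t, dof):
    """Probability density of the t-distribution with dof degrees of freedom."""
    c = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return c * (1.0 + t * t / dof) ** (-(dof + 1) / 2)

def t_tail(t, dof, steps=10000):
    """P(T > t) for t >= 0: integrate the PDF from 0 to t, subtract from 0.5."""
    h = t / steps
    area = 0.5 * (t_pdf(0.0, dof) + t_pdf(t, dof))
    for i in range(1, steps):
        area += t_pdf(i * h, dof)
    return 0.5 - area * h

# Observed statistic for the retention index sample (Equation 9.24), dof = n - 1
t_obs = 1.038
dof   = 5

p_value = 2 * t_tail(t_obs, dof)  # two-sided: both tails count
print(round(p_value, 3))          # ~0.35, far above alpha = 0.05, so H0 is accepted
```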

Note that the test statistic of Equation 9.24 can also be expressed as t_{\text{obs}}=r_A/r_B with r_A=|\bar{x}-\mu_0| and r_B=s/\sqrt{n}. These two ranges are also depicted in Figure 3. As we can see, we are in fact considering whether the difference between the sample mean and the reference (r_A) can be explained by the spread of the data (r_B).

Figure 3. A t-test comparison of a mean with a reference value essentially compares the distance between the sample mean and the reference and the variation in the data. In other words: can the deviation that we see also be explained by the randomness in our data? If the deviation is too strong (beyond the significance level) then we deem H_0 rejected.

We still have to formally establish the outcome of the hypothesis test.

What is the outcome of the hypothesis test? Does the retention index match for the sample?

Since t_{\text{obs}} = 1.038 lies closer to 0 than t_{\text{crit}} = 2.571 (for \alpha = 0.05 and five degrees of freedom), H_0 is accepted: the retention index matches.

3. z-test comparison #

The comparison of a mean with a reference value can also be conducted using the z-test. The test works exactly the same way, but the CDF and ICDF are now based on the z-distribution. Furthermore, the z-statistic must be calculated (Lesson 2) instead of the t-statistic.

Note that the z-distribution should only be used if n > 30, as only then can we assume s to be representative of \sigma (Lesson 3).
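
As a sketch, taking again the Lesson 2 RPLC example (100 measurements, \bar{x} = 9.126 ppm, s = 0.0192 ppm) against the certified value of 9.124 ppm, the z-test fits in a few lines of Python using the standard library's NormalDist:

```python
from statistics import NormalDist

# Summary statistics for the 100 RPLC measurements (Lesson 2)
x_mean, s, mu0, n = 9.126, 0.0192, 9.124, 100
alpha = 0.05

# z statistic: same form as Equation 9.24, but compared to the z-distribution
z_obs  = abs(x_mean - mu0) / (s / n ** 0.5)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.960 for alpha = 0.05

accept_H0 = z_obs < z_crit  # True: no significant deviation from mu0
print(round(z_obs, 3), round(z_crit, 3))
```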

Analytical Separation Science by B.W.J. Pirok and P.J. Schoenmakers
READ SECTION 9.3.3

Tail testing

4. One-sided tests #

We have just compared a sample mean retention index to a reference value. In this specific case it did not matter whether any deviation was positive or negative, because we were interested in whether there was a match. In other words, positive and negative deviations were equally important, and consequently we performed a two-tailed test.

There are, however, also cases where the interest is specifically aimed at either a negative or positive deviation. For example, to test whether a threshold has been exceeded (e.g. a pesticide concentration in surface water), the hypotheses could be:

H_0: \mu\leq\mu_0 and H_1: \mu>\mu_0

In this case, it is desirable to focus the test on the positive (right) side of the PDF. We would then do a right-sided or right-tail test. The contrary is true if our interest is to determine whether a minimum concentration has been realised, for instance in material analysis where a variable must be at a minimum level for a product property to be achieved. Here, the hypotheses are:

H_0: \mu\ge\mu_0 and H_1: \mu<\mu_0

This would be an example of a left-tail or left-sided test. In either case, the t statistic is calculated slightly differently:

Equation 9.26: t_{\text{obs}}=\frac{{\bar{x}-\mu_{0}}}{s/\sqrt{n}}

Something that can help in knowing what to do is considering the sign in H_1. Generally, if the sign in H_1 points leftwards (<) it concerns a left-tail test, rightwards (>) signifies a right-tail test, and an inequality sign (\not=) points to a two-tailed test. One-sided t-tests or z-tests work the same as the two-sided variants, with the only exceptions being the slightly different computation of the test statistic and the fact that the significance now focuses on only one of the sides.

Figure 4. In a one-sided test, the significance pertains to only one of the sides. The example in the picture depicts the right-sided test.
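
A right-tail test can be sketched as follows. The pesticide concentrations below are hypothetical, invented only to illustrate the procedure, and the t_pdf/t_cdf helpers are hand-rolled because Python's standard library has no t-distribution:

```python
import math
import statistics

def t_pdf(t, dof):
    """Probability density of the t-distribution with dof degrees of freedom."""
    c = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return c * (1.0 + t * t / dof) ** (-(dof + 1) / 2)

def t_cdf(t, dof, steps=10000):
    """CDF by trapezoidal integration from 0 to |t|, using symmetry about 0."""
    b = abs(t)
    h = b / steps
    area = 0.5 * (t_pdf(0.0, dof) + t_pdf(b, dof))
    for i in range(1, steps):
        area += t_pdf(i * h, dof)
    area *= h
    return 0.5 + area if t >= 0 else 0.5 - area

# Hypothetical pesticide concentrations (ppb) against a legal limit mu0
x     = [0.104, 0.107, 0.103, 0.105, 0.106]
mu0   = 0.100
alpha = 0.05

# Equation 9.26: signed statistic, no absolute value for a one-sided test
n     = len(x)
t_obs = (statistics.mean(x) - mu0) / (statistics.stdev(x) / math.sqrt(n))

# Right tail only (H1: mu > mu0): p is the area to the right of t_obs
p_value = 1 - t_cdf(t_obs, n - 1)

exceeds_limit = p_value < alpha  # True here: reject H0 (mu <= mu0)
```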

Concluding Remarks #

We have learned how to conduct our very first hypothesis test. With hypothesis testing, we assess the probability of observing our data when H_0 is true. We do not assess the probability of the hypothesis in the context of our data. There are some important consequences:

  • The p-value is the probability of obtaining the observed data (or more extreme data), supposing that H_0 is true.
  • The p-value is not the probability that H_0 is true given the data.
  • The p-value is not the probability of wrongly rejecting H_0 (this value is \alpha).
  • The p-value does not inform us at all about the validity of a certain hypothesis.

However, by following the procedure of rejecting the null hypothesis when p<\alpha, the probabilities of Type-I and Type-II errors can be calculated. We will see this in Lesson 6, but we will first consider the second category of hypothesis tests in Lesson 5: the comparison of two sample means.

Exercise Solution #

Below are some helpful files to check whether you did the exercise correctly or to help you forward.

Figure 5. Using the p-value or the critical value yields the exact same outcome of the hypothesis test.
				
					% Data
x = [1293, 1291, 1285, 1287, 1291, 1283];
a = 0.05;
ref = 1290;

% Gathering Information
n = length(x);
x_dof = n-1; % Degrees of freedom
x_mean = mean(x);
x_std = std(x);
p = 1-a/2; % 1-Alpha/2

% Step IV: Calculate Test Statistic
t_obs = abs(x_mean-ref)/(x_std/sqrt(n));

% Step V: Critical Value Approach
t_crit = icdf('T',p,x_dof);

% For Two-Sided Test Specifically
if t_obs<t_crit && t_obs>-1*t_crit
    accept = 'H0';
else
    accept = 'H1';
end

% Step V: P-Value Approach
p = 2*(1-cdf('T',t_obs,x_dof));

if p>a
    accept = 'H0';
else
    accept = 'H1';
end
				
			

The attached file below contains a fully worked out example with additional explanation.

It can be downloaded here (CS_04_EE, .XLSX).

				
					using Distributions, Statistics

# Data
x           = [1293, 1291, 1285, 1287, 1291, 1283]
a           = 0.05
ref         = 1290

# Gathering information
n           = length(x)
x_dof       = n - 1
x_mean      = mean(x)
x_std       = std(x)
p           = 1 - a / 2

# Step IV: Calculate Test Statistics
t_obs       = abs(x_mean - ref) / (x_std / sqrt(n))

# Step V: Critical value approach
dist        = TDist(x_dof)
t_crit      = quantile(dist, p)

# For two-sided test specifically
if t_crit > t_obs > (-1*t_crit)
    accept  = "H0"
else
    accept  = "H1"
end

# Step V: p-value approach
p           = 2 * (1 - cdf(dist, t_obs))

if p > a
    accept  = "H0"
else
    accept  = "H1"
end
				
			