08. Comparing Several Means

Updated on February 26, 2025
Each of the statistical inferences that were addressed in the course thus far concerned the comparison of one or two datasets. It is also possible to compare more than two datasets simultaneously. In this lesson we will learn about the Analysis of Variance (ANOVA) and a special \chi^2 test. Understanding the power of these tests requires us to also discuss the multiplicity problem, which we will do first.

Learning Goals #

  • Conduct an analysis of variance (ANOVA) test to compare the means of several samples.
  • Study which factors affect the population and whether two factors influence each other.
  • Understand that conducting several hypothesis tests simultaneously requires adjustment of the significance level.
Analytical Separation Science by B.W.J. Pirok and P.J. Schoenmakers
READ SECTION 9.3.8

Multiplicity problem

1. Multiplicity problem #

Our natural inclination when comparing several datasets simultaneously may be to simply compare them all pairwise. Table 1 demonstrates that comparing three laboratories with each other would require three t-tests, each comparing two sample means.

Table 1. Cross-wise comparison of three laboratories.
        Lab 1   Lab 2   Lab 3
Lab 1   X       t-test  t-test
Lab 2   X       X       t-test
Lab 3   X       X       X
However, each time we make a comparison there is a chance, e.g. 5% (\alpha = 0.05), that we make a type-I error. The multiplicity problem is that, as the number of comparisons increases, the likelihood that we find at least one pair of means to be statistically different also increases, even if all population means are equal. If the k comparisons are treated as independent, the chance of making at least one type-I error is 1-(1-\alpha)^k. For the three t-tests of Table 1 this is already 1-0.95^3 ≈ 14%. Comparing 20 methods pairwise would require 20×19/2 = 190 t-tests, so that we are practically certain (close to 100%) to make a type-I error at least once.
ANALOGY

A different analogy that is perhaps easier to understand is to stick with two labs, but now compare them on three different properties of the method (e.g. sensitivity, retention-time stability and column pressure). As we increase the number of comparisons that we make, it becomes more likely that the datasets differ on at least one property by pure chance (\alpha).

The more factors you start to weigh in, the higher the probability that at least one factor will differ. The problem is that if you can find differences by pure chance (\alpha), then your confidence in having detected the real factor that you were after is reduced.

To overcome this problem, the value of \alpha used for the individual comparisons can be adjusted to a stricter per-comparison level (\alpha'), so that the family of comparisons together meets the intended experimentwise significance level. This is referred to as the Bonferroni correction, and given by

Equation 9.39: \alpha'=1-{(1-\alpha)}^{1/k}

Here, k is the number of simultaneous comparisons made (in the example of the three labs, k = 3). When \alpha is small, the approximation \alpha'=\alpha/k may suffice.
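As a quick numerical check of Equation 9.39, the sketch below uses a hypothetical case of four laboratories (k = 6 pairwise t-tests). It assumes the comparisons are independent; the function names are illustrative and not taken from the book. It computes the chance of at least one type-I error without correction, and the adjusted level \alpha' from both the exact expression and the \alpha/k approximation.

```python
def familywise_error_rate(alpha, k):
    """Chance of at least one type-I error in k independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** k


def adjusted_alpha(alpha, k):
    """Per-comparison level alpha' according to Equation 9.39."""
    return 1.0 - (1.0 - alpha) ** (1.0 / k)


def adjusted_alpha_approx(alpha, k):
    """Approximation alpha' = alpha / k, adequate when alpha is small."""
    return alpha / k


alpha = 0.05
n_labs = 4                          # hypothetical number of laboratories
k = n_labs * (n_labs - 1) // 2      # number of pairwise comparisons (6)

alpha_exact = adjusted_alpha(alpha, k)
print(f"pairwise comparisons k       : {k}")
print(f"uncorrected family-wise rate : {familywise_error_rate(alpha, k):.4f}")
print(f"alpha' (Equation 9.39)       : {alpha_exact:.5f}")
print(f"alpha' (alpha / k)           : {adjusted_alpha_approx(alpha, k):.5f}")
print(f"family-wise rate with alpha' : {familywise_error_rate(alpha_exact, k):.4f}")
```

Testing every pair at \alpha' brings the chance of at least one type-I error for the whole family back to \alpha. The \alpha/k approximation is slightly stricter than the exact value.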

Calculate the error rate for the family of tests for the comparison of three labs using three t-tests when the overall significance level is 0.05. Use the Bonferroni correction.

Alright, we can now adjust the error rate to which we test our hypotheses, but we still have a major problem. We can remember from Lesson 6 that the type-II error is directly connected to the type-I error. With the Bonferroni correction we lower \alpha, which means we have also just increased \beta. Because of this, the statistical power of multiple separate comparisons is poor.
Analytical Separation Science by B.W.J. Pirok and P.J. Schoenmakers
READ SECTION 9.3.9

Analysis of variance (ANOVA)

2. Analysis of Variance (ANOVA) #

2.1. Concept #

A very powerful alternative to conduct the comparison of several groups is the use of analysis of variance (ANOVA). This type of testing allows a large number of datasets to be compared for a factor of interest. Let’s first introduce an example to guide us.

CASE: EFFECT OF ADDITIVE ON HILIC

To study selectivity properties in HILIC, a group of scientists investigates whether the retention factor for a given compound of interest is affected by the type of additive used. The dataset is shown in Table 2. Does the additive affect retention for the compound of interest in HILIC?

Table 2. Retention times measured for a compound of interest in a HILIC experiment, with 6 different additives in the mobile phase.
Additive 1  Additive 2  Additive 3  Additive 4  Additive 5  Additive 6
2.48        2.97        3.34        3.76        3.04        2.73
2.18        3.14        2.96        4.26        2.62        2.65
2.96        3.41        2.42        3.14        3.66        3.37
3.13        3.88        2.34        3.16        3.02        3.03
2.96        3.63        2.93        2.02        3.53        2.47
4.01        3.78        3.56        3.16        3.24        3.54

Table 2 features six different datasets. We could now ask the question “Which means are different?” and compare each of them with each other. Unfortunately, due to the Bonferroni correction we would end up with an inferior statistical power. Alternatively, we could change the question into “Is the factor ‘Additive’ significant?”, or in other words “Do any of the additives yield different results?”.

NOTE

The math behind the ANOVA test is very insightful for cultivating a better understanding of this statistical technique. It is not explained during this lesson and you are therefore encouraged to carefully read the book section indicated above (Section 9.3.9).

We established in Lesson 1 that, in the absence of bias, any measurement (x) is the sum of the true value, \mu, and an error, e. If we assume that a particular factor (a) pertaining to a dataset significantly affects the data, then any measurement x is the sum of the true value, the effect and the error:

Equation 9.40: x_{i,j}=\mu+a_j+e_{i,j}

2.2. Hypotheses #

Analysis of variance (ANOVA) investigates whether a factor under consideration significantly affects the data. The hypotheses are:

H_0: a_1=a_2=\dots=a_k and H_1: \exists\, j \neq j': a_j \neq a_{j'}

The alternative hypothesis states that there exists at least one pair of groups j and j' (with j, j' = 1, 2, ..., k) for which a_j is not equal to a_{j'}.

Figure 1. In ANOVA, the between-group variation (A) is compared with the within-group variation (B).
If there is no effect, then the variability within each group (B in Figure 1) is similar to the variability between the means of the groups (A in Figure 1).

2.3. Computation #

To calculate this, an ANOVA decomposes the error (or difference) in a datapoint e_{i,j} into the difference between that datapoint and the mean of the dataset (or group) it belongs to \bar{x}_j and the difference of the group mean and the grand mean (the mean of all datapoints, \bar{x}).

Equation 9.42: e_{i,j}=x_{i,j}-\bar{x}=(x_{i,j}-\bar{x}_j)+(\bar{x}_j-\bar{x})

Section 9.3.9.1 shows how this error for a datapoint is then squared (i.e. e^{2}_{i,j}) to avoid the differences otherwise cancelling each other out.

NOTE

Remember, an error can be in the positive and negative direction. If we want to see the total error of all datapoints, then we need to sum the errors of all datapoints. However, adding a positive error of +1 of one datapoint to a negative error of -1 in a different datapoint yields a total error of 0. To avoid this, we square the error, so that the answer is always positive ((-1)^2=1). This yields the well-known “sum of squares”.

Finally, the total error (i.e. difference) can be established by summing the squared errors for all datapoints. This yields the so-called sum of squares. Equations 9.42 through 9.44b demonstrate how the total sum of squares SS_{\text{tot}} then comprises the within-group SS_{\text{res}} and the between-group SS_{\text{group}}, the latter of which represents the effect we investigate with ANOVA.

Equation 9.44b: SS_{\text{tot}}=SS_{\text{res}}+SS_{\text{group}}
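The step from Equation 9.42 to Equation 9.44b can be written out as follows. This short derivation is added here for convenience and is not a copy of the intermediate equations in the book. The notation follows Equation 9.42, with n_j the number of datapoints in group j and k the number of groups. Squaring Equation 9.42 and summing over all datapoints gives three terms, of which the cross term vanishes because the deviations from a group's own mean sum to zero.

```latex
\begin{aligned}
SS_{\text{tot}}
  &= \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{i,j}-\bar{x})^2 \\
  &= \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{i,j}-\bar{x}_j)^2
   + \sum_{j=1}^{k} n_j(\bar{x}_j-\bar{x})^2
   + 2\sum_{j=1}^{k}(\bar{x}_j-\bar{x})\sum_{i=1}^{n_j}(x_{i,j}-\bar{x}_j) \\
  &= SS_{\text{res}} + SS_{\text{group}},
  \qquad \text{since} \quad \sum_{i=1}^{n_j}(x_{i,j}-\bar{x}_j)=0 .
\end{aligned}
```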

For each sum of squares, a measure of variation is obtained by dividing it by its degrees of freedom, which yields the mean square, MS=SS/\nu.

Equation 9.45b: MS_{\text{res}}=\frac{\sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{i,j}-\bar{x}_j)^2} {n-k}

Equation 9.45c: MS_{\text{group}}=\frac{\sum_{j=1}^{k} n_j (\bar{x}_{j}-\bar{x})^2} {k-1}

MS_{\text{group}} is the between-group variance (which describes the factor that we investigate). MS_{\text{res}} is the within-group variance, which is identical to the pooled variance (the square of the pooled standard deviation).
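The link between MS_{\text{res}} and the pooled variance can be checked numerically. The sketch below is a minimal illustration using only the Python standard library, with the three-group example data that also appears in the code of Section 2.4. It computes MS_{\text{res}} once from Equation 9.45b and once as the average of the group variances, weighted by their degrees of freedom.

```python
from statistics import mean, variance

# Example data (one list per group), same values as the example in Section 2.4.
groups = [
    [3.18, 3.18, 2.96, 3.13, 2.96, 3.01],
    [3.47, 3.44, 3.41, 3.58, 3.63, 3.70],
    [3.34, 3.06, 3.02, 3.04, 2.83, 3.06],
]

n = sum(len(g) for g in groups)   # total number of datapoints
k = len(groups)                   # number of groups

# Equation 9.45b: squared deviations of each datapoint from its own group mean.
group_means = [mean(g) for g in groups]
ss_res = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
ms_res = ss_res / (n - k)

# Pooled variance: sample variances weighted by their degrees of freedom (n_j - 1).
pooled_var = sum((len(g) - 1) * variance(g) for g in groups) / (n - k)

print(f"MS_res          = {ms_res:.5f}")
print(f"pooled variance = {pooled_var:.5f}")
print(f"pooled std dev  = {pooled_var ** 0.5:.5f}")
```

Both routes give the same number, which is why ANOVA inherits the requirement of comparable variances in all groups.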

2.4. Requirements & statistic #

As a consequence of the latter, ANOVA requires the variances of the groups to be homogeneous, like other hypothesis tests. In addition, the data must be normally distributed and contain no outliers. Notice how, as in Lesson 7, we are comparing two variances. This test thus follows the F-distribution and its F-statistic is calculated as

Equation 9.46: F_{\text{obs}}=\frac{MS_{\text{group}}}{MS_{\text{res}}}

The F-distribution is described by k-1 and n-k degrees of freedom, respectively.

Execute the following function, with the data in x with each group in a different column.

				
					% Example Data
x = [3.18, 3.47, 3.34;
    3.18, 3.44, 3.06;
    2.96, 3.41, 3.02;
    3.13, 3.58, 3.04;
    2.96, 3.63, 2.83;
    3.01, 3.70, 3.06];

% Calculation
stats = anova(x);
				
			

An example file can be downloaded here (CS_08_OneWayANOVA, .XLSX). See below for further instructions.

				
					#=
Julia does not have a built-in ANOVA function, therefore the following
    function was created.
Copy and run this function once; afterwards you can use it
    like any other regular function.
=#


##############################################
using LinearAlgebra, Statistics, Distributions

"""
    ANOVA 1 analysis of variances.
    If `group` is not specified, setup groups based on columns of `x`. Otherwise, setup groups based on `group`.
    The input variables can contain missing values for `x`, which will be removed before ANOVA analysis.\n
    Parameters
    ----------
    - x : AbstractMatrix{T<:Real}
        A matrix with columns corresponding to the individual groups.\n
    - group : AbstractVector{T<:Real}, optional
        An equally sized vector as `x`.\n
    Returns
    -------
    Dict{Any,Any}
        A dictionary containing the following keys:\n
        - `"DF"` : A tuple of vectors corresponding to degrees of freedom for between groups, residual, and total.
        - `"SS"` : A tuple of vectors corresponding to the sum of squares for between groups, residual, and total.
        - `"MS"` : A tuple of vectors corresponding to the mean square for between groups and residual.
        - `"F"` : The F-statistic for the ANOVA analysis.
        - `"p-value"` : The p-value for the ANOVA analysis.
"""
function anova1(x,group = [])
    # One-way analysis of variance (ANOVA)
    # anova1(x), where x is a matrix with columns corresponding to the individual groups
    # anova1(x,group), where x is a vector of the same size as group
    # x may contain NaN values, which are removed before the ANOVA analysis

    if isempty(group)
        # setup groups based on x columns
        group = ones(size(x)) .* collect(1:size(x,2))'
        group = reshape(group,:,1)
        x = reshape(x,:,1)
    # otherwise, check that x and group contain the same number of elements
    elseif length(x) .!= length(group)
        println("x and groups contain a different amount of elements")
        return
    else
        if size(group, 1) == 1
            group = group'
        end
        if size(x, 1) == 1
            x = x'
        end
    end

    #remove NaN values
    if any(isnan.(x))
        group = group[isnan.(x).==0]
        x = x[isnan.(x).==0]
    end

    x_ori = x
    x_mc = x .- mean(x)
    gr_n = unique(group)
    gr_m = ones(size(gr_n))
    gr_c = ones(size(gr_n))
    for i = 1:length(gr_n)
        gr_m[i] = mean(x_mc[group.== gr_n[i]])
        gr_c[i] = sum(group.==gr_n[i])
    end

    x_mean_mc = mean(x_mc)
    x_cent = gr_m .- x_mean_mc
    #degrees of freedom
    df1 = length(gr_c) - 1
    df2 = length(x) - df1 - 1

    RSS = dot(gr_c, x_cent.^2)

    TSS = sum((x_mc .- x_mean_mc).^2)   # total sum of squares as a scalar

    SSE = TSS[1] - RSS[1]
    if df2 > 0
        mse = SSE/df2
    else
        mse = NaN
    end

    if SSE !=0
        F = (RSS/df1) / mse
        p = 1-cdf(FDist(df1,df2),F)
    elseif RSS==0
        F = NaN;
        p = NaN;
    else
        F = Inf;
        p = 0;
    end

    #print results
    sum_df1 = df1+df2
    MS1 = RSS/df1
    println("")
    println("anova1 results")
    println("----------------------------------------------------------")
    println("Source\t\tDF\tSS\t\t\tMS\t\t\tF\t\t\tp")
    println("Between\t\t$df1\t$RSS\t$MS1\t$F\t$p     ")
    println("Residual\t$df2\t$SSE\t$mse                     ")
    println("Total\t\t$sum_df1\t$TSS                               ")

    # stats = DataFrame(Source = ["Between", "Residual", "Total"], DF = [df1, df2, sum_df1],
    #                   SS = [RSS, SSE, TSS], DF = [df1, df2, sum_df1], DF = [df1, df2, sum_df1], DF = [df1, df2, sum_df1])

    stats = Dict("DF" => (["Between","Residual", "Total"],[df1, df2, sum_df1]),
                "SS" => (["RSS", "SSE", "TSS"],[RSS, SSE, TSS]),
                "MS" => (["Between","Residual"],[MS1, mse]),
                "F" => F, "p-value" => p)

    return stats
end



##############################################
# Example Data
x = [3.18 3.47 3.34;
     3.18 3.44 3.06;
     2.96 3.41 3.02;
     3.13 3.58 3.04;
     2.96 3.63 2.83;
     3.01 3.70 3.06]

# Perform ANOVA
anova1(x)
				
			

2.5. Interpretation #

The test is concluded, like the earlier hypothesis tests, by comparing F_{\text{obs}} with F_{\text{crit}}. Most computational tools give an output table as shown in Table 3.

Table 3. Example of an ANOVA table. See Table 9.5 in the book for a more detailed version. Data in the present table pertains to the example code in Section 2.4.
Source          DOF  SS      MS      F        p-value
Between groups  2    0.8997  0.4499  26.3631  1.2304E-05
Within groups   15   0.2559  0.0171
Total           17   1.1557

As can be seen, Table 3 displays various calculations, as well as the resulting p-value. See Table 9.5 in the book for a detailed description of the computations in the table. In the example in Table 3, the p-value is lower than \alpha and thus H_0 is rejected and the investigated factor is deemed significant.
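To see where the entries of Table 3 come from, the sketch below recomputes them step by step from Equations 9.44b through 9.46 for the example data of Section 2.4, using plain Python. The closed-form p-value at the end is an assumption-laden shortcut: it only holds when the numerator has 2 degrees of freedom (three groups). In all other cases, take the p-value from your statistics software.

```python
# Example data of Section 2.4, one list per group (column).
groups = [
    [3.18, 3.18, 2.96, 3.13, 2.96, 3.01],
    [3.47, 3.44, 3.41, 3.58, 3.63, 3.70],
    [3.34, 3.06, 3.02, 3.04, 2.83, 3.06],
]

n = sum(len(g) for g in groups)                    # total number of datapoints
k = len(groups)                                    # number of groups
grand_mean = sum(sum(g) for g in groups) / n
means = [sum(g) / len(g) for g in groups]

# Sums of squares (Equation 9.44b: SS_tot = SS_res + SS_group).
ss_group = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
ss_res = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
ss_tot = sum((x - grand_mean) ** 2 for g in groups for x in g)

df_group, df_res = k - 1, n - k
ms_group = ss_group / df_group                     # Equation 9.45c
ms_res = ss_res / df_res                           # Equation 9.45b
f_obs = ms_group / ms_res                          # Equation 9.46

# Special case only: with 2 numerator degrees of freedom, the upper tail of the
# F-distribution reduces to the closed form below.
assert df_group == 2
p_value = (df_res / (df_res + 2 * f_obs)) ** (df_res / 2)

print(f"Between groups  {df_group:>3}  {ss_group:.4f}  {ms_group:.4f}  {f_obs:.4f}  {p_value:.4E}")
print(f"Within groups   {df_res:>3}  {ss_res:.4f}  {ms_res:.4f}")
print(f"Total           {n - 1:>3}  {ss_tot:.4f}")
```

The printed values agree with Table 3 to within rounding of the last digit.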

Conduct the ANOVA test on the data shown in Table 2. Does it matter which buffer additive is used? Or in other words: “Is the factor buffer additive significant?”. Select all correct answers.

2.6. High statistical power #

The ANOVA test involves a single F-statistic and therefore conducts just a single hypothesis test. As such, no Bonferroni correction is required and ANOVA will therefore generally feature a superior statistical power compared to pairwise t-tests.

However, this statistical power is lost if the data is of insufficient quality for the intended test. To demonstrate this, Table 4 contains a dataset similar to that of Table 2, but obtained with much higher precision.

Table 4. Retention times measured for a compound of interest in a HILIC experiment, with 6 different additives in the mobile phase. Obtained with a higher precision compared to Table 2.
Additive 1  Additive 2  Additive 3  Additive 4  Additive 5  Additive 6
3.18        3.47        3.34        3.26        3.04        2.93
3.18        3.44        3.06        3.26        3.22        2.90
2.96        3.41        3.02        3.14        3.26        2.97
3.13        3.58        3.04        3.16        3.02        3.00
2.96        3.63        2.83        3.02        3.13        2.87
3.01        3.70        3.06        3.16        3.14        2.94

Conduct the ANOVA test once more, but now on the data shown in Table 4. Does it matter which buffer additive is used? Or in other words: “Is the factor buffer additive significant?”. Select all correct answers. Also plot the data as a box-and-whisker plot for both Table 2 and Table 4 and see if you can use the plots to explain the results.

We now see that the conclusion is completely different. This can be better understood if we plot the data as box-and-whisker plots and – even better – conduct power analysis. This is shown in Figure 2.

Figure 2. Box-and-whisker plots and power analysis for the data from Table 2 (right) and Table 4 (left).
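The different conclusions can also be traced back to the mean squares. The sketch below is a minimal plain-Python illustration, with the values of Table 2 and Table 4 entered row by row. The critical value of approximately 2.53 is an assumption taken from a standard F table for \alpha = 0.05 with (5, 30) degrees of freedom. The sketch computes MS_{\text{group}}, MS_{\text{res}} and F_{\text{obs}} for both tables.

```python
def one_way_f(rows):
    """Return (MS_group, MS_res, F) for a table given row by row, one column per group."""
    groups = [list(col) for col in zip(*rows)]     # transpose rows into groups
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ms_group = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)) / (k - 1)
    ms_res = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (n - k)
    return ms_group, ms_res, ms_group / ms_res


table_2 = [
    [2.48, 2.97, 3.34, 3.76, 3.04, 2.73],
    [2.18, 3.14, 2.96, 4.26, 2.62, 2.65],
    [2.96, 3.41, 2.42, 3.14, 3.66, 3.37],
    [3.13, 3.88, 2.34, 3.16, 3.02, 3.03],
    [2.96, 3.63, 2.93, 2.02, 3.53, 2.47],
    [4.01, 3.78, 3.56, 3.16, 3.24, 3.54],
]
table_4 = [
    [3.18, 3.47, 3.34, 3.26, 3.04, 2.93],
    [3.18, 3.44, 3.06, 3.26, 3.22, 2.90],
    [2.96, 3.41, 3.02, 3.14, 3.26, 2.97],
    [3.13, 3.58, 3.04, 3.16, 3.02, 3.00],
    [2.96, 3.63, 2.83, 3.02, 3.13, 2.87],
    [3.01, 3.70, 3.06, 3.16, 3.14, 2.94],
]

# Assumption: F_crit for alpha = 0.05 and (5, 30) degrees of freedom, from an F table.
F_CRIT = 2.53

for name, rows in (("Table 2", table_2), ("Table 4", table_4)):
    ms_group, ms_res, f_obs = one_way_f(rows)
    verdict = "significant" if f_obs > F_CRIT else "not significant"
    print(f"{name}: MS_group={ms_group:.4f}  MS_res={ms_res:.4f}  F={f_obs:.2f}  ({verdict})")
```

The between-group mean squares of the two tables are of similar size. The residual mean square of Table 4 is far smaller, which is what raises F_{\text{obs}} above the critical value.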

3. Two-way ANOVA #

When several factors are of interest, it is possible to conduct an n-way ANOVA. In this course we will only treat the two-way ANOVA. An example of a dataset is shown in Table 5.

Table 5. Example of two-way ANOVA data with three repetitions for each combination of the two factors under investigation (buffer concentration and buffer type). Values are retention times obtained in a HILIC method. Identical to Table 9.6 from book.
                 Buffer 1  Buffer 2  Buffer 3
Concentration 1  3.11      3.10      3.21
                 3.09      2.99      3.18
                 3.24      3.33      2.90
Concentration 2  2.99      3.17      3.10
                 2.90      3.17      3.15
                 3.17      3.01      3.11
Concentration 3  3.01      2.90      3.17
                 2.90      3.00      3.01
                 2.90      2.98      3.02

In Table 5, we see that three replicate measurements were obtained for each combination of the two factors (buffer concentration and buffer type). We can now use ANOVA to consider:

  1. Whether the buffer concentration affects the results (Factor 1)
  2. Whether the buffer type affects the results (Factor 2)
  3. Interestingly: Whether there is any interaction between the buffer concentration and type (Factor 1 vs Factor 2).

The term interaction refers to the interdependency of the two factors under investigation.

A value in this matrix, x_{i,j,h}, now also comprises the effect b_h and, if investigated, the interaction (ab)_{j,h}.

Equation 9.47: x_{i,j,h}=\mu+a_j+b_h+(ab)_{j,h}+e_{i,j,h}

We now have to inform ANOVA which datapoint belongs to which group. Our strategy below is that the first column contains the actual data, the second column the level of the first factor (Concentration, rows of Table 5), and the third column the level of the second factor (Type, columns of Table 5). The latter assignment is multiplied by 10 to make the Levene test easier, which is covered in Lesson 10. You may ignore it for now.

				
					% Example Data
x=[
3.11 1 10
3.09 1 10
3.24 1 10
3.10 1 20
2.99 1 20
3.33 1 20
3.21 1 30
3.18 1 30
2.90 1 30
2.99 2 10
2.90 2 10
3.17 2 10
3.17 2 20
3.17 2 20
3.01 2 20
3.10 2 30
3.15 2 30
3.11 2 30
3.01 3 10
2.90 3 10
2.90 3 10
2.90 3 20
3.00 3 20
2.98 3 20
3.17 3 30
3.01 3 30
3.02 3 30
];

% Calculation Without Interactions
[p,anovatab,stats]=anovan(x(:,1),x(:,2:3), ...
    'model', 'linear');

% Calculation With Interactions
[p,anovatab,stats]=anovan(x(:,1),x(:,2:3), ...
    'model','interaction');
				
			

An example file can be downloaded here (CS_08_TwoWayANOVA, .XLSX). See below for further instructions.

See the Excel sheet for further instructions.

				
					#=
Julia does not have a built-in ANOVA function, therefore the following
    function was created.
Copy and run this function once; afterwards you can use it
    like any other regular function.
=#

##############################################
using LinearAlgebra, Statistics, Distributions

function anova2(x,reps,interaction = true)
    # ANOVA2 Two-way analysis of variance with repetitions
    # x represents a matrix where the columns represent changes in one factor and
    # the rows changes in another factor. Each level of the row factor occupies
    # reps consecutive rows (the repetitions).
    # The minimum number of columns and repetitions is 2.


    r, c = size(x)
    x_ori = copy(x)

    if mod(size(x,1),reps) .!= 0
        error("number of rows of x is not a multiple of reps")
    end

    if any(isnan.(x).==1)
        error("NaN found. This function cannot handle missing data")
    end

    #calculate mean of each group
    m = Int(r/reps)
    x_gr = reshape(x,reps,:)
    # edf = 0                         #used in case of missing values


    x_m = zeros(size(x_gr,2))
    for i = 1:length(x_m)
        vals = x_gr[:,i]
        vals = vals[isnan.(vals).==0]
        x_m[i] = mean(vals)
        # if any(isnan.(x) .== 1)
        #   edf += length(vals) - 1
        # end
    end
    x_m = reshape(x_m,m,:)   # m row-factor levels by c column-factor levels

    colmean = mean(x_m,dims = 1)
    rowmean = mean(x_m,dims = 2)'
    gm = mean(x_m)
    dfc = c-1            # Column degrees of freedom
    dfr = m-1           # Row degrees of freedom




    if reps == 1
        edf = (c-1)*(r-1)       # Error dof, assumes an additive model.
    else
        # if all(isnan.(x) .== 0)
        edf = (c*m*(reps-1))    # Error dof
        #end
        idf = (c-1)*(m-1)       # Interaction dof
    end

    CSS = (m*reps*(colmean .- gm)*(colmean.-gm)')[1]      # Column SS
    RSS = (c*reps*(rowmean .- gm)*(rowmean.-gm)')[1]     # Row SS
    correction = (c*m*reps)*gm.^2
    ISS = reps*sum(x_m .* x_m) - correction - CSS - RSS              #Interaction SS
    # end

    x2 = reshape(x .* x,1,:)                                         #precalculate x*x to allow for isnan removal
    TSS = sum(x2[isnan.(x2).==0]) - correction                       #Total SS

    if reps == 1
        SSE = ISS
    else
        if interaction
        SSE = TSS - CSS - RSS - ISS     #Error Sum of Squares
        else
        SSE = TSS - CSS - RSS
        edf = edf + idf
        end
    end

    ip = NaN
    if SSE !=0
        MSE = SSE/edf
        colf = (CSS/dfc) / MSE
        rowf = (RSS/dfr) / MSE
        colp = 1-cdf(FDist(dfc,edf),colf) # P of F given == column means
        rowp = 1-cdf(FDist(dfr,edf),rowf) # P of F given == row means
        p  = [colp rowp]
        if reps > 1
            intf = (ISS/idf)/MSE
            ip  = 1-cdf(FDist(idf,edf),intf)
            p  = [p ip]
        end
    else          # no error
        if edf > 0
            MSE = 0
        else
            MSE = NaN
        end
        if CSS==0     # No between column variability.
            colf = NaN
            colp = NaN
        else        # Between column variability.
            colf = Inf
            colp = 0
        end
        if RSS==0     # No between row variability.
            rowf = NaN
            rowp = NaN
        else        # Between row variability.
            rowf = Inf
            rowp = 0
        end
        p = [colp rowp]
        if reps>1 && ISS==0 # Replication but no interactions.
            intf = NaN
            p = [p NaN]
        elseif reps>1     # Replication with interactions.
            intf = Inf
            p = [p 0]
        end
    end


    #printing and saving values
    if interaction
        sum_df1 = idf + edf + dfc + dfr
    else
        sum_df1 = edf + dfc + dfr
    end
    MS1 = CSS/dfc
    MS2 = RSS/dfr
    MS3 = ISS/idf
    p1 = p[1]
    p2 = p[2]
    p3 = p[3]

    if interaction
        println("")
        println("anova2 results")
        println("----------------------------------------------------------")
        println("Source\t\tDF\tSS\t\t\tMS\t\t\tF\t\t\tp")
        println("X1\t\t$dfc\t$CSS\t$MS1\t$colf\t$p1     ")
        println("X2\t\t$dfr\t$RSS\t$MS2\t$rowf\t$p2                    ")
        println("X2*X1\t\t$idf\t$ISS\t$MS3\t$intf\t$p3                     ")
        println("Error\t\t$edf\t$SSE\t$MSE")
        println("Total\t\t$sum_df1\t$TSS                               ")

        stats = Dict("DF" => (["X1","X2", "X1*X2","Error","Total"],[dfc, dfr, idf, edf, sum_df1]),
                    "SS" => (["CSS","RSS","ISS", "SSE", "TSS"],[CSS, RSS, ISS, SSE, TSS]),
                    "MS" => (["MSC","MSR", "MSI", "MSE"],[MS1, MS2, MS3, MSE]),
                    "F" => (["Col", "Row", "Interaction"], [colf, rowf, intf]),
                    "p-value" => (["Col", "Row", "Interaction"], [p1, p2, p3]))



    else
        println("")
        println("anova2 results")
        println("----------------------------------------------------------")
        println("Source\t\tDF\tSS\t\t\tMS\t\t\tF\t\t\tp")
        println("X1\t\t$dfc\t$CSS\t$MS1\t$colf\t$p1     ")
        println("X2\t\t$dfr\t$RSS\t$MS2\t$rowf\t$p2                    ")
        # println("X2*X1\t\t$idf\t$ISS\t$MS3\t$intf\t$p3                     ")
        println("Error\t\t$edf\t$SSE\t$MSE")
        println("Total\t\t$sum_df1\t$TSS                               ")

        stats = Dict("DF" => (["X1","X2","Error","Total"],[dfc, dfr, edf, sum_df1]),
                    "SS" => (["CSS","RSS", "SSE", "TSS"],[CSS, RSS, SSE, TSS]),
                    "MS" => (["MSC","MSR", "MSE"],[MS1, MS2, MSE]),
                    "F" => (["Col", "Row"], [colf, rowf]),
                    "p-value" => (["Col", "Row"], [p1, p2]))
    end

    return stats
end



##############################################
# Example Data (Table 5) in the layout that anova2 expects:
# columns are the buffer types (Buffer 1, 2, 3), rows are the buffer
# concentrations, with 3 consecutive rows (repetitions) per concentration.
x = [3.11 3.10 3.21;
     3.09 2.99 3.18;
     3.24 3.33 2.90;
     2.99 3.17 3.10;
     2.90 3.17 3.15;
     3.17 3.01 3.11;
     3.01 2.90 3.17;
     2.90 3.00 3.01;
     2.90 2.98 3.02]

# Calculation With Interactions (default)
anova2(x, 3)

# Calculation Without Interactions
anova2(x, 3, false)
				
			

Computation of the two-way ANOVA as well as the interaction will each (!) consume more degrees of freedom at the cost of statistical power.
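The bookkeeping of sums of squares and degrees of freedom can be made explicit with a small sketch. It is a plain-Python illustration for a balanced design, applied to the data of Table 5. The function name and the labels A (concentration), B (buffer type) and AB (interaction) are chosen here for illustration. It is not a replacement for the MATLAB or Julia functions above, and it computes no p-values. Leaving out the interaction pools its sum of squares and degrees of freedom into the error term.

```python
# Table 5 as data[concentration][buffer] = list of replicate retention times.
data = [
    [[3.11, 3.09, 3.24], [3.10, 2.99, 3.33], [3.21, 3.18, 2.90]],
    [[2.99, 2.90, 3.17], [3.17, 3.17, 3.01], [3.10, 3.15, 3.11]],
    [[3.01, 2.90, 2.90], [2.90, 3.00, 2.98], [3.17, 3.01, 3.02]],
]


def two_way_anova(data, interaction=True):
    """Balanced two-way ANOVA; returns {source: {"SS", "df", "MS", "F"}}."""
    a, b, r = len(data), len(data[0]), len(data[0][0])
    values = [x for row in data for cell in row for x in cell]
    gm = sum(values) / len(values)                                   # grand mean
    cell = [[sum(c) / r for c in row] for row in data]               # cell means
    row_m = [sum(cell[j]) / b for j in range(a)]                     # factor A means
    col_m = [sum(cell[j][h] for j in range(a)) / a for h in range(b)]  # factor B means

    ss = {
        "A": b * r * sum((m - gm) ** 2 for m in row_m),
        "B": a * r * sum((m - gm) ** 2 for m in col_m),
        "AB": r * sum((cell[j][h] - row_m[j] - col_m[h] + gm) ** 2
                      for j in range(a) for h in range(b)),
        "error": sum((x - cell[j][h]) ** 2
                     for j in range(a) for h in range(b) for x in data[j][h]),
    }
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "error": a * b * (r - 1)}
    if not interaction:
        # Without an interaction term, its SS and df are pooled into the error.
        ss["error"] += ss.pop("AB")
        df["error"] += df.pop("AB")

    ms_error = ss["error"] / df["error"]
    table = {}
    for source in ss:
        ms = ss[source] / df[source]
        table[source] = {"SS": ss[source], "df": df[source], "MS": ms,
                         "F": None if source == "error" else ms / ms_error}
    table["total"] = {"SS": sum((x - gm) ** 2 for x in values),
                      "df": len(values) - 1, "MS": None, "F": None}
    return table


for label, use_interaction in (("with interaction", True), ("without interaction", False)):
    print(label)
    for source, row in two_way_anova(data, use_interaction).items():
        f_text = "" if row["F"] is None else f"F={row['F']:.3f}"
        print(f"  {source:<6} df={row['df']:<3} SS={row['SS']:.4f}  {f_text}")
```

With the interaction included, the error term keeps 18 degrees of freedom; without it, the error term has 22. The F-values of the two main factors change accordingly, because they are divided by a different error mean square.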

Conduct the ANOVA and study the results. Which statements are true? For the statements concerning the effect of the interaction on the statistical power, you need to compare the test with and without evaluating the interaction (this may not be possible using Excel!).

Concluding remarks #

We now have the capability of comparing several means simultaneously. Conveniently, ANOVA allows us to do so while circumventing the need to do the Bonferroni correction.

We have now learned quite a few hypothesis tests. Like ANOVA, many depend on several requirements. We will use the next lesson to test whether these requirements are met.
