Chapter 5: ANOVA for Designed Experiments
In Simple Linear Regression Analysis and Multiple Linear Regression Analysis, methods were presented to model the relationship between a response and the associated factors (referred to as predictor variables in the context of regression) based on an observed data set. Such studies, where observed values of the response are used to establish an association between the response and the factors, are called observational studies. However, in the case of observational studies, it is difficult to establish a cause-and-effect relationship between the observed factors and the response. This is because a number of alternative justifications can be used to explain the observed change in the response values. For example, a regression model fitted to data on the population of cities and road accidents might show a positive regression relation. However, this relation does not imply that an increase in a city's population causes an increase in road accidents. It could be that a number of other factors such as road conditions, traffic control and the degree to which the residents of the city follow the traffic rules affect the number of road accidents in the city and the increase in the number of accidents seen in the study is caused by these factors. Since the observational study does not take the effect of these factors into account, the assumption that an increase in a city's population will lead to an increase in road accidents is not a valid one. For example, the population of a city may increase but road accidents in the city may decrease because of better traffic control. To establish a cause-and-effect relationship, the study should be conducted in such a way that the effect of all other factors is excluded from the investigation.
The studies that enable the establishment of a cause-and-effect relationship are called experiments. In experiments the response is investigated by studying only the effect of the factor(s) of interest and excluding all other effects that may provide alternative justifications to the observed change in response. This is done in two ways. First, the levels of the factors to be investigated are carefully selected and then strictly controlled during the execution of the experiment. The aspect of selecting what factor levels should be investigated in the experiment is called the design of the experiment. The second distinguishing feature of experiments is that observations in an experiment are recorded in a random order. By doing this, it is hoped that the effect of all other factors not being investigated in the experiment will get cancelled out so that the change in the response is the result of only the investigated factors. Using these two techniques, experiments tend to ensure that alternative justifications to observed changes in the response are voided, thereby enabling the establishment of a cause-and-effect relationship between the response and the investigated factors.
Randomization
The aspect of recording observations in an experiment in a random order is referred to as randomization. Specifically, randomization is the process of assigning the various levels of the investigated factors to the experimental units in a random fashion. An experiment is said to be completely randomized if the probability of an experimental unit being subjected to any level of a factor is equal for all the experimental units. The importance of randomization can be illustrated using an example. Consider an experiment where the effect of the speed of a lathe machine on the surface finish of a product is being investigated. In order to save time, the experimenter records surface finish values by running the lathe machine continuously and recording observations in the order of increasing speeds. The analysis of the experiment data shows that an increase in lathe speeds causes a decrease in the quality of surface finish. However, the results of the experiment are disputed by the lathe operator, who claims that he has been able to obtain better surface finish quality by operating the lathe machine at higher speeds. It is later found that the faulty results were caused by overheating of the tool used in the machine. Since the lathe was run continuously in the order of increasing speeds, the observations were recorded in the order of increasing tool temperatures. This problem could have been avoided if the experimenter had randomized the experiment and taken readings at the various lathe speeds in a random fashion. This would require the experimenter to stop and restart the machine at every observation, thereby keeping the temperature of the tool within a reasonable range. Randomization would have ensured that the effect of heating of the machine tool is not included in the experiment.
Analysis of Single Factor Experiments
As explained in Simple Linear Regression Analysis and Multiple Linear Regression Analysis, the analysis of observational studies involves the use of regression models. The analysis of experimental studies involves the use of analysis of variance (ANOVA) models. For a comparison of the two models see Fitting ANOVA Models. In single factor experiments, ANOVA models are used to compare the mean response values at different levels of the factor. Each level of the factor is investigated to see if the response is significantly different from the response at other levels of the factor. The analysis of single factor experiments is often referred to as one-way ANOVA.
To illustrate the use of ANOVA models in the analysis of experiments, consider a single factor experiment where the analyst wants to see if the surface finish of certain parts is affected by the speed of a lathe machine. Data is collected for three speeds (or three treatments). Each treatment is replicated four times. Therefore, this experiment design is balanced. Surface finish values recorded using randomization are shown in the following table.
Surface finish values for three speeds of a lathe machine.
The ANOVA model for this experiment can be stated as follows:

$$ Y_{ij} = \mu_i + \epsilon_{ij} $$

The ANOVA model assumes that the response at each factor level, $Y_{ij}$, is the sum of the mean response at the $i$th level, $\mu_i$, and a random error term, $\epsilon_{ij}$. The subscript $i$ denotes the factor level while the subscript $j$ denotes the replicate. If there are $n_a$ levels of the factor and $m$ replicates at each level then $i = 1, 2, \ldots, n_a$ and $j = 1, 2, \ldots, m$. The random error terms, $\epsilon_{ij}$, are assumed to be normally and independently distributed with a mean of zero and variance of $\sigma^2$. Therefore, the response at each level can be thought of as a normally distributed population with a mean of $\mu_i$ and constant variance of $\sigma^2$. The equation given above is referred to as the means model.
The means model can also be written using $\mu_i = \mu + \tau_i$, where $\mu$ represents the overall mean and $\tau_i$ represents the effect due to the $i$th treatment:

$$ Y_{ij} = \mu + \tau_i + \epsilon_{ij} $$

Such an ANOVA model is called the effects model. In the effects model the treatment effects, $\tau_i$, represent the deviations from the overall mean, $\mu$. Therefore, the following constraint exists on the $\tau_i$s:

$$ \sum_{i=1}^{n_a} \tau_i = 0 $$
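The relation between the two models can be illustrated numerically. The sketch below uses made-up response values (not the surface finish data of the first table) to estimate the means-model parameters $\mu_i$ and the effects-model parameters $\mu$ and $\tau_i$ from treatment averages:

```python
# Estimate means-model and effects-model parameters for a balanced
# one-way layout. The data below are made-up illustration values,
# not the surface finish measurements from the first table.
data = {
    "level_1": [10.0, 12.0, 11.0, 13.0],
    "level_2": [14.0, 15.0, 13.0, 14.0],
    "level_3": [20.0, 19.0, 21.0, 20.0],
}

# Means model: mu_i is estimated by the average response at level i.
mu_i = {lvl: sum(v) / len(v) for lvl, v in data.items()}

# Effects model: mu is the overall mean, tau_i = mu_i - mu.
all_values = [y for v in data.values() for y in v]
mu = sum(all_values) / len(all_values)
tau_i = {lvl: mi - mu for lvl, mi in mu_i.items()}

print({lvl: round(mi, 4) for lvl, mi in mu_i.items()})
print(round(mu, 4))
print({lvl: round(t, 4) for lvl, t in tau_i.items()})
```

Note that the estimated effects automatically satisfy the constraint $\sum \tau_i = 0$, since each $\tau_i$ is a deviation from the overall mean.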

Fitting ANOVA Models
To fit ANOVA models and carry out hypothesis testing in single factor experiments, it is convenient to express the effects model in the form $y = X\beta + \epsilon$ (that was used for multiple linear regression models in Multiple Linear Regression Analysis). This can be done as shown next. Using the effects model, the ANOVA model for the single factor experiment in the first table can be expressed as:

$$ Y_{ij} = \mu + \tau_i + \epsilon_{ij} $$

where $\mu$ represents the overall mean and $\tau_i$ represents the $i$th treatment effect. There are three treatments in the first table (500, 600 and 700). Therefore, there are three treatment effects, $\tau_1$, $\tau_2$ and $\tau_3$. The following constraint exists for these effects:

$$ \tau_1 + \tau_2 + \tau_3 = 0 \quad\Rightarrow\quad \tau_3 = -\tau_1 - \tau_2 $$
For the first treatment, the ANOVA model for the single factor experiment in the above table can be written as:

$$ Y_{1j} = \mu + \tau_1 + \epsilon_{1j} $$

Using $\tau_3 = -\tau_1 - \tau_2$, the model for the first treatment is:

$$ Y_{1j} = \mu + 1\cdot\tau_1 + 0\cdot\tau_2 + \epsilon_{1j} $$

Models for the second and third treatments can be obtained in a similar way. The models for the three treatments are:

$$ \begin{aligned} Y_{1j} &= \mu + 1\cdot\tau_1 + 0\cdot\tau_2 + \epsilon_{1j} \\ Y_{2j} &= \mu + 0\cdot\tau_1 + 1\cdot\tau_2 + \epsilon_{2j} \\ Y_{3j} &= \mu - 1\cdot\tau_1 - 1\cdot\tau_2 + \epsilon_{3j} \end{aligned} $$

The coefficients of the treatment effects $\tau_1$ and $\tau_2$ can be expressed using two indicator variables, $x_1$ and $x_2$, as follows:

$$ \begin{aligned} x_1 &= 1, \ x_2 = 0 && \text{for the first treatment} \\ x_1 &= 0, \ x_2 = 1 && \text{for the second treatment} \\ x_1 &= -1, \ x_2 = -1 && \text{for the third treatment} \end{aligned} $$

Using the indicator variables $x_1$ and $x_2$, the ANOVA model for the data in the first table now becomes:

$$ Y = \mu + \tau_1 x_1 + \tau_2 x_2 + \epsilon $$

The equation can be rewritten by including subscripts $i$ (for the level of the factor) and $j$ (for the replicate number) as:

$$ Y_{ij} = \mu + \tau_1 x_{i1} + \tau_2 x_{i2} + \epsilon_{ij} $$

The equation given above represents the "regression version" of the ANOVA model.
Treat Numerical Factors as Qualitative or Quantitative?
It can be seen from the equation given above that in an ANOVA model each factor is treated as a qualitative factor. In the present example the factor, lathe speed, is a quantitative factor with three levels. But the ANOVA model treats this factor as a qualitative factor with three levels. Therefore, two indicator variables, $x_1$ and $x_2$, are required to represent this factor.
Note that in a regression model a variable can either be treated as a quantitative or a qualitative variable. The factor, lathe speed, would be used as a quantitative factor and represented with a single predictor variable in a regression model. For example, if a first order model were to be fitted to the data in the first table, then the regression model would take the form $Y = \beta_0 + \beta_1 x + \epsilon$. If a second order regression model were to be fitted, the regression model would be $Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \epsilon$. Notice that unlike these regression models, the regression version of the ANOVA model does not make any assumption about the nature of the relationship between the response and the factor being investigated.
The choice of treating a particular factor as a quantitative or qualitative variable depends on the objective of the experimenter. In the case of the data of the first table, the objective of the experimenter is to compare the levels of the factor to see if a change in the levels leads to a significant change in the response; the objective is not to make predictions on the response for a given level of the factor. Therefore, the factor is treated as a qualitative factor in this case. If the objective were prediction or optimization, the experimenter would focus on aspects such as the nature of the relationship between the factor (lathe speed) and the response (surface finish), and the factor would be modeled as a quantitative factor to make accurate predictions.
Expression of the ANOVA Model in the Form $y = X\beta + \epsilon$
The regression version of the ANOVA model can be expanded for the three treatments and four replicates of the data in the first table as follows. The corresponding matrix notation is:

$$ y = X\beta + \epsilon $$

where

$$ y = \left[\begin{matrix} Y_{11}\\ Y_{21}\\ Y_{31}\\ Y_{12}\\ Y_{22}\\ \vdots \\ Y_{34} \end{matrix}\right] = X\beta + \epsilon = \left[\begin{matrix} 1&1&0\\ 1&0&1\\ 1&-1&-1\\ 1&1&0\\ 1&0&1\\ \vdots&\vdots&\vdots\\ 1&-1&-1 \end{matrix}\right] \left[\begin{matrix} \mu\\ \tau_1\\ \tau_2 \end{matrix}\right] + \left[\begin{matrix} \epsilon_{11}\\ \epsilon_{21}\\ \epsilon_{31}\\ \epsilon_{12}\\ \epsilon_{22}\\ \vdots \\ \epsilon_{34} \end{matrix}\right] $$

Thus:

$$ \left[\begin{matrix} 6\\ 13\\ 23\\ 13\\ 16\\ \vdots \\ 18 \end{matrix}\right] = \left[\begin{matrix} 1&1&0\\ 1&0&1\\ 1&-1&-1\\ 1&1&0\\ 1&0&1\\ \vdots&\vdots&\vdots\\ 1&-1&-1 \end{matrix}\right] \left[\begin{matrix} \mu\\ \tau_1\\ \tau_2 \end{matrix}\right] + \left[\begin{matrix} \epsilon_{11}\\ \epsilon_{21}\\ \epsilon_{31}\\ \epsilon_{12}\\ \epsilon_{22}\\ \vdots \\ \epsilon_{34} \end{matrix}\right] $$
The matrices $y$, $X$ and $\beta$ are used in the calculation of the sum of squares in the next section. The data in the first table can be entered into DOE++ as shown in the figure below.
Single factor experiment design for the data in the first table.
Hypothesis Test in Single Factor Experiments
The hypothesis test in single factor experiments examines the ANOVA model to see if the response at any level of the investigated factor is significantly different from that at the other levels. If this is not the case and the response at all levels is not significantly different, then it can be concluded that the investigated factor does not affect the response. The test on the ANOVA model is carried out by checking to see if any of the treatment effects, $\tau_i$, are non-zero. The test is similar to the test of significance of regression mentioned in Simple Linear Regression Analysis and Multiple Linear Regression Analysis in the context of regression models. The hypotheses statements for this test are:

$$ \begin{aligned} H_0:\ & \tau_1 = \tau_2 = \tau_3 = 0 \\ H_1:\ & \tau_i \neq 0 \text{ for at least one } i \end{aligned} $$
The test for $H_0$ is carried out using the following statistic:

$$ F_0 = \frac{MS_{TR}}{MS_E} $$

where $MS_{TR}$ represents the mean square for the ANOVA model and $MS_E$ is the error mean square. Note that in the case of ANOVA models we use the notation $MS_{TR}$ (treatment mean square) for the model mean square and $SS_{TR}$ (treatment sum of squares) for the model sum of squares (instead of $MS_R$, regression mean square, and $SS_R$, regression sum of squares, used in Simple Linear Regression Analysis and Multiple Linear Regression Analysis). This is done to indicate that the model under consideration is the ANOVA model and not the regression model. The calculations to obtain $MS_{TR}$ and $SS_{TR}$ are identical to the calculations to obtain $MS_R$ and $SS_R$ explained in Multiple Linear Regression Analysis.
Calculation of the Statistic $F_0$
The sum of squares to obtain the statistic $F_0$ can be calculated as explained in Multiple Linear Regression Analysis. Using the data in the first table, the model sum of squares, $SS_{TR}$, can be calculated as:

$$ \begin{aligned} SS_{TR} &= y'\left[H - \left(\frac{1}{n_a \cdot m}\right)J\right]y \\ &= \left[\begin{matrix} 6\\ 13\\ \vdots \\ 18 \end{matrix}\right]' \left[\begin{matrix} 0.1667&-0.0833&\cdots&-0.0833\\ -0.0833&0.1667&\cdots&-0.0833\\ \vdots&\vdots&\ddots&\vdots\\ -0.0833&-0.0833&\cdots&0.1667 \end{matrix}\right] \left[\begin{matrix} 6\\ 13\\ \vdots \\ 18 \end{matrix}\right] \\ &= 232.1667 \end{aligned} $$
In the previous equation, $n_a$ represents the number of levels of the factor, $m$ represents the replicates at each level, $y$ represents the vector of the response values, $H$ represents the hat matrix and $J$ represents the matrix of ones. (For details on each of these terms, refer to Multiple Linear Regression Analysis.)
Since two effect terms, $\tau_1$ and $\tau_2$, are used in the regression version of the ANOVA model, the degrees of freedom associated with the model sum of squares, $SS_{TR}$, is two:

$$ dof(SS_{TR}) = 2 $$
The total sum of squares, $SS_T$, can be obtained as follows:

$$ \begin{aligned} SS_T &= y'\left[I - \left(\frac{1}{n_a \cdot m}\right)J\right]y \\ &= \left[\begin{matrix} 6\\ 13\\ \vdots \\ 18 \end{matrix}\right]' \left[\begin{matrix} 0.9167&-0.0833&\cdots&-0.0833\\ -0.0833&0.9167&\cdots&-0.0833\\ \vdots&\vdots&\ddots&\vdots\\ -0.0833&-0.0833&\cdots&0.9167 \end{matrix}\right] \left[\begin{matrix} 6\\ 13\\ \vdots \\ 18 \end{matrix}\right] \\ &= 306.6667 \end{aligned} $$

In the previous equation, $I$ is the identity matrix. Since there are 12 data points in all, the number of degrees of freedom associated with $SS_T$ is 11:

$$ dof(SS_T) = 11 $$
Knowing $SS_T$ and $SS_{TR}$, the error sum of squares is:

$$ SS_E = SS_T - SS_{TR} = 306.6667 - 232.1667 = 74.5 $$

The number of degrees of freedom associated with $SS_E$ is:

$$ dof(SS_E) = dof(SS_T) - dof(SS_{TR}) = 11 - 2 = 9 $$
The test statistic can now be calculated using the equation given in Hypothesis Test in Single Factor Experiments as:

$$ F_0 = \frac{MS_{TR}}{MS_E} = \frac{SS_{TR}/2}{SS_E/9} = \frac{232.1667/2}{74.5/9} = 14.02 $$

The $p$ value for the statistic based on the $F$ distribution with 2 degrees of freedom in the numerator and 9 degrees of freedom in the denominator is:

$$ p \text{ value} = P(F > 14.02) = 0.0018 $$

Assuming that the desired significance level is 0.1, since $p$ value < 0.1, $H_0$ is rejected and it is concluded that change in the lathe speed has a significant effect on the surface finish. DOE++ displays these results in the ANOVA table, as shown in the figure below. The values of S and R-sq are the standard error and the coefficient of determination for the model, respectively. These values are explained in Multiple Linear Regression Analysis and indicate how well the model fits the data. The values in the figure below indicate that the fit of the ANOVA model is fair.
ANOVA table for the data in the first table.
Confidence Interval on the ith Treatment Mean
The response at each treatment of a single factor experiment can be assumed to be a normal population with a mean of $\mu_i$ and variance of $\sigma^2$ provided that the error terms can be assumed to be normally distributed. A point estimator of $\mu_i$ is the average response at each treatment, $\bar{y}_{i\cdot}$. Since this is a sample average, the associated variance is $MS_E/m_i$, where $m_i$ is the number of replicates at the $i$th treatment. Therefore, the confidence interval on $\mu_i$ is based on the $t$ distribution. Recall from Statistical Background on DOE (inference on population mean when variance is unknown) that:

$$ T_0 = \frac{\bar{y}_{i\cdot} - \mu_i}{\sqrt{MS_E/m_i}} $$

has a $t$ distribution with degrees of freedom equal to $dof(SS_E)$. Therefore, a 100($1-\alpha$) percent confidence interval on the $i$th treatment mean, $\mu_i$, is:

$$ \bar{y}_{i\cdot} \pm t_{\alpha/2,\,dof(SS_E)}\sqrt{\frac{MS_E}{m_i}} $$
For example, for the first treatment of the lathe speed we have:

$$ \bar{y}_{1\cdot} = \frac{1}{4}\sum_{j=1}^{4} Y_{1j} = 8.5 $$

In DOE++, this value is displayed as the Estimated Mean for the first level, as shown in the Data Summary table in the figure below. The value displayed as the standard deviation for this level is simply the sample standard deviation calculated using the observations corresponding to this level. The 90% confidence interval for this treatment is:

$$ \bar{y}_{1\cdot} \pm t_{0.05,9}\sqrt{\frac{MS_E}{m_1}} = 8.5 \pm (1.833)\sqrt{\frac{8.2778}{4}} = 8.5 \pm 2.6 $$

The 90% limits on $\mu_1$ are 5.9 and 11.1, respectively.
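The interval above can be reproduced with a few lines of arithmetic; the numbers below ($\bar{y}_{1\cdot} = 8.5$, $MS_E = 8.2778$, $m_1 = 4$, $t_{0.05,9} = 1.833$ from a $t$ table) are taken from the worked example in the text:

```python
# 90% confidence interval on a treatment mean, using the values worked
# out in the text: y-bar_1 = 8.5, MS_E = 8.2778, m_1 = 4 replicates,
# and t(0.05, 9) = 1.833 taken from a t-table.
import math

y_bar, ms_e, m_i, t_crit = 8.5, 8.2778, 4, 1.833
half_width = t_crit * math.sqrt(ms_e / m_i)
low, high = y_bar - half_width, y_bar + half_width
print(round(low, 1), round(high, 1))   # 5.9 11.1
```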
Data Summary table for the single factor experiment in the first table.
Confidence Interval on the Difference in Two Treatment Means
The confidence interval on the difference in two treatment means, $\mu_i - \mu_j$, is used to compare two levels of the factor at a given significance. If the confidence interval does not include the value of zero, it is concluded that the two levels of the factor are significantly different. The point estimator of $\mu_i - \mu_j$ is $\bar{y}_{i\cdot} - \bar{y}_{j\cdot}$. The variance for $\bar{y}_{i\cdot} - \bar{y}_{j\cdot}$ is:

$$ var(\bar{y}_{i\cdot} - \bar{y}_{j\cdot}) = \frac{MS_E}{m_i} + \frac{MS_E}{m_j} $$

For balanced designs all $m_i = m$. Therefore:

$$ var(\bar{y}_{i\cdot} - \bar{y}_{j\cdot}) = \frac{2\,MS_E}{m} $$

The standard deviation for $\bar{y}_{i\cdot} - \bar{y}_{j\cdot}$ can be obtained by taking the square root of the variance and is referred to as the pooled standard error:

$$ \text{pooled std. error} = \sqrt{\frac{2\,MS_E}{m}} $$

The $t$ statistic for the difference is:

$$ T_0 = \frac{(\bar{y}_{i\cdot} - \bar{y}_{j\cdot}) - (\mu_i - \mu_j)}{\sqrt{2\,MS_E/m}} $$

Then a 100(1-$\alpha$) percent confidence interval on the difference in two treatment means, $\mu_i - \mu_j$, is:

$$ (\bar{y}_{i\cdot} - \bar{y}_{j\cdot}) \pm t_{\alpha/2,\,dof(SS_E)}\sqrt{\frac{2\,MS_E}{m}} $$
For example, an estimate of the difference in the first and second treatment means of the lathe speed, $\mu_1 - \mu_2$, is:

$$ \hat{\mu}_1 - \hat{\mu}_2 = \bar{y}_{1\cdot} - \bar{y}_{2\cdot} $$

The pooled standard error for this difference is:

$$ \sqrt{\frac{2\,MS_E}{m}} = \sqrt{\frac{2(8.2778)}{4}} = 2.0344 $$

To test $H_0: \mu_1 - \mu_2 = 0$, the $t$ statistic is:

$$ t_0 = \frac{\bar{y}_{1\cdot} - \bar{y}_{2\cdot}}{2.0344} $$

In DOE++, the value of the statistic is displayed in the Mean Comparisons table under the column T Value as shown in the figure below. The 90% confidence interval on the difference $\mu_1 - \mu_2$ is:

$$ (\bar{y}_{1\cdot} - \bar{y}_{2\cdot}) \pm t_{0.05,9}(2.0344) = (\bar{y}_{1\cdot} - \bar{y}_{2\cdot}) \pm 3.73 $$

Hence the 90% limits on $\mu_1 - \mu_2$ are the estimated difference minus 3.73 and plus 3.73, respectively. These values are displayed under the Low CI and High CI columns in the following figure. Since the confidence interval for this pair of means does not include zero, it can be concluded that these means are significantly different at 90% confidence. This conclusion can also be arrived at using the $p$ value, noting that the hypothesis is two-sided. The $p$ value corresponding to the statistic $t_0$, based on the $t$ distribution with 9 degrees of freedom, is:

$$ p \text{ value} = 2 \times P(t_9 > |t_0|) $$

Since $p$ value < 0.1, the means are significantly different at 90% confidence. Bounds on the difference between other treatment pairs can be obtained in a similar manner and it is concluded that all treatments are significantly different.
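The pooled standard error and interval half-width above follow directly from $MS_E$ and the $t$ table value; the sketch below reproduces them (the treatment averages themselves are not reproduced from the text, so the helper function is shown with the averages as hypothetical inputs):

```python
# Pooled standard error and 90% interval half-width for the difference
# of two treatment means in a balanced design, using MS_E = 8.2778,
# m = 4 and t(0.05, 9) = 1.833 as in the text.
import math

ms_e, m, t_crit = 8.2778, 4, 1.833
pooled_se = math.sqrt(2 * ms_e / m)
half_width = t_crit * pooled_se

print(round(pooled_se, 4))    # 2.0344
print(round(half_width, 2))   # 3.73

def difference_ci(ybar_i, ybar_j):
    """90% CI on mu_i - mu_j; significant if the interval excludes 0."""
    d = ybar_i - ybar_j
    return d - half_width, d + half_width
```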
Residual Analysis
Plots of residuals,
, similar to the ones discussed in the previous chapters on regression, are used to ensure that the assumptions associated with the ANOVA model are not violated. The ANOVA model assumes that the random error terms,
, are normally and independently distributed with the same variance for each treatment. The normality assumption can be checked by obtaining a normal probability plot of the residuals.
Mean Comparisons table for the data in the first table.
Equality of variance is checked by plotting residuals against the treatments and the treatment averages, $\bar{y}_{i\cdot}$ (also referred to as fitted values), and inspecting the spread in the residuals. If a pattern is seen in these plots, then this indicates the need to use a suitable transformation on the response that will ensure variance equality. Box-Cox transformations are discussed in the next section. To check for independence of the random error terms, residuals are plotted against time or run-order to ensure that a pattern does not exist in these plots. Residual plots for the given example are shown in the following two figures. The plots show that the assumptions associated with the ANOVA model are not violated.
Normal probability plot of residuals for the single factor experiment in the first table.
Plot of residuals against fitted values for the single factor experiment in the first table.
Box-Cox Method
Transformations on the response may be used when residual plots for an experiment show a pattern. This indicates that the equality of variance does not hold for the residuals of the given model. The Box-Cox method can be used to automatically identify a suitable power transformation for the data based on the following relationship:

$$ Y^* = Y^\lambda $$

$\lambda$ is determined using the given data such that $SS_E$ is minimized. The values of $Y^\lambda$ are not used as is because of issues related to calculation or comparison of $SS_E$ values for different values of $\lambda$. For example, for $\lambda = 0$ all response values will become 1. Therefore, the following relationship is used to obtain $Y^{(\lambda)}$:

$$ Y^{(\lambda)} = \begin{cases} \dfrac{Y^{\lambda} - 1}{\lambda\,\dot{y}^{\lambda - 1}} & \lambda \neq 0 \\[2mm] \dot{y}\,\ln Y & \lambda = 0 \end{cases} $$

where $\dot{y} = \exp\!\left[\dfrac{1}{n}\sum_{i=1}^{n}\ln y_i\right]$ is the geometric mean of the response values.
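The normalized transform above is straightforward to implement. A minimal sketch (the function name and the response values are illustrative, not from the text):

```python
# The Box-Cox transform Y^(lambda) with the geometric-mean scaling
# described above, which keeps SS_E values comparable across lambda.
import math

def boxcox_transform(ys, lam):
    # geometric mean of the (positive) responses
    gdot = math.exp(sum(math.log(y) for y in ys) / len(ys))
    if lam == 0:
        return [gdot * math.log(y) for y in ys]
    return [(y ** lam - 1) / (lam * gdot ** (lam - 1)) for y in ys]

ys = [6.0, 13.0, 23.0, 18.0]          # illustration values only
print([round(v, 4) for v in boxcox_transform(ys, 1.0)])
# With lam = 1 the transform is just a shift: Y - 1.
```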
Once all $Y^{(\lambda)}$ values are obtained for a value of $\lambda$, the corresponding $SS_E$ for these values is obtained using $y^{(\lambda)\prime}[I - H]\,y^{(\lambda)}$. The process is repeated for a number of $\lambda$ values to obtain a plot of $SS_E$ against $\lambda$. Then the value of $\lambda$ corresponding to the minimum $SS_E$ is selected as the required transformation for the given data. DOE++ plots $\ln(SS_E)$ values against $\lambda$ values because the range of $SS_E$ values is large and if this is not done, all values cannot be displayed on the same plot. The search for the best $\lambda$ value in the software is restricted to a limited range, because larger magnitudes of $\lambda$ are usually not meaningful. DOE++ also displays a recommended transformation based on the best $\lambda$ value obtained, as shown in the table below.
Best Lambda | Recommended Transformation | Equation
$\lambda \approx -2$ | Inverse square | $Y^* = 1/Y^2$
$\lambda \approx -1$ | Reciprocal | $Y^* = 1/Y$
$\lambda \approx -0.5$ | Reciprocal square root | $Y^* = 1/\sqrt{Y}$
$\lambda \approx 0$ | Natural log | $Y^* = \ln(Y)$
$\lambda \approx 0.5$ | Square root | $Y^* = \sqrt{Y}$
$\lambda \approx 1$ | None | $Y^* = Y$
$\lambda \approx 2$ | Square | $Y^* = Y^2$
Confidence intervals on the selected $\lambda$ values are also available. Let $SS_E(\lambda)$ be the value of $SS_E$ corresponding to the selected value of $\lambda$. Then, to calculate the 100(1-$\alpha$) percent confidence intervals on $\lambda$, we need to calculate $SS^*$ as shown next:

$$ SS^* = SS_E(\lambda)\left(1 + \frac{t_{\alpha/2,\nu}^2}{\nu}\right) $$

where $\nu = dof(SS_E)$. The required limits for $\lambda$ are the two values of $\lambda$ corresponding to the value $SS^*$ (on the plot of $SS_E$ against $\lambda$). If the limits for $\lambda$ do not include the value of one, then the transformation is applicable for the given data.
Note that the power transformations are not defined for response values that are negative or zero. DOE++ deals with negative and zero response values by adding a suitable positive quantity, based on the minimum response value, $y_{min}$, and its absolute value, $|y_{min}|$, to all of the response values before the transformation is applied.
Example
To illustrate the Box-Cox method, consider the experiment given in the first table. Transformed response values for various values of $\lambda$ can be calculated using the equation for $Y^{(\lambda)}$ given in Box-Cox Method. Knowing the hat matrix, $H$, the $SS_E$ value corresponding to each of these $\lambda$ values can easily be obtained using $y^{(\lambda)\prime}[I - H]\,y^{(\lambda)}$.
A plot of $SS_E$ for various $\lambda$ values, as obtained from DOE++, is shown in the following figure. The value of $\lambda$ that gives the minimum $SS_E$ is identified as 0.7841. The $SS_E$ value corresponding to this value of $\lambda$ is 73.74. A 90% confidence interval on this $\lambda$ value is calculated as follows. $SS^*$ can be obtained as shown next:

$$ SS^* = SS_E(\lambda)\left(1 + \frac{t_{0.05,9}^2}{9}\right) = 73.74\left(1 + \frac{(1.833)^2}{9}\right) = 101.27 $$

Therefore, $SS^* = 101.27$. The two $\lambda$ values at which the $SS_E$ curve in the following figure reaches this $SS^*$ value are the 90% confidence limits on $\lambda$. Since the confidence limits include the value of 1, this indicates that a transformation is not required for the data in the first table.
Box-Cox power transformation plot for the data in the first table.
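The search DOE++ carries out can be sketched as a simple grid search. The data below are made-up illustration values (not the lathe data); for a balanced one-way layout, $y^{(\lambda)\prime}[I-H]y^{(\lambda)}$ reduces to the within-group sum of squares of the transformed responses:

```python
# Grid search for the Box-Cox lambda that minimizes SS_E, sketched for
# a balanced one-way layout with illustration data only.
import math

data = [[6.0, 13.0, 9.0, 8.0], [13.0, 16.0, 12.0, 15.0], [23.0, 20.0, 25.0, 18.0]]

def ss_e(groups):
    """Within-group (error) sum of squares: what y'[I-H]y reduces to here."""
    total = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        total += sum((y - mean) ** 2 for y in g)
    return total

def boxcox(ys, lam, gdot):
    if lam == 0:
        return [gdot * math.log(y) for y in ys]
    return [(y ** lam - 1) / (lam * gdot ** (lam - 1)) for y in ys]

all_y = [y for g in data for y in g]
gdot = math.exp(sum(math.log(y) for y in all_y) / len(all_y))  # geometric mean

best = min(
    (lam / 10 for lam in range(-20, 21)),        # lambda grid: -2.0 .. 2.0
    key=lambda lam: ss_e([boxcox(g, lam, gdot) for g in data]),
)
print(best)   # lambda with the smallest SS_E on this grid
```

A confidence interval on $\lambda$ would then be read off by finding where the $SS_E(\lambda)$ curve crosses the $SS^*$ threshold defined above.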
Experiments with Several Factors - Factorial Experiments
Experiments with two or more factors are encountered frequently. The best way to carry out such experiments is by using factorial experiments. Factorial experiments are experiments in which all combinations of factors are investigated in each replicate of the experiment. Factorial experiments are the only means to completely and systematically study interactions between factors in addition to identifying significant factors. One-factor-at-a-time experiments (where each factor is investigated separately by keeping all the remaining factors constant) do not reveal the interaction effects between the factors. Further, in one-factor-at-a-time experiments full randomization is not possible.
To illustrate factorial experiments consider an experiment where the response is investigated for two factors, $A$ and $B$. Assume that the response is studied at two levels of factor $A$ with $A_{\text{low}}$ representing the lower level of $A$ and $A_{\text{high}}$ representing the higher level. Similarly, let $B_{\text{low}}$ and $B_{\text{high}}$ represent the two levels of factor $B$ that are being investigated in this experiment. Since there are two factors with two levels, a total of $2 \times 2 = 4$ combinations exist ($A_{\text{low}}$-$B_{\text{low}}$, $A_{\text{low}}$-$B_{\text{high}}$, $A_{\text{high}}$-$B_{\text{low}}$, $A_{\text{high}}$-$B_{\text{high}}$). Thus, four runs are required for each replicate if a factorial experiment is to be carried out in this case. Assume that the response values for each of these four possible combinations are obtained as shown in the third table.
Two-factor factorial experiment.
Interaction plot for the data in the third table.
Investigating Factor Effects
The effect of factor $A$ on the response can be obtained by taking the difference between the average response when $A$ is high and the average response when $A$ is low. The change in the response due to a change in the level of a factor is called the main effect of the factor. The main effect of $A$ as per the response values in the third table is:

$$ A = \bar{y}_{A_{\text{high}}} - \bar{y}_{A_{\text{low}}} = 20 $$

Therefore, when $A$ is changed from the lower level to the higher level, the response increases by 20 units. A plot of the response for the two levels of $A$ at different levels of $B$ is shown in the figure above. The plot shows that change in the level of $A$ leads to an increase in the response by 20 units regardless of the level of $B$. Therefore, no interaction exists in this case as indicated by the parallel lines on the plot. The main effect of $B$ can be obtained as:

$$ B = \bar{y}_{B_{\text{high}}} - \bar{y}_{B_{\text{low}}} $$
Investigating Interactions
Now assume that the response values for each of the four treatment combinations were obtained as shown in the fourth table. The main effect of
in this case is:

Two factor factorial experiment.
It appears that
does not have an effect on the response. However, a plot of the response of
at different levels of
shows that the response does change with the levels of
but the effect of
on the response is dependent on the level of
(see the figure below). Therefore, an interaction between
and
exists in this case (as indicated by the non-parallel lines of the figure). The interaction effect between
and
can be calculated as follows:
Interaction plot for the data in the fourth table.

Note that in this case, if a one-factor-at-a-time experiment were used to investigate the effect of factor
on the response, it would lead to incorrect conclusions. For example, if the response at factor
was studied by holding
constant at its lower level, then the main effect of
would be obtained as
, indicating that the response increases by 20 units when the level of
is changed from low to high. On the other hand, if the response at factor
was studied by holding
constant at its higher level than the main effect of
would be obtained as
, indicating that the response decreases by 20 units when the level of
is changed from low to high.
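The main-effect and interaction calculations for a $2\times2$ factorial can be sketched directly. The response values below are made up to reproduce the pattern described in the text ($A$ has effect $+20$ at low $B$ and $-20$ at high $B$); they are not the values of the fourth table:

```python
# Main effect and interaction for a 2x2 factorial, using made-up
# response values that mimic the pattern described in the text.
y = {("lo", "lo"): 10.0, ("hi", "lo"): 30.0,
     ("lo", "hi"): 30.0, ("hi", "hi"): 10.0}

# Main effect of A: average at high A minus average at low A.
A = (y[("hi", "lo")] + y[("hi", "hi")]) / 2 - (y[("lo", "lo")] + y[("lo", "hi")]) / 2

# Interaction AB: half the difference between the effect of A at high B
# and the effect of A at low B.
effect_A_at_loB = y[("hi", "lo")] - y[("lo", "lo")]
effect_A_at_hiB = y[("hi", "hi")] - y[("lo", "hi")]
AB = (effect_A_at_hiB - effect_A_at_loB) / 2

print(A)    # 0.0   -> A appears to have no main effect
print(AB)   # -20.0 -> but a strong interaction is present
```

This is exactly the one-factor-at-a-time trap described above: the zero main effect hides two equal and opposite simple effects of $A$.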
Analysis of General Factorial Experiments
In DOE++, factorial experiments are referred to as factorial designs. The experiments explained in this section are referred to as general factorial designs. This is done to distinguish these experiments from the other factorial designs supported by DOE++ (see the figure below).
Factorial experiments available in DOE++.
The other designs (such as the two level full factorial designs that are explained in Two Level Factorial Experiments) are special cases of these experiments in which factors are limited to a specified number of levels. The ANOVA model for the analysis of factorial experiments is formulated as shown next. Assume a factorial experiment in which the effect of two factors, $A$ and $B$, on the response is being investigated. Let there be $n_a$ levels of factor $A$ and $n_b$ levels of factor $B$. The ANOVA model for this experiment can be stated as:

$$ Y_{ijk} = \mu + \tau_i + \delta_j + (\tau\delta)_{ij} + \epsilon_{ijk} $$

where:
- $\mu$ represents the overall mean effect
- $\tau_i$ is the effect of the $i$th level of factor $A$ ($i = 1, 2, \ldots, n_a$)
- $\delta_j$ is the effect of the $j$th level of factor $B$ ($j = 1, 2, \ldots, n_b$)
- $(\tau\delta)_{ij}$ represents the interaction effect between $A$ and $B$
- $\epsilon_{ijk}$ represents the random error terms (which are assumed to be normally distributed with a mean of zero and variance of $\sigma^2$)
- and the subscript $k$ denotes the $m$ replicates ($k = 1, 2, \ldots, m$)

Since the effects $\tau_i$, $\delta_j$ and $(\tau\delta)_{ij}$ represent deviations from the overall mean, the following constraints exist:

$$ \sum_{i=1}^{n_a}\tau_i = 0 \qquad \sum_{j=1}^{n_b}\delta_j = 0 \qquad \sum_{i=1}^{n_a}(\tau\delta)_{ij} = 0 \ \text{ for all } j \qquad \sum_{j=1}^{n_b}(\tau\delta)_{ij} = 0 \ \text{ for all } i $$
Hypothesis Tests in General Factorial Experiments
These tests are used to check whether each of the factors investigated in the experiment is significant or not. For the previous example, with two factors,
and
, and their interaction,
, the statements for the hypothesis tests can be formulated as follows:

The test statistics for the three tests are as follows:
- 1) $F_0 = \dfrac{MS_A}{MS_E}$, where $MS_A$ is the mean square due to factor $A$ and $MS_E$ is the error mean square.
- 2) $F_0 = \dfrac{MS_B}{MS_E}$, where $MS_B$ is the mean square due to factor $B$ and $MS_E$ is the error mean square.
- 3) $F_0 = \dfrac{MS_{AB}}{MS_E}$, where $MS_{AB}$ is the mean square due to the interaction $AB$ and $MS_E$ is the error mean square.
The tests are identical to the partial $F$ test explained in Multiple Linear Regression Analysis. The sum of squares for these tests (to obtain the mean squares) are calculated by splitting the model sum of squares into the extra sum of squares due to each factor. The extra sum of squares calculated for each of the factors may either be partial or sequential. For the present example, if the extra sum of squares used is sequential, then the model sum of squares can be written as:

$$ SS_{TR} = SS_A + SS_B + SS_{AB} $$

where $SS_{TR}$ represents the model sum of squares, $SS_A$ represents the sequential sum of squares due to factor $A$, $SS_B$ represents the sequential sum of squares due to factor $B$ and $SS_{AB}$ represents the sequential sum of squares due to the interaction $AB$.
The mean squares are obtained by dividing the sum of squares by the associated degrees of freedom. Once the mean squares are known the test statistics can be calculated. For example, the test statistic to test the significance of factor $A$ (or the hypothesis $H_0: \tau_i = 0$) can then be obtained as:

$$ F_0 = \frac{MS_A}{MS_E} = \frac{SS_A/(n_a - 1)}{SS_E/(n_a n_b(m - 1))} $$

Similarly the test statistics to test the significance of factor $B$ and the interaction $AB$ can be respectively obtained as:

$$ F_0 = \frac{MS_B}{MS_E} = \frac{SS_B/(n_b - 1)}{SS_E/(n_a n_b(m - 1))} \qquad\text{and}\qquad F_0 = \frac{MS_{AB}}{MS_E} = \frac{SS_{AB}/((n_a - 1)(n_b - 1))}{SS_E/(n_a n_b(m - 1))} $$
It is recommended to conduct the test for interactions before conducting the test for the main effects. This is because, if an interaction is present, then the main effect of the factor depends on the level of the other factors and looking at the main effect is of little value. However, if the interaction is absent then the main effects become important.
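For a balanced two-factor design the sequential and partial sums of squares coincide, and each sum of squares can be computed from cell and marginal means. A minimal sketch with made-up replicate values (not the mileage data of the next section):

```python
# Sums of squares for a balanced two-factor factorial, computed with
# the standard cell/marginal-mean identities. y[i][j] holds the
# replicate list for level i of A and level j of B; the numbers are
# illustration values only.
y = [[[17.0, 18.0, 17.5], [18.5, 19.0, 18.0]],
     [[16.0, 15.5, 16.5], [17.5, 17.0, 18.0]],
     [[14.0, 14.5, 13.5], [16.0, 15.5, 16.5]]]

na, nb, m = len(y), len(y[0]), len(y[0][0])
cell = [[sum(y[i][j]) / m for j in range(nb)] for i in range(na)]
a_mean = [sum(cell[i]) / nb for i in range(na)]                        # y-bar_i..
b_mean = [sum(cell[i][j] for i in range(na)) / na for j in range(nb)]  # y-bar_.j.
grand = sum(a_mean) / na                                               # balanced design

ss_a = nb * m * sum((ai - grand) ** 2 for ai in a_mean)
ss_b = na * m * sum((bj - grand) ** 2 for bj in b_mean)
ss_ab = m * sum((cell[i][j] - a_mean[i] - b_mean[j] + grand) ** 2
                for i in range(na) for j in range(nb))
ss_e = sum((obs - cell[i][j]) ** 2
           for i in range(na) for j in range(nb) for obs in y[i][j])

ms_e = ss_e / (na * nb * (m - 1))
f_a = (ss_a / (na - 1)) / ms_e
f_b = (ss_b / (nb - 1)) / ms_e
f_ab = (ss_ab / ((na - 1) * (nb - 1))) / ms_e
print(round(f_a, 3), round(f_b, 3), round(f_ab, 3))
```

Following the recommendation above, $f_{ab}$ would be examined first; only if the interaction is not significant are $f_a$ and $f_b$ interpreted on their own.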
Example
Consider an experiment to investigate the effect of speed and type of fuel additive used on the mileage of a sports utility vehicle. Three speeds and two types of fuel additives are investigated. Each of the treatment combinations is replicated three times. The mileage values observed are displayed in the fifth table.
Mileage data for different speeds and fuel additive types.
The experimental design for the data in the fifth table is shown in the figure below. In the figure, the factor Speed is represented as factor $A$ and the factor Fuel Additive is represented as factor $B$. The experimenter would like to investigate if speed, fuel additive or the interaction between speed and fuel additive affects the mileage of the sports utility vehicle. In other words, the following hypotheses need to be tested:

$$ \begin{aligned} 1)\ & H_0: \tau_1 = \tau_2 = \tau_3 = 0 \quad\text{versus}\quad H_1: \tau_i \neq 0 \text{ for at least one } i \\ 2)\ & H_0: \delta_1 = \delta_2 = 0 \quad\text{versus}\quad H_1: \delta_j \neq 0 \text{ for at least one } j \\ 3)\ & H_0: (\tau\delta)_{ij} = 0 \text{ for all } i, j \quad\text{versus}\quad H_1: (\tau\delta)_{ij} \neq 0 \text{ for at least one } i, j \end{aligned} $$
The test statistics for the three tests are:

- 1. (F_0)_A = MS_A / MS_E, where MS_A is the mean square for factor A and MS_E is the error mean square
- 2. (F_0)_B = MS_B / MS_E, where MS_B is the mean square for factor B and MS_E is the error mean square
- 3. (F_0)_AB = MS_AB / MS_E, where MS_AB is the mean square for the interaction AB and MS_E is the error mean square
Experimental design for the data in the fifth table.
The ANOVA model for this experiment can be written as:

Y_ijk = μ + τ_i + δ_j + (τδ)_ij + ε_ijk

where τ_i represents the ith treatment of factor A (speed) with i = 1, 2, 3; δ_j represents the jth treatment of factor B (fuel additive) with j = 1, 2; and (τδ)_ij represents the interaction effect. In order to calculate the test statistics, it is convenient to express the ANOVA model of the equation given above in the form y = Xβ + ε. This can be done as explained next.
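As a sketch, the model can be coded directly. The effect values below are arbitrary placeholders chosen only to satisfy the zero-sum constraints discussed next:

```python
# Two-way ANOVA model: Y_ijk = mu + tau_i + delta_j + (tau*delta)_ij + eps_ijk
mu = 18.0                                  # overall mean (placeholder value)
tau = {1: 0.2, 2: 0.3, 3: -0.5}            # factor A effects (sum to zero)
delta = {1: 0.4, 2: -0.4}                  # factor B effects (sum to zero)
taudelta = {(1, 1): 0.1, (2, 1): -0.1, (3, 1): 0.0,
            (1, 2): -0.1, (2, 2): 0.1, (3, 2): 0.0}  # interaction effects

def mean_response(i, j):
    """Expected response at level i of factor A and level j of factor B."""
    return mu + tau[i] + delta[j] + taudelta[(i, j)]
```

Each observed value is then this expected response plus a random error term ε_ijk.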
Expression of the ANOVA Model as y = Xβ + ε
Since the effects τ_i, δ_j and (τδ)_ij represent deviations from the overall mean, the following constraints exist.

Constraints on τ_i are:

τ_1 + τ_2 + τ_3 = 0

Therefore, only two of the τ_i effects are independent. Assuming that τ_1 and τ_2 are independent, τ_3 = -(τ_1 + τ_2). (The null hypothesis to test the significance of factor A can be rewritten using only the independent effects as H_0: τ_1 = τ_2 = 0.) DOE++ displays only the independent effects because only these effects are important to the analysis. The independent effects, τ_1 and τ_2, are displayed as A[1] and A[2] respectively because these are the effects associated with factor A (speed).
Constraints on δ_j are:

δ_1 + δ_2 = 0

Therefore, only one of the δ_j effects is independent. Assuming that δ_1 is independent, δ_2 = -δ_1. (The null hypothesis to test the significance of factor B can be rewritten using only the independent effect as H_0: δ_1 = 0.) The independent effect δ_1 is displayed as B:B in DOE++.
Constraints on (τδ)_ij are:

(τδ)_11 + (τδ)_21 + (τδ)_31 = 0
(τδ)_12 + (τδ)_22 + (τδ)_32 = 0
(τδ)_11 + (τδ)_12 = 0
(τδ)_21 + (τδ)_22 = 0
(τδ)_31 + (τδ)_32 = 0

The last five equations given above represent four constraints, as only four of these five equations are independent. Therefore, only two out of the six (τδ)_ij effects are independent. Assuming that (τδ)_11 and (τδ)_21 are independent, the other four effects can be expressed in terms of these effects. (The null hypothesis to test the significance of interaction AB can be rewritten using only the independent effects as H_0: (τδ)_11 = (τδ)_21 = 0.) The effects (τδ)_11 and (τδ)_21 are displayed as A[1]B and A[2]B respectively in DOE++.
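This bookkeeping is easy to verify mechanically. Starting from arbitrary placeholder values for the two independent effects, the other four are forced by the zero-sum constraints:

```python
# Independent interaction effects (arbitrary placeholder values)
td11, td21 = 0.15, -0.05

# The remaining four effects are determined by the constraints:
td31 = -(td11 + td21)   # column j = 1 sums to zero
td12 = -td11            # row i = 1 sums to zero
td22 = -td21            # row i = 2 sums to zero
td32 = td11 + td21      # row i = 3 sums to zero

# Every row and column of interaction effects now sums to zero.
```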
The regression version of the ANOVA model can be obtained using indicator variables, similar to the case of the single factor experiment in Fitting ANOVA Models. Since factor A has three levels, two indicator variables, x_1 and x_2, are required, which need to be coded as shown next:

x_1 = 1, x_2 = 0 for the first level of factor A (effect τ_1)
x_1 = 0, x_2 = 1 for the second level (effect τ_2)
x_1 = -1, x_2 = -1 for the third level (effect τ_3)

Factor B has two levels and can be represented using one indicator variable, x_3, as follows:

x_3 = 1 for the first level of factor B (effect δ_1)
x_3 = -1 for the second level (effect δ_2)
The AB interaction will be represented by all possible terms resulting from the product of the indicator variables representing factors A and B. There are two such terms here: x_1·x_3 and x_2·x_3. The regression version of the ANOVA model can finally be obtained as:

Y = μ + τ_1·x_1 + τ_2·x_2 + δ_1·x_3 + (τδ)_11·x_1·x_3 + (τδ)_21·x_2·x_3 + ε
In matrix notation, this model can be expressed as y = Xβ + ε, where:
![{\displaystyle y=\left[{\begin{matrix}{{Y}_{111}}\\{{Y}_{211}}\\{{Y}_{311}}\\{{Y}_{121}}\\{{Y}_{221}}\\{{Y}_{321}}\\{{Y}_{112}}\\{{Y}_{212}}\\.\\.\\{{Y}_{323}}\\\end{matrix}}\right]=X\beta +\epsilon =\left[{\begin{matrix}1&1&0&1&1&0\\1&0&1&1&0&1\\1&-1&-1&1&-1&-1\\1&1&0&-1&-1&0\\1&0&1&-1&0&-1\\1&-1&-1&-1&1&1\\1&1&0&1&1&0\\1&0&1&1&0&1\\.&.&.&.&.&.\\.&.&.&.&.&.\\1&-1&-1&-1&1&1\\\end{matrix}}\right]\left[{\begin{matrix}\mu \\{{\tau }_{1}}\\{{\tau }_{2}}\\{{\delta }_{1}}\\{{(\tau \delta )}_{11}}\\{{(\tau \delta )}_{21}}\\\end{matrix}}\right]+\left[{\begin{matrix}{{\epsilon }_{111}}\\{{\epsilon }_{211}}\\{{\epsilon }_{311}}\\{{\epsilon }_{121}}\\{{\epsilon }_{221}}\\{{\epsilon }_{321}}\\{{\epsilon }_{112}}\\{{\epsilon }_{212}}\\.\\.\\{{\epsilon }_{323}}\\\end{matrix}}\right]\,\!}](https://en.wikipedia.org/api/rest_v1/media/math/render/svg/bfdd414ef1445f9149eaf38fe30bdb25bbb8d6ba)
The vector y can be substituted with the response values from the fifth table to get:
![{\displaystyle y=\left[{\begin{matrix}{{Y}_{111}}\\{{Y}_{211}}\\{{Y}_{311}}\\{{Y}_{121}}\\{{Y}_{221}}\\{{Y}_{321}}\\{{Y}_{112}}\\{{Y}_{212}}\\.\\.\\{{Y}_{323}}\\\end{matrix}}\right]=\left[{\begin{matrix}17.3\\18.9\\17.1\\18.7\\19.1\\18.8\\17.8\\18.2\\.\\.\\18.3\\\end{matrix}}\right]\,\!}](https://en.wikipedia.org/api/rest_v1/media/math/render/svg/f0fdc409365eea4b8f6688b3f7f2ed74f071a14a)
Knowing y, X and β̂, the sum of squares for the ANOVA model and the extra sum of squares for each of the factors can be calculated. These are used to calculate the mean squares that are used to obtain the test statistics.
Calculation of Sum of Squares for the Model
The model sum of squares, SS_TR, for the regression version of the ANOVA model can be obtained as:
![{\displaystyle {\begin{aligned}S{{S}_{TR}}=&{{y}^{\prime }}[H-({\frac {1}{{{n}_{a}}\cdot {{n}_{b}}\cdot m}})J]y\\=&{{y}^{\prime }}[H-({\frac {1}{18}})J]y\\=&9.7311\end{aligned}}\,\!}](https://en.wikipedia.org/api/rest_v1/media/math/render/svg/3a9b7f0f7f0a7562f24cf7cd83ec4cd69e9f8b9b)
where H = X(X′X)⁻¹X′ is the hat matrix and J is the matrix of ones. Since five effect terms (τ_1, τ_2, δ_1, (τδ)_11 and (τδ)_21) are used in the model, the number of degrees of freedom associated with SS_TR is five (dof(SS_TR) = 5).
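These matrix computations can be sketched with NumPy. The response values below are randomly generated placeholders (the actual mileage table is not reproduced here); only the structure of X matches the example:

```python
import numpy as np

# Build the 18 x 6 effect-coded design matrix (3 levels of A, 2 of B, m = 3).
codes_a = {1: (1, 0), 2: (0, 1), 3: (-1, -1)}
codes_b = {1: 1, 2: -1}
rows = []
for k in range(3):              # replicates
    for j in (1, 2):            # factor B levels
        for i in (1, 2, 3):     # factor A levels
            x1, x2 = codes_a[i]
            x3 = codes_b[j]
            rows.append([1, x1, x2, x3, x1 * x3, x2 * x3])
X = np.array(rows, dtype=float)

rng = np.random.default_rng(1)
y = 18.0 + 0.5 * X[:, 1] + rng.normal(0.0, 0.1, size=18)  # placeholder data

n = len(y)
H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
J = np.ones((n, n))                    # matrix of ones

SS_TR = y @ (H - J / n) @ y            # model sum of squares
SS_T = y @ (np.eye(n) - J / n) @ y     # total sum of squares
SS_E = SS_T - SS_TR                    # error sum of squares
```

With the intercept in the model, SS_TR equals the sum of squared deviations of the fitted values from the grand mean, which gives a direct check on the hat-matrix formula.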
The total sum of squares, SS_T, can be calculated as:
![{\displaystyle {\begin{aligned}S{{S}_{T}}=&{{y}^{\prime }}[I-({\frac {1}{{{n}_{a}}\cdot {{n}_{b}}\cdot m}})J]y\\=&{{y}^{\prime }}[I-({\frac {1}{18}})J]y\\=&10.7178\end{aligned}}\,\!}](https://en.wikipedia.org/api/rest_v1/media/math/render/svg/a19aaf3fd4973aa382b0fab806eaf4c0b26a1caf)
Since there are 18 observed response values, the number of degrees of freedom associated with the total sum of squares is 17 (dof(SS_T) = 17). The error sum of squares can now be obtained:

SS_E = SS_T - SS_TR = 10.7178 - 9.7311 = 0.9867

Since there are three replicates of the full factorial experiment, all of the error sum of squares is pure error. (This can also be seen from the preceding figure, where each treatment combination of the full factorial design is repeated three times.) The number of degrees of freedom associated with the error sum of squares is:

dof(SS_E) = n_a·n_b·(m - 1) = 3·2·(3 - 1) = 12
The sequential sum of squares for factor A can be calculated as:
![{\displaystyle {\begin{aligned}S{{S}_{A}}=&S{{S}_{TR}}(\mu ,{{\tau }_{1}},{{\tau }_{2}})-S{{S}_{TR}}(\mu )\\=&{{y}^{\prime }}[{{H}_{\mu ,{{\tau }_{1}},{{\tau }_{2}}}}-({\frac {1}{18}})J]y-0\end{aligned}}\,\!}](https://en.wikipedia.org/api/rest_v1/media/math/render/svg/d4aabb80558e6abc5bd450f395966d56a3f5aa47)
where H_{μ,τ_1,τ_2} = X_1(X_1′X_1)⁻¹X_1′ and X_1 is the matrix containing only the first three columns of the X matrix. Thus:
![{\displaystyle {\begin{aligned}S{{S}_{A}}=&{{y}^{\prime }}[{{H}_{\mu ,{{\tau }_{1}},{{\tau }_{2}}}}-({\frac {1}{18}})J]y-0\\=&4.5811-0\\=&4.5811\end{aligned}}\,\!}](https://en.wikipedia.org/api/rest_v1/media/math/render/svg/fbd3014493dc4a44bb3e02a55d2417041cba76e5)
Since there are two independent effects (τ_1, τ_2) for factor A, the degrees of freedom associated with SS_A are two (dof(SS_A) = 2).
Similarly, the sum of squares for factor B can be calculated as:
![{\displaystyle {\begin{aligned}S{{S}_{B}}=&S{{S}_{TR}}(\mu ,{{\tau }_{1}},{{\tau }_{2}},{{\delta }_{1}})-S{{S}_{TR}}(\mu ,{{\tau }_{1}},{{\tau }_{2}})\\=&{{y}^{\prime }}[{{H}_{\mu ,{{\tau }_{1}},{{\tau }_{2}},{{\delta }_{1}}}}-({\frac {1}{18}})J]y-{{y}^{\prime }}[{{H}_{\mu ,{{\tau }_{1}},{{\tau }_{2}}}}-({\frac {1}{18}})J]y\\=&9.4900-4.5811\\=&4.9089\end{aligned}}\,\!}](https://en.wikipedia.org/api/rest_v1/media/math/render/svg/87d5af2d6d02333b6ad1a6d040bd43ba8fa5e5a7)
Since there is one independent effect, δ_1, for factor B, the number of degrees of freedom associated with SS_B is one (dof(SS_B) = 1).
The sum of squares for the interaction AB is:

SS_AB = SS_TR - SS_A - SS_B = 9.7311 - 4.5811 - 4.9089 = 0.2411

Since there are two independent interaction effects, (τδ)_11 and (τδ)_21, the number of degrees of freedom associated with SS_AB is two (dof(SS_AB) = 2).
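The error and interaction terms, and the resulting mean squares and F statistics, follow by simple arithmetic from the sums of squares computed above:

```python
# Sums of squares and degrees of freedom from this example
ss_a, df_a = 4.5811, 2        # factor A (speed)
ss_b, df_b = 4.9089, 1        # factor B (fuel additive)
ss_tr, df_tr = 9.7311, 5      # model
ss_t, df_t = 10.7178, 17      # total

ss_ab = ss_tr - ss_a - ss_b   # interaction sum of squares
df_ab = df_tr - df_a - df_b
ss_e = ss_t - ss_tr           # error sum of squares
df_e = df_t - df_tr

ms_e = ss_e / df_e            # error mean square
f_a = (ss_a / df_a) / ms_e    # test statistic for factor A
f_b = (ss_b / df_b) / ms_e    # test statistic for factor B
f_ab = (ss_ab / df_ab) / ms_e # test statistic for the interaction
```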
Calculation of the Test Statistics
Knowing the sums of squares, the test statistic for each of the factors can be calculated. Analyzing the interaction first, the test statistic for interaction AB is:

(F_0)_AB = MS_AB / MS_E = (0.2411/2) / (0.9867/12) = 1.47

The p value corresponding to this statistic, based on the F distribution with 2 degrees of freedom in the numerator and 12 degrees of freedom in the denominator, is:

p value = P(F_{2,12} > 1.47) ≈ 0.27

Assuming that the desired significance level is 0.1, since the p value > 0.1, we fail to reject H_0: (τδ)_ij = 0 and conclude that the interaction between speed and fuel additive does not significantly affect the mileage of the sports utility vehicle. DOE++ displays this result in the ANOVA table, as shown in the following figure. In the absence of the interaction, the analysis of the main effects becomes important.
The test statistic for factor A is:

(F_0)_A = MS_A / MS_E = (4.5811/2) / (0.9867/12) = 27.86

The p value corresponding to this statistic, based on the F distribution with 2 degrees of freedom in the numerator and 12 degrees of freedom in the denominator, is:

p value = P(F_{2,12} > 27.86) ≈ 0.0000

Since the p value < 0.1, H_0: τ_i = 0 is rejected and it is concluded that factor A (or speed) has a significant effect on the mileage.
The test statistic for factor B is:

(F_0)_B = MS_B / MS_E = (4.9089/1) / (0.9867/12) = 59.70

The p value corresponding to this statistic, based on the F distribution with 1 degree of freedom in the numerator and 12 degrees of freedom in the denominator, is:

p value = P(F_{1,12} > 59.70) ≈ 0.0000

Since the p value < 0.1, H_0: δ_j = 0 is rejected and it is concluded that factor B (or fuel additive type) has a significant effect on the mileage.
Therefore, it can be concluded that speed and fuel additive type affect the mileage of the vehicle significantly. The results are displayed in the ANOVA table of the following figure.
Analysis results for the experiment in the fifth table.
Calculation of Effect Coefficients
Results for the effect coefficients of the regression version of the ANOVA model are displayed in the Regression Information table in the following figure. Calculations of the results in this table are discussed next. The effect coefficients can be calculated as follows:
![{\displaystyle {\begin{aligned}{\hat {\beta }}=&{{({{X}^{\prime }}X)}^{-1}}{{X}^{\prime }}y\\=&\left[{\begin{matrix}18.2889\\-0.2056\\0.6944\\-0.5222\\0.0056\\0.1389\\\end{matrix}}\right]\end{aligned}}\,\!}](https://en.wikipedia.org/api/rest_v1/media/math/render/svg/bb473fe97af21419c0ae8996248f2727dfc5297a)
Therefore, μ̂ = 18.2889, τ̂_1 = -0.2056, τ̂_2 = 0.6944, etc. As mentioned previously, these coefficients are displayed as Intercept, A[1] and A[2] respectively, depending on the name of the factor used in the experimental design. The standard error for each of these estimates is obtained using the diagonal elements of the variance-covariance matrix C.
![{\displaystyle {\begin{aligned}C=&{{\hat {\sigma }}^{2}}{{({{X}^{\prime }}X)}^{-1}}\\=&M{{S}_{E}}\cdot {{({{X}^{\prime }}X)}^{-1}}\\=&\left[{\begin{matrix}0.0046&0&0&0&0&0\\0&0.0091&-0.0046&0&0&0\\0&-0.0046&0.0091&0&0&0\\0&0&0&0.0046&0&0\\0&0&0&0&0.0091&-0.0046\\0&0&0&0&-0.0046&0.0091\\\end{matrix}}\right]\end{aligned}}\,\!}](https://en.wikipedia.org/api/rest_v1/media/math/render/svg/e4fb3451dbea2c26e16ec8774484b0a3329e193e)
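For instance, using the rounded entries shown above (the 0.0091 diagonal element of C and the coefficient τ̂_1 = -0.2056), the standard error and the corresponding t statistic can be checked as:

```python
import math

var_tau1 = 0.0091                # diagonal element of C for tau_1
se_tau1 = math.sqrt(var_tau1)    # standard error, about 0.0954
t_tau1 = -0.2056 / se_tau1       # t statistic, about -2.16 (rounded inputs)
```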
For example, the standard error for τ̂_1 is:

se(τ̂_1) = √0.0091 = 0.0954

Then the t statistic for