LAB 7: Difference of Means Tests

 

In this lab, we will examine Difference of Means tests for:

  1. one sample
  2. two independent samples
  3. paired samples.

A. One Sample Difference of Means Test

Use

This test compares a sample mean against a known or hypothesized value for the population mean. If you have information about the true population mean, you can use this test to determine if your sample is drawn from the same underlying population. This will assist you in determining whether your sample is representative of the population.

How does it work?

This test calculates the difference between the sample mean ($\bar{x}$) and the population mean ($\mu$) and divides it by the standard error ($s/\sqrt{n}$). The calculated value of the test statistic tells us how many standard errors the sample mean lies away from the population mean.

The difference of means tests allow us to investigate non-directional and directional hypotheses.

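For this test, we can establish the following statistical hypotheses:

$H_0$: there is no significant difference between the sample mean and the population mean ($\bar{x} = \mu$)
$H_A$: there is a significant difference between the sample mean and the population mean ($\bar{x} \neq \mu$)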

The hypotheses as presented above are non-directional because we are interested in whether the sample mean is different from the known value (or population mean), not whether the sample mean is less than or greater than the population mean.

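We can establish a directional hypothesis to determine whether the sample mean is greater than or less than the population mean. We would state our hypotheses as follows (shown in text and symbol notation):

Upper tail:
$H_0$: the sample mean is not greater than the population mean ($\bar{x} \le \mu$)
$H_A$: the sample mean is greater than the population mean ($\bar{x} > \mu$)

Lower tail:
$H_0$: the sample mean is not less than the population mean ($\bar{x} \ge \mu$)
$H_A$: the sample mean is less than the population mean ($\bar{x} < \mu$)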

If the test statistic is large in absolute value, we know there is a large difference between the sample mean and the known value relative to the standard error. In this case, we would reject the null hypothesis ($H_0$) and infer a significant difference between $\bar{x}$ and $\mu$. If the test statistic is small in absolute value, we do not reject the null hypothesis and infer that there is no significant difference between $\bar{x}$ and $\mu$.

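Assumptions

  1. The sample is drawn at random from the population.
  2. The variable is measured at the interval or ratio level.
  3. The population is normally distributed (this matters most when the sample is small).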

Probability Distribution

This difference of means test uses the t probability distribution to model the test statistic.

Test Statistic

Formula:

$$t^* = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

where:
$\bar{x}$ is the sample mean
$\mu$ is the known or hypothesized population mean
$s$ is the sample standard deviation
$n$ is the sample size

Critical Values

When we use non-directional hypotheses, the probability $\alpha$ is allocated to both sides (tails) of the t probability distribution, with $\alpha/2$ in each tail. This is because we are testing only for a difference between $\bar{x}$ and $\mu$. The diagram below shows the rejection region associated with non-directional hypotheses:

[Diagram: t distribution with a rejection region of area $\alpha/2$ in each tail]

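If we establish directional hypotheses, $\alpha$ and the rejection region are allocated to one side (tail) of the probability distribution (Case 1 or Case 2 in the diagram below).

[Diagram: Case 1 and Case 2, each a t distribution with a rejection region of area $\alpha$ in a single tail]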

Therefore, the critical values for this test are based on $\alpha/2$ for non-directional hypotheses (two-tailed test) or $\alpha$ for directional hypotheses (one-tailed test) and the number of degrees of freedom (df = n-1).
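Critical values are normally read from a t-table or reported by SPSS. As a rough cross-check, the sketch below approximates them in plain Python by numerically integrating the t density and searching for the cutoff. The function names (`t_pdf`, `t_cdf`, `t_critical`) and the choice of Simpson's rule with bisection are illustrative assumptions, not the method used in this lab or by SPSS.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(alpha, df, two_tailed=True):
    """Positive critical value leaving alpha/2 (two-tailed) or alpha (one-tailed) in the upper tail."""
    tail = alpha / 2 if two_tailed else alpha
    lo, hi = 0.0, 100.0
    for _ in range(60):  # bisection on the upper-tail area
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.05, 14), 3))                   # two-tailed, df = 14: about 2.145
print(round(t_critical(0.05, 9, two_tailed=False), 3))  # one-tailed, df = 9: about 1.833
```

Printed t-tables usually round these values to two or three decimals, so small differences from the table are expected.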

Decision rule

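For non-directional hypotheses, reject $H_0$ if t* > +$t_{crit}$ OR if t* < -$t_{crit}$

For directional (upper tail) hypotheses, reject $H_0$ if t* > +$t_{crit}$

For directional (lower tail) hypotheses, reject $H_0$ if t* < -$t_{crit}$

Here $t_{crit}$ is the critical t value for the chosen significance level and degrees of freedom.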
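The three decision rules can be written as one small function. This is a minimal sketch: the name `reject_null` and the `tail` labels are hypothetical, and `t_crit` is assumed to be the positive critical value taken from a t-table.

```python
def reject_null(t_star, t_crit, tail="two-sided"):
    """Return True if the null hypothesis is rejected.

    t_star: calculated test statistic
    t_crit: positive critical value from the t-table
    tail:   "two-sided", "upper" or "lower" (labels chosen for this sketch)
    """
    if tail == "two-sided":
        return t_star > t_crit or t_star < -t_crit
    if tail == "upper":
        return t_star > t_crit
    if tail == "lower":
        return t_star < -t_crit
    raise ValueError("tail must be 'two-sided', 'upper' or 'lower'")

# Values from the one sample Example in this section: t* = -2.37, critical value 2.15
print(reject_null(-2.37, 2.15, tail="two-sided"))  # True
```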

Example

The average length of adult sockeye salmon is 60 cm (the population distribution is known to be normal). A sample (n = 15) is taken for salmon stock assessment, where $\bar{x}$ = 49 cm and s = 18 cm. Is this sample representative of the known sockeye population?

1. Select the appropriate test:
You want to compare your sample mean to the known population mean, so use the one sample Difference of Means test.

2. Check assumptions:
The population distribution is stated to be normal, and we assume the sample was drawn at random.

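3. State your hypotheses:
$H_0$: there is no significant difference between the sample mean length and the population mean length ($\bar{x} = \mu$)
$H_A$: there is a significant difference between the sample mean length and the population mean length ($\bar{x} \neq \mu$)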

4. Select significance level:
We will use the standard $\alpha$ = 0.05 (95% confidence level)

5. Select probability distribution:
We will use the t probability distribution.

6. Establish your critical values:
We have non-directional hypotheses so we are conducting a two-tailed test. The t-table accounts for the fact that $\alpha$ is divided between both sides of the distribution. Therefore, we look up the full probability ($\alpha$ = 0.05) at 14 degrees of freedom (df = 15-1 = 14) on the t-table. We find that $t_{crit}$ = 2.15.

7. Calculate test statistic:


$$t^* = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{49 - 60}{18 / \sqrt{15}} = -2.37$$
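The same calculation can be checked in a few lines of Python. This is a minimal sketch; the function name `one_sample_t` is hypothetical and simply wraps the test statistic formula given earlier.

```python
import math

def one_sample_t(sample_mean, pop_mean, sample_sd, n):
    """One sample test statistic: (sample mean - population mean) / (s / sqrt(n))."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - pop_mean) / standard_error

# Sockeye example: sample mean 49 cm, population mean 60 cm, s = 18 cm, n = 15
t_star = one_sample_t(sample_mean=49, pop_mean=60, sample_sd=18, n=15)
print(round(t_star, 2))  # -2.37
```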

 

8. Compare using the decision rule:

Rule for non-directional hypotheses:
reject $H_0$ if t* > +$t_{crit}$ OR if t* < -$t_{crit}$

t* (-2.37) is less than -$t_{crit}$ (-2.15), so we reject $H_0$.
 

9. State conclusion:
We conclude with 95% confidence that our sample mean is significantly different from the population mean. Therefore, we infer that our sample is not representative of the population.
 

B. Two Sample Difference of Means Test
(for independent samples)

Use

This test is used to compare two sample means to determine if a significant difference exists between two independent samples (i.e., do these samples come from the same or different underlying populations?).

How does it work?

This test calculates the difference between the two sample means. This difference is then divided by an estimate of the standard error of the difference of means.

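For this test, our hypotheses are:

$H_0$: there is no significant difference between the two sample means ($\bar{x}_1 = \bar{x}_2$)
$H_A$: there is a significant difference between the two sample means ($\bar{x}_1 \neq \bar{x}_2$)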

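You can also establish directional hypotheses as follows:

Upper tail:
$H_0$: the mean of sample 1 is not greater than the mean of sample 2 ($\bar{x}_1 \le \bar{x}_2$)
$H_A$: the mean of sample 1 is greater than the mean of sample 2 ($\bar{x}_1 > \bar{x}_2$)

Lower tail:
$H_0$: the mean of sample 1 is not less than the mean of sample 2 ($\bar{x}_1 \ge \bar{x}_2$)
$H_A$: the mean of sample 1 is less than the mean of sample 2 ($\bar{x}_1 < \bar{x}_2$)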

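Assumptions

  1. Both samples are drawn at random and are independent of each other.
  2. The variable is measured at the interval or ratio level.
  3. Each population is normally distributed.
  4. The population variances are equal if the pooled variance estimate is used (see Variance Check).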

Probability Distribution

We use the t probability distribution to model the test statistic.

Test Statistic

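Formula:

$$t^* = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}}$$

where:
$\bar{x}_1$ and $\bar{x}_2$ are the means of the two samples
$s_{\bar{x}_1 - \bar{x}_2}$ is an estimate of the standard error of the difference of means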

There are two ways to calculate the $s_{\bar{x}_1 - \bar{x}_2}$ term, depending on whether the variances are equal or not. Use the F test (described under Variance Check) to assess variance equality.

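1. If the variances are equal, we combine the two sample variances to create a pooled ('combined') variance estimate. This pooled estimate is developed by weighting the variance of each sample by its degrees of freedom (n - 1), so the larger sample carries more weight.

The formula for the pooled estimate is:

$$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$$

where:
$s_1^2$ and $s_2^2$ are the variances of the two samples
$n_1$ and $n_2$ are the sizes of the two samples

Use this pooled estimate to develop $s_{\bar{x}_1 - \bar{x}_2}$:

$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$$

where:
$s_p^2$ is the pooled variance estimate
$n_1$ and $n_2$ are the sizes of the two samples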

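2. If the variances are unequal, you use a separate variance estimate for $s_{\bar{x}_1 - \bar{x}_2}$:

$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

where:
$s_1^2$ and $s_2^2$ are the variances of the two samples
$n_1$ and $n_2$ are the sizes of the two samples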
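The two ways of estimating the standard error can be compared side by side in Python. This is a minimal sketch using the standard pooled formula that weights each sample variance by n - 1, which reproduces the t* = 1.19 reported in the Example below. All function names are hypothetical, and the code is not a description of how SPSS is implemented.

```python
import math

def pooled_variance(s1, n1, s2, n2):
    """Pooled variance: each sample variance weighted by its degrees of freedom."""
    return ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)

def se_equal_variances(s1, n1, s2, n2):
    """Standard error of the difference of means using the pooled variance."""
    return math.sqrt(pooled_variance(s1, n1, s2, n2) * (1 / n1 + 1 / n2))

def se_unequal_variances(s1, n1, s2, n2):
    """Standard error of the difference of means using separate variances."""
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

def two_sample_t(mean1, mean2, standard_error):
    """Test statistic: difference of the sample means over its standard error."""
    return (mean1 - mean2) / standard_error

# Salmon growth example: Goldstream (7.3, 1.2, 25) and Chemainus (6.9, 1.0, 20)
se_pooled = se_equal_variances(1.2, 25, 1.0, 20)
se_separate = se_unequal_variances(1.2, 25, 1.0, 20)
print(round(two_sample_t(7.3, 6.9, se_pooled), 2))    # 1.19
print(round(two_sample_t(7.3, 6.9, se_separate), 2))  # 1.22
```

For these data the two estimates give similar test statistics, which is what we would expect when the sample variances are close.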

Note: these formulas are given so that you can better understand how SPSS is calculating the test statistic.

Critical Value

Critical t values are based on $\alpha$ (one-tailed test) or $\alpha/2$ (two-tailed test) and the number of degrees of freedom (df = $n_1 + n_2 - 2$)

Decision Rule

For non-directional hypotheses, reject $H_0$ if t* > +$t_{crit}$ OR if t* < -$t_{crit}$

For directional (upper tail) hypotheses, reject $H_0$ if t* > +$t_{crit}$

For directional (lower tail) hypotheses, reject $H_0$ if t* < -$t_{crit}$
 

Variance Check

To check the equality of the variances, you must use another hypothesis test called the F test (based on the F probability distribution). This test compares the ratio of the sample variances (the variance of sample 1 divided by the variance of sample 2). If the variances are equal ('similar'), the ratio will be close to 1.
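The ratio itself is simple to compute. Below is a minimal sketch using the standard deviations from the salmon growth Example that follows. The function name `variance_ratio` is hypothetical, and the critical F value still has to come from an F table or from SPSS output.

```python
def variance_ratio(s1, s2):
    """F statistic as described above: variance of sample 1 over variance of sample 2."""
    return s1 ** 2 / s2 ** 2

# Goldstream River: s = 1.2 cm, n = 25; Chemainus River: s = 1.0 cm, n = 20
f_star = variance_ratio(1.2, 1.0)
df_numerator, df_denominator = 25 - 1, 20 - 1  # degrees of freedom for the F table lookup

print(round(f_star, 2))              # 1.44
print(df_numerator, df_denominator)  # 24 19
```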

Example

Samples of 1-year-old salmon are taken from two local rivers to measure their growth (using length). Sample 1 (Goldstream River) has $\bar{x}_1$ = 7.3 cm, $s_1$ = 1.2 cm, $n_1$ = 25. Sample 2 (Chemainus River) has $\bar{x}_2$ = 6.9 cm, $s_2$ = 1.0 cm, $n_2$ = 20. Do the young salmon in each river show similar growth in their first year?

1. Check appropriate test: you want to compare two independent sample means - use two sample DoM test.

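2. Check assumptions:
We assume that both samples are random and independent and that salmon length is normally distributed in each river. Equality of the variances is checked with the F test.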

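3. State your hypotheses:
$H_0$: there is no significant difference between the mean lengths of the two samples ($\bar{x}_1 = \bar{x}_2$)
$H_A$: there is a significant difference between the mean lengths of the two samples ($\bar{x}_1 \neq \bar{x}_2$)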

4. Select significance level:
We will use the standard $\alpha$ = 0.05 (95% confidence level)

5. Select probability distribution:
The population variances are unknown; we use the t distribution

6. Calculate appropriate test statistic:
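Our F test showed that the variances can be considered equal, so calculate t* using a pooled variance estimate. SPSS uses a three step calculation:

Step 1: calculate the pooled variance estimate
$$s_p^2 = \frac{(25 - 1)(1.2)^2 + (20 - 1)(1.0)^2}{25 + 20 - 2} = \frac{53.56}{43} = 1.2456$$

Step 2: calculate the standard error of the difference of means
$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{1.2456 \left( \frac{1}{25} + \frac{1}{20} \right)} = 0.3348$$

Step 3: calculate t*
$$t^* = \frac{7.3 - 6.9}{0.3348} = 1.19$$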

7. Identify the critical values:
At $\alpha/2$ = 0.025 and degrees of freedom = 43, $t_{crit}$ = 2.02

8. Compare using the decision rule:

Rule: reject $H_0$ if t* < -$t_{crit}$ or t* > +$t_{crit}$

t* (1.19) is less than +$t_{crit}$ (2.02) and greater than -$t_{crit}$ (-2.02),
so we cannot reject $H_0$.

9. State conclusion:

At the 0.05 significance level, we find no significant difference between our sample means. Therefore, we infer that the young salmon in the Goldstream and Chemainus Rivers have similar growth in their first year.
 

C. Paired Difference of Means Test

Use

This test compares two sample means to determine if a significant difference exists between two paired samples, for example when the observations are taken before and after an experiment, event or treatment.

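How does it work?

This test calculates the difference between each pair of observations. The mean of these differences is then compared against a known or hypothesized value and divided by the standard error of the differences.

For this test, our hypotheses are:

$H_0$: the mean of the differences is equal to the known or hypothesized value ($\bar{d} = V$)
$H_A$: the mean of the differences is not equal to the known or hypothesized value ($\bar{d} \neq V$)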

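You can establish directional alternate hypotheses as follows:

Upper tail: $H_A$: the mean of the differences is greater than the known or hypothesized value ($\bar{d} > V$)
Lower tail: $H_A$: the mean of the differences is less than the known or hypothesized value ($\bar{d} < V$)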

Note: the known or hypothesized value could be 0 or a specific value related to the research question.

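Assumptions

  1. The observations are paired: each observation in one sample is matched with exactly one observation in the other.
  2. The pairs are drawn at random.
  3. The variable is measured at the interval or ratio level.
  4. The differences between the paired observations are normally distributed.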

Probability Distribution

We use the t probability distribution to model the test statistic.

Test Statistic

This test uses the formula for one sample difference of means test but it is based on the mean and standard deviation of the differences between the two samples.

Formula:

$$t^* = \frac{\bar{d} - V}{s_d / \sqrt{n}}$$

where:
$\bar{d}$ is the mean of the differences
$s_d$ is the standard deviation of the differences
$n$ is the number of paired observations
$V$ is the hypothesized or known value

The formula for the mean of differences is:

$$\bar{d} = \frac{\sum d_i}{n}$$

where:
$d_i$ is the difference between each set of paired observations
$n$ is the number of paired observations

The formula for the standard deviation of differences is:

$$s_d = \sqrt{\frac{\sum d_i^2 - \left( \sum d_i \right)^2 / n}{n - 1}}$$

where:
$d_i^2$ is the squared difference between each set of paired observations
$n$ is the number of paired observations

Note: these formulas are given so that you understand how SPSS is calculating the test statistic.

Critical Value

Critical t values are based on the significance level $\alpha$ (one-tailed test) or $\alpha/2$ (two-tailed test) and the number of degrees of freedom (df = n-1)

Decision rule

For non-directional hypotheses, reject $H_0$ if t* > +$t_{crit}$ OR if t* < -$t_{crit}$

For directional (upper tail) hypotheses, reject $H_0$ if t* > +$t_{crit}$

For directional (lower tail) hypotheses, reject $H_0$ if t* < -$t_{crit}$

Example

On a salmon farm, 10 salmon are monitored to test the effects of a growth hormone. In a given year, we would expect that these salmon would gain 0.75 kg. Did the hormone treatment increase the average weight of the salmon above the natural weight growth? The data are listed below.

Salmon ID   Sample 1: weight before hormone (kg)   Sample 2: weight after hormone (kg)
1           4.5                                    5.6
2           5.0                                    5.8
3           4.8                                    5.8
4           5.2                                    5.7
5           4.8                                    7.2
6           5.8                                    7.3
7           4.6                                    6.0
8           4.9                                    6.9
9           4.7                                    6.6
10          5.1                                    6.9

1. Select appropriate test:
You want to compare two samples with paired observations (the samples are not independent), so use the paired difference of means test.

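2. Check assumptions:
The observations are paired because the same 10 salmon are weighed before and after the treatment. We assume the weight differences are approximately normally distributed.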

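3. State your hypotheses: Our research question: is the difference in mean weight between Sample 2 (after hormone) and Sample 1 (before hormone) greater than 0.75 kg?

To answer this question, we must use directional hypotheses. Therefore, we state:

$H_0$: the mean weight gain is not greater than 0.75 kg ($\bar{d} \le 0.75$)
$H_A$: the mean weight gain is greater than 0.75 kg ($\bar{d} > 0.75$)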

4. Select significance level:
We will use the standard $\alpha$ = 0.05 (95% confidence level)

5. Select probability distribution:
We will use the t probability distribution for this test.

6. Calculate test statistic:
This is a three step calculation.

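Salmon ID   Sample 1   Sample 2   $d_i$ (Sample 2 - Sample 1)   $d_i^2$
1           4.5        5.6        1.1                           1.21
2           5.0        5.8        0.8                           0.64
3           4.8        5.8        1.0                           1.00
4           5.2        5.7        0.5                           0.25
5           4.8        7.2        2.4                           5.76
6           5.8        7.3        1.5                           2.25
7           4.6        6.0        1.4                           1.96
8           4.9        6.9        2.0                           4.00
9           4.7        6.6        1.9                           3.61
10          5.1        6.9        1.8                           3.24
Sum                               14.4                          23.92

Step 1: calculate $\bar{d}$

$$\bar{d} = \frac{14.4}{10} = 1.44$$

Step 2: calculate $s_d$

$$s_d = \sqrt{\frac{23.92 - (14.4)^2 / 10}{10 - 1}} = 0.59$$

Step 3: calculate t*

$$t^* = \frac{1.44 - 0.75}{0.59 / \sqrt{10}} = 3.70$$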
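The three steps can be reproduced from the raw weights in Python. This is a minimal sketch with variable names chosen for illustration. Because the code carries full precision through every step, it gives t* of about 3.67. The 3.70 reported in this lab results when the standard deviation of the differences is rounded to 0.59 before Step 3. Either value leads to the same decision.

```python
import math

before = [4.5, 5.0, 4.8, 5.2, 4.8, 5.8, 4.6, 4.9, 4.7, 5.1]  # Sample 1 (kg)
after = [5.6, 5.8, 5.8, 5.7, 7.2, 7.3, 6.0, 6.9, 6.6, 6.9]   # Sample 2 (kg)
expected_gain = 0.75  # hypothesized value V (kg)

# Differences between each pair of observations (Sample 2 - Sample 1)
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)

# Step 1: mean of the differences
d_bar = sum(diffs) / n

# Step 2: standard deviation of the differences (sum-of-squares form)
sum_sq = sum(d * d for d in diffs)
s_d = math.sqrt((sum_sq - sum(diffs) ** 2 / n) / (n - 1))

# Step 3: test statistic
t_star = (d_bar - expected_gain) / (s_d / math.sqrt(n))

print(round(d_bar, 2), round(s_d, 2), round(t_star, 2))  # 1.44 0.59 3.67
```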

7. Establish your critical values:
This is a one-tailed test because we are using directional hypotheses. Therefore, at $\alpha$ = 0.05 and df = (n-1) = 9, $t_{crit}$ = +1.83

8. Compare using the decision rule:

Rule for upper tail hypotheses: Reject $H_0$ if t* > +$t_{crit}$

t* (3.70) is greater than $t_{crit}$ (1.83), so we reject $H_0$.

9. State conclusion: We conclude with 95% confidence that the difference between the two samples is significantly greater than 0.75 kg. Therefore, we infer that the application of a growth hormone appears to have significantly increased the weight of the farmed salmon (by more than the natural rate of growth).