LAB 6: Non-parametric Tests

In this lab, we will examine:

Recommended web links:
Hyperstat.com: A diagram of the Chi-square distribution
VassarStats: An applet showing how the Chi-square distribution changes with different degrees of freedom. Enter degrees of freedom in the dialogue box that pops up (between 1 and 20)
Examples and Applications: An overview and examples using the Chi-square distribution
 


Applied hypothesis testing...

There are two groups of inferential statistical tests: parametric and non-parametric tests. A parametric test requires that the underlying population for each sample is normally distributed, and that the data are either interval or ratio.

Non-parametric tests have been developed that use only frequency counts in the calculation of the test statistics. No assumptions are made about the shape of the underlying population distribution.
 

Introduction to Goodness of Fit Tests

'Goodness of Fit' tests compare the frequencies observed in a sample to a distribution of expected frequencies in order to infer similarity.

When we discuss the mechanics of the non-parametric Goodness of Fit tests, you will notice the word distribution used in two different contexts.

  1. Frequency distributions refer to the distribution of data values in a sample or population. The tests use frequencies in different ways:
    • Chi-square (χ²) tests use simple frequency counts.
    • Kolmogorov-Smirnov tests use cumulative relative frequencies.

  2. Probability distributions (χ² and D) model the distribution of test statistics. Two important probability distributions are introduced below.

The Chi-square (χ²) Distribution

  • Pronounced kai not chee
     
  • This distribution begins at 0 and is positively skewed
     
  • The distribution has no negative values.
Shape of the distribution
  • There are many distribution shapes, each associated with a different number of degrees of freedom (similar to the Student's t distribution). As the number of degrees of freedom increases, the distribution becomes less skewed.
     
  • This distribution is used for the one and multi-sample Chi-square tests.
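
The effect of degrees of freedom on the shape can be sketched numerically. The snippet below relies on a standard result that is not stated in this lab: a chi-square distribution with k degrees of freedom has mean k and skewness sqrt(8 / k). The function name is illustrative only.

```python
import math


def chi_square_skewness(df: int) -> float:
    """Skewness of a chi-square distribution with df degrees of freedom.

    Uses the textbook result skewness = sqrt(8 / df), which is an outside
    fact and not something derived in this lab.
    """
    if df < 1:
        raise ValueError("degrees of freedom must be at least 1")
    return math.sqrt(8 / df)


# The skewness stays positive but shrinks as the degrees of freedom grow.
for df in (1, 2, 5, 10, 20):
    print(f"df = {df:2d}   mean = {df:2d}   skewness = {chi_square_skewness(df):.2f}")
```

The values fall steadily as df rises, which matches the statement that the distribution becomes less skewed with more degrees of freedom.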

The ‘D’ Distribution



Applications and Calculations

These tests use the hypothesis testing procedures detailed in Lab 5. For each test, we will outline:

Chi-square tests

The chi-square tests are the most basic tests because they use nominal categories and simple frequency counts. They are flexible tests because they can be applied to most data. However, these tests treat all categories as nominal so information associated with ordinal, interval or ratio levels of measurement will be lost.

In this discussion, we are going to start with one and two sample chi-square tests and then examine the one and two sample Kolmogorov-Smirnov tests.

One Sample Chi-square (χ²) Test

Use

This test compares the frequencies observed in a sample to expected frequencies provided by the analyst. This test uses the sample frequency distribution to make inferences about the population frequency distribution.

How does it work?

This test focuses on how well the observed and expected frequency counts match for each nominal or ordinal category. The expected frequencies may be uniformly, randomly, or proportionally distributed (more details later in the lab). The analyst can choose an expected frequency distribution based on his/her needs.
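
As a rough illustration of how expected frequencies might be built, the sketch below covers the uniform and proportional cases only. The totals, the proportions, and both function names are hypothetical and are not taken from the lab.

```python
def uniform_expected(total: float, n_categories: int) -> list[float]:
    """Spread the total evenly across all categories."""
    return [total / n_categories] * n_categories


def proportional_expected(total: float, proportions: list[float]) -> list[float]:
    """Split the total according to proportions chosen by the analyst."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return [total * p for p in proportions]


# Hypothetical case: 60 observations in 3 categories.
print(uniform_expected(60, 3))

# Hypothetical case: categories expected to hold 50%, 30% and 20% of the
# observations (for example, in proportion to the area of each category).
print(proportional_expected(60, [0.5, 0.3, 0.2]))
```

Either list can then be compared with the observed counts, category by category.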

This test uses the following statistical hypotheses:

The test statistic provides a measure of the amount of difference between the two frequency distributions. If the difference between the observed and expected distributions is small, χ² will be small. If the difference is large, χ² will be large.

Assumptions


 
Test Statistic

Formula:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

where:
Oi = observed frequency in each category
Ei = expected frequency in each category
k = number of categories
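
A minimal sketch of the calculation follows. The counts are hypothetical, chosen only to keep the arithmetic easy to follow, and the function name is illustrative.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts: 36 observations in 3 categories, compared with a
# uniform expectation of 12 per category.
observed = [18, 12, 6]
expected = [12, 12, 12]

chi2 = chi_square_statistic(observed, expected)
print(chi2)  # 36/12 + 0/12 + 36/12 = 6.0
```

Categories that match their expected count contribute nothing; the statistic grows only where observed and expected counts disagree.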

Critical Value

Critical χ² values are based on the significance level (α) and the number of degrees of freedom (k-1, where k = number of categories).

Decision rule

Reject H0 if the calculated value (χ²*) is greater than the critical χ² value.

If χ²* is greater than the critical value, it falls in the 'rejection region' in the upper tail of the chi-square probability distribution.

Example

Hydrogeologists are studying the occurrence of natural springs by rock type in Spring Valley. They count the number of springs by rock type within similar-sized sample areas. The frequencies are summarized below.

RESEARCH QUESTION: Is the occurrence of natural springs influenced by rock type?

1. Select the appropriate test:
To investigate this research question, we need to compare our observed frequencies to a uniform frequency distribution. Why? If rock type is not important in determining the location of springs, we will expect the number of springs to be evenly distributed over rock type. If our observed frequencies are different from the expected uniform frequencies, we can infer that rock type affects the occurrence of natural springs. We will use the one sample chi-square test to answer our research question.

2. Check assumptions:

3. State your hypotheses:

4. Select significance level:
We will use the standard α = 0.05 (95% confidence level)

5. Select probability distribution of test statistic:
This test uses the χ² distribution for the test statistic

6. Establish the critical values:
At α = 0.05 and degrees of freedom (k-1) = 2, the critical χ² value = 5.99.
(Recall that k is the number of categories; we have 3 rock types so k = 3)

7. Calculate test statistic:

8. Make inference using the decision rule:

Rule: reject H0 if χ²* > critical χ²

From above, χ²* (7.75) is greater than the critical value (5.99), so reject H0.

9. State conclusion:
We conclude with 95% confidence that the observed frequencies are significantly different from the expected (uniform) frequency distribution (i.e. the springs are not uniformly distributed between rock types). We infer that rock type influences the occurrence of natural springs in Spring Valley.

Your turn!

The research presented above was repeated in an adjacent valley. The observed frequencies are listed below. RESEARCH QUESTION: Is there a significant difference in the occurrence of springs due to rock type in this valley? (check your answer).

Rock type          Number of Springs
Limestone          11
Calcareous Marl    5
Sandstone          14
Volcanics          6
Total              36

 

Two or more sample Chi-square (χ²) test

Uses

This test compares the observed frequency distributions of two or more samples to see if the samples are drawn from the same underlying population (i.e. do the samples have the same parent population or not?).

How does it work?

The observed frequencies for the two samples are laid out in a contingency table (similar to the joint probability tables in Lab 2). These observed frequencies are compared to the frequency count we would expect if the samples were drawn from the same underlying population.

This test uses the following hypotheses:

If there are large differences between the observed and expected frequencies, the χ²* test statistic will be greater than the critical value. Therefore, we will reject H0 and infer that the samples are drawn from different underlying populations. If the differences between the observed and expected frequencies are small, χ²* will be small and we will not reject H0. In this case, we would infer that the samples are drawn from the same underlying population.

Assumptions

Probability Distribution

We use the χ² probability distribution to model the test statistic.

Test Statistic

Formula:

$$\chi^2 = \sum_{i=1}^{k} \sum_{j=1}^{l} \frac{(O_{i,j} - E_{i,j})^2}{E_{i,j}}$$

where:
Oi,j = observed frequency in each cell
Ei,j = expected frequency in each cell

Note: the double summation (ΣΣ) means sum over all rows and columns in the table.

Critical Value

Critical χ² values are identified based on the significance level (α) and the number of degrees of freedom (k-1)*(l-1), where k = the number of categories and l = the number of samples (not sample size).

Decision rule

Reject H0 if the calculated value (χ²*) is greater than the critical χ² value.

Example

A transportation planner is studying user preferences for alternatives to the Lions Gate Bridge crossing. A questionnaire is designed that specifies three transit options: widen the bridge (2 lanes each way), replace the bridge with a tunnel, or designate one of the existing vehicle lanes for a rapid transit system. Three bridge user groups are surveyed: daily commuters, commercial transport operators, and tourists. Each user group is considered a different sample.

RESEARCH QUESTION: Is there a significant difference between the responses of the user groups?

 
Responses        Samples (user groups)
                 Daily Commuters   Commercial Operators   Tourists   Row Totals
Widen            19                22                     26         67
Tunnel           8                 35                     14         57
Transit          33                3                      20         56
Column Totals    60                60                     60         180

1. Select the appropriate test:
We need to determine if there are differences between the observed frequencies of 3 samples (user groups). If the responses of the user groups are similar we can infer that the user groups are all part of the same underlying population. If there are differences in the responses, we can infer that the user groups come from different parent populations. We will use the multi-sample χ² test.

2. Check assumptions:

3. State your hypotheses:

4. Select significance level:
We will use the standard α = 0.05 (95% confidence level)

5. Select probability distribution of test statistic:
This test uses the χ² distribution for the test statistic.

6. Identify the critical values:
At α = 0.05 and degrees of freedom (k-1)*(l-1) = (3-1)*(3-1) = 4, the critical χ² value = 9.49. (Recall that k is the number of categories and l is the number of samples)

7. Calculate test statistic:
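
The calculation can be sketched as below, using the survey counts from the table above. It assumes the usual contingency-table rule for expected frequencies (row total × column total / grand total), which this lab text does not spell out; the variable names are illustrative.

```python
# Observed responses: one row per option, one column per user group
# (daily commuters, commercial operators, tourists).
observed = {
    "Widen":   [19, 22, 26],
    "Tunnel":  [8, 35, 14],
    "Transit": [33, 3, 20],
}

rows = list(observed.values())
row_totals = [sum(row) for row in rows]
col_totals = [sum(col) for col in zip(*rows)]
grand_total = sum(row_totals)

chi2_star = 0.0
for r, row in enumerate(rows):
    for c, o in enumerate(row):
        # Assumed rule: expected = row total * column total / grand total.
        e = row_totals[r] * col_totals[c] / grand_total
        chi2_star += (o - e) ** 2 / e

# Degrees of freedom: (categories - 1) * (samples - 1).
df = (len(rows) - 1) * (len(col_totals) - 1)
print(f"chi-square* = {chi2_star:.1f}, df = {df}")
```

Under that assumption the result agrees with the value of 46.5 used in the decision step.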

8. Compare using the decision rule:

Rule: reject H0 if χ²* > critical χ²

From above, χ²* (46.5) is greater than the critical value (9.49), so we can reject H0.

9. State conclusion:
We infer with 95% confidence that there is a significant difference between the responses of the user groups in terms of their preferences of alternatives for the Lions Gate Bridge crossing.
 

Chi-square Cautions

  1. The chi-square tests will not work if the expected frequencies are ‘too small’ (see the assumptions for the one and two sample tests). Sometimes categories can be combined to ensure that expected frequencies are large enough. However, this may make no sense within the context of the data available, and you may have to abandon the test.
     
  2. Although the chi-square tests can be used with any level of measurement (data must be collapsed into categories), these tests treat all categories as nominal. If the categories are ordinal, the chi-square tests disregard this information. If your data are ordinal or higher, it may be better to use a more powerful statistical method. As the Kolmogorov-Smirnov test preserves ordering, it is often a good alternative.
     
  3. As the chi-square tests use absolute frequencies, they do not differentiate between large and small samples when calculating the difference between observed and expected frequencies. A small difference between large frequencies will result in a high χ²* test statistic and you will end up rejecting H0. A relatively big difference between small frequencies will give a low χ²* and you will not reject H0.
     
    Example:

    Case Obs Exp Difference Comments
    A 510 500 10 These frequencies are quite large, so the difference of 10 is relatively minor.
    B 10 5 5 These frequencies are small, so the difference of 5 is very important. The observed frequency is twice as large as the expected.

    In the hypothesis test, you would probably reject H0 in case A (and infer a significant difference) and not reject H0 in case B (and infer no difference). Because the critical χ² value is based on number of categories and not sample size, the test cannot account for the relative differences in actual frequencies.
     
    Although there are limitations to the chi-square, these tests are easy to use and have many applications, particularly for analyzing survey or social data. In these types of research, the data are usually collected at the nominal scale.
     


Kolmogorov-Smirnov Tests

The Kolmogorov-Smirnov (KS) tests also compare observed and expected frequencies. However, these tests are different from the chi-square tests because the KS tests use cumulative relative frequencies in their test statistic. To develop cumulative relative frequencies, the data must be at the ordinal scale. These tests are considered more powerful than the chi-square tests because they use a higher level of measurement. However, the ordinal scale restriction means that the tests cannot be applied as widely as the chi-square tests.

One Sample KS Test

Uses

The one sample KS test is often used as a diagnostic tool to assess whether the underlying population follows a normal distribution. As you know, this information is required before we can apply parametric tests. You can also use this test for determining if the observed frequencies follow a uniform or Poisson distribution (the steps for calculating Poisson frequencies are detailed at the end of the lab).

How does it work?

For this test, the observed and expected frequencies are converted into cumulative relative frequencies (the steps for converting absolute frequencies to cumulative frequencies are shown in the example).

[Figure: Three ways to display frequencies]

[Figure: Comparing 3 expected distributions]

This test compares the two cumulative relative frequencies (CRF) to find the maximum difference between the observed and expected frequencies. The maximum difference is the D* test statistic.

[Figure: Observed CRF vs 3 Expected CRF]
Note: for this example, the observed frequencies have an approximately normal shape with a slight negative skew. Therefore, there is only a small difference between the observed and expected (normal) frequencies.

This test uses the following hypotheses:

If the two cumulative relative frequencies are similar, the maximum difference (D) will be small and we cannot reject H0. We infer that the observed distribution follows the expected distribution. If, however, there is a large difference between the two cumulative relative frequencies (D is large), we reject H0.

Assumptions

Probability Distribution

The test statistic D* follows the D probability distribution.

Test Statistic

Formula:

$$D^* = \max \left| CRF_o - CRF_e \right|$$

This means "D* is the maximum (or largest) absolute (without + or - sign) difference between the observed cumulative relative frequencies (CRFo) and the expected cumulative relative frequencies (CRFe)".
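
A minimal sketch of the D* calculation is given here. The observed and expected counts are hypothetical (a uniform expectation across four ordered classes), and the function names are illustrative.

```python
from itertools import accumulate


def cumulative_relative(frequencies):
    """Convert absolute frequencies into cumulative relative frequencies."""
    total = sum(frequencies)
    return [cf / total for cf in accumulate(frequencies)]


def ks_statistic(observed, expected):
    """Largest absolute gap between observed and expected CRFs."""
    crf_observed = cumulative_relative(observed)
    crf_expected = cumulative_relative(expected)
    return max(abs(o - e) for o, e in zip(crf_observed, crf_expected))


# Hypothetical data: 20 observations in 4 ordered classes,
# compared with a uniform expectation of 5 per class.
observed = [2, 6, 8, 4]
expected = [5, 5, 5, 5]

# Observed CRFs are 0.10, 0.40, 0.80, 1.00; expected CRFs are
# 0.25, 0.50, 0.75, 1.00; the largest gap is in the first class.
print(round(ks_statistic(observed, expected), 2))  # 0.15
```

Because the comparison is made on cumulative values, the order of the classes matters, which is why the data must be at least ordinal.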

Critical Value

Critical D values are based on the significance level (α) and the number of degrees of freedom (n, where n = sample size). If the sample size is larger than 100, use the formulas at the bottom of the critical D table to calculate the critical D value.
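
The critical D table and its formulas are not reproduced in this text. The sketch below assumes they are the commonly used large-sample approximations of the form coefficient / sqrt(n), with coefficients 1.22, 1.36 and 1.63 for α = 0.10, 0.05 and 0.01. That assumption is consistent with the critical value of 0.071 quoted for n = 365 in the answers at the end of this lab.

```python
import math

# Assumed large-sample coefficients (not copied from the lab's own table).
KS_COEFFICIENTS = {0.10: 1.22, 0.05: 1.36, 0.01: 1.63}


def ks_critical_large_n(n: int, alpha: float = 0.05) -> float:
    """Approximate one-sample critical D for a large sample of size n."""
    if alpha not in KS_COEFFICIENTS:
        raise ValueError(f"no coefficient stored for alpha = {alpha}")
    return KS_COEFFICIENTS[alpha] / math.sqrt(n)


print(round(ks_critical_large_n(365), 3))  # 0.071
```

For smaller samples the tabulated values should be used instead of this approximation.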

Decision rule

Reject H0 if D* is greater than the critical D value

Example

A demographer is collecting data on the size of families (number of people per household) in the southern Indian state of Kerala. After visiting 70 households, she sketches a histogram and finds that her data are somewhat negatively skewed. She would like to compare her household size sample mean to the national average using a one sample difference of means test (a parametric test). However, she is concerned that the data may be too skewed to satisfy the test assumption of normality. The data are summarized in the categories below.

QUESTION: Are her data normally distributed?

Persons per household    Observed Frequency
1 to 2                   3
3 to 4                   5
5 to 6                   6
7 to 8                   9
9 to 10                  15
11 to 12                 23
13 to 14                 9
Total                    70
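
The conversion from absolute frequencies to cumulative frequencies (CF) and cumulative relative frequencies (CRF) can be sketched for the observed data above. Only the observed column is handled; the expected (normal) frequencies are not reproduced in this text, so they are left out.

```python
from itertools import accumulate

classes = ["1 to 2", "3 to 4", "5 to 6", "7 to 8", "9 to 10", "11 to 12", "13 to 14"]
observed = [3, 5, 6, 9, 15, 23, 9]

n = sum(observed)                        # sample size
cumulative = list(accumulate(observed))  # running total (CF)
crf = [cf / n for cf in cumulative]      # CF divided by n (CRF)

print(f"{'Class':<10}{'Obs':>5}{'CF':>5}{'CRF':>7}")
for label, freq, cf, rel in zip(classes, observed, cumulative, crf):
    print(f"{label:<10}{freq:>5}{cf:>5}{rel:>7.3f}")
```

The last CRF is always 1, because the final cumulative frequency equals the sample size.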


 

Note: For this problem, the expected frequencies based on the normal distribution are printed for you, but see note below on computing expected normal distribution.

1. Select appropriate test:
We will use the one sample KS test because it will allow us to compare the observed frequencies against the frequencies expected for a normal distribution. This test also allows us to retain the ordinal nature of the data (a chi-square test would convert the ordinal categories to nominal categories, resulting in a loss of information).

2. Check assumptions:

3. State your hypotheses:

4. Select the significance level:
We will use the standard α = 0.05 (95% confidence level)

5. Establish the probability distribution of the test statistic:
Because we are conducting a KS test, we use the D distribution for the test statistic.

6. Establish the critical values:
At α = 0.05 and degrees of freedom (n) = 70, critical D = 0.16

7. Calculate the test statistic:

8. Make inference using the decision rule:

9. State conclusion:
We conclude with 95% confidence that the observed frequencies are significantly different from the expected (normal) frequency distribution. Therefore, we infer that the underlying population is not normally distributed and the researcher cannot use parametric tests to analyse these data.
 

Two sample KS test

Use

This test compares the observed frequencies of two samples. It is used to determine if two samples are drawn from the same underlying population.

How does it work?

This test works in the same way as the one sample KS test. Both sets of observed frequencies are converted into cumulative relative frequencies. The test compares the two cumulative relative frequencies to find the maximum difference between the frequencies. The hypotheses for this test are:

Rejecting H0 follows the same logic given in the one-sample KS test.

Assumptions

Probability Distribution

The test statistic D* follows the D probability distribution.

Test Statistic

Formula:

$$D^* = \max \left| CRF_A - CRF_B \right|$$

"D* is the maximum absolute difference between the CRF in sample A and the CRF in sample B"

Critical Value

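Critical D values are calculated using the following formulas for each significance level, where nA and nB are the sizes of the two samples:

Significance Level (α)    Critical D formula
0.10                      1.22 * sqrt((nA + nB) / (nA * nB))
0.05                      1.36 * sqrt((nA + nB) / (nA * nB))
0.025                     1.48 * sqrt((nA + nB) / (nA * nB))
0.01                      1.63 * sqrt((nA + nB) / (nA * nB))
0.005                     1.73 * sqrt((nA + nB) / (nA * nB))
0.001                     1.95 * sqrt((nA + nB) / (nA * nB))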
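
The critical D calculation can be sketched as follows. The coefficients are those listed in the table above; the square-root term sqrt((nA + nB) / (nA * nB)) is assumed to be the standard two-sample form, and the sample sizes in the example are hypothetical.

```python
import math

# Coefficients from the table above, keyed by significance level.
COEFFICIENTS = {0.10: 1.22, 0.05: 1.36, 0.025: 1.48,
                0.01: 1.63, 0.005: 1.73, 0.001: 1.95}


def ks_two_sample_critical(n_a: int, n_b: int, alpha: float = 0.05) -> float:
    """Critical D for two samples of sizes n_a and n_b."""
    if alpha not in COEFFICIENTS:
        raise ValueError(f"no coefficient stored for alpha = {alpha}")
    return COEFFICIENTS[alpha] * math.sqrt((n_a + n_b) / (n_a * n_b))


# Hypothetical sample sizes of 100 each.
print(round(ks_two_sample_critical(100, 100), 3))  # 0.192
```

Larger samples give a smaller critical D, so smaller differences between the two CRFs become significant.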

Decision rule

Reject H0 if the calculated D* is greater than the critical D value

Practice

Work through this practice question:

A regional planner is reviewing the proposed expansion of recreational facilities for two communities. The age structure of each community will influence the type of new or expanded facilities (i.e. daycare and children’s programs or seniors’ activity center). In the past, the two communities had similar demographics. However, recent migration trends may have changed the demographic patterns. Random sampling was conducted in each community; the data are presented below.

Age classes    Port Francis    Pebble Beach
0-10           59              31
11-18          53              37
19-44          48              45
45-64          32              57
65+            18              35
Total          210             205

 

RESEARCH QUESTION: Is there a significant difference between the age classes of the two communities? Check your answer.
 

Expected Frequencies

The one sample Goodness of Fit test is suitable for use with uniform, normal or Poisson expected frequencies. These frequencies follow the theoretical (‘perfect’) distribution expected for a given sample size. The calculation of these frequencies is described below:

Example: The number of lightning strikes per day was recorded in Alberta for 6 summers (for months of July and August). The data are presented below.

Strikes per day Days
0 209
1 115
2 32
3 8
4 1
Total 365

 
Can you use Poisson to model these data? Check the conditions:
  1. A lightning strike on July 26 is independent of a strike on Aug 13: condition satisfied.
  2. The probability of a single strike in a day is low; the probability of multiple strikes on the same day is very low: condition satisfied.
  3. The daily time period is small compared to the 6 year observation period: condition satisfied.
 
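Step 1: Calculate λ (the mean number of events): the total number of events divided by the number of time periods. Here, λ = 207 strikes / 365 days = 0.567 strikes per day.

Step 2: Calculate the expected frequencies using the Poisson formula:

$$P(x) = \frac{e^{-\lambda} \lambda^{x}}{x!}$$

where P(x) is the probability of x events in one time period. The expected frequency for x events is n × P(x), where n is the total number of time periods.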

For a one sample KS test, the frequencies we calculated above would be used to develop the expected cumulative relative frequencies. The table below shows the observed and expected cumulative relative frequencies for the KS test.
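
The Poisson steps and the resulting cumulative relative frequencies can be sketched for the lightning data as follows. This is an illustrative calculation using the standard Poisson probability formula; the variable names are not from the lab.

```python
import math
from itertools import accumulate

strikes = [0, 1, 2, 3, 4]        # strikes per day
days = [209, 115, 32, 8, 1]      # observed number of days

n = sum(days)
# Step 1: mean number of strikes per day.
lam = sum(x * f for x, f in zip(strikes, days)) / n


def poisson_probability(x: int, lam: float) -> float:
    """Probability of exactly x events when the mean is lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


# Step 2: expected frequency = n * P(x).
expected = [n * poisson_probability(x, lam) for x in strikes]

crf_observed = [cf / n for cf in accumulate(days)]
crf_expected = [cf / n for cf in accumulate(expected)]
d_star = max(abs(o - e) for o, e in zip(crf_observed, crf_expected))

print(f"lambda = {lam:.3f}")
for x, f, e, o_crf, e_crf in zip(strikes, days, expected, crf_observed, crf_expected):
    print(f"{x}  obs = {f:3d}  exp = {e:6.1f}  CRF obs = {o_crf:.3f}  CRF exp = {e_crf:.3f}")
print(f"D* = {d_star:.4f}")
```

Computed without intermediate rounding, D* comes out at roughly 0.005, close to the 0.006 quoted in the answers; the small gap may come from rounding the cumulative relative frequencies before subtracting.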

We can see that the cumulative relative frequencies of lightning strikes observed in Alberta are very similar to the frequencies expected in a Poisson distribution. A significance test would confirm that the observed frequencies are not different from the Poisson distribution. Conduct the test and prove it for yourself (check your answer).


Answers to Practice Questions

To save space, these answers show only the critical values, test statistic, and decision. Ensure that you understand how to complete the other components of the hypothesis test (i.e. test selection, assumptions, hypotheses, significance level, probability distribution and conclusion). You will need to show all 9 steps in your lab answers.

One sample test

Critical values:
At α = 0.05 and degrees of freedom (k-1) = 3, critical χ² = 7.82

Test statistic:
χ²* is the total of the last column.

Rock type          Observed   Expected   (Obs-Exp)²/Exp
Limestone          11         9          0.44
Calcareous Marl    5          9          1.78
Sandstone          14         9          2.78
Volcanics          6          9          1.00
Total              36                    6.00

Decision:
Rule: reject H0 if χ²* > critical χ²

χ²* (6.00) is less than the critical value (7.82), so we cannot reject H0 (we conclude with 95% confidence that the sample is drawn from a population that is not significantly different from uniform).
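
This answer can be checked with a short script. The counts and the critical value of 7.82 are taken from this answer; the variable names are illustrative.

```python
observed = {"Limestone": 11, "Calcareous Marl": 5, "Sandstone": 14, "Volcanics": 6}

total = sum(observed.values())
expected = total / len(observed)   # uniform expectation: 36 / 4 = 9

# Contribution of each rock type: (Obs - Exp)^2 / Exp.
contributions = {rock: (o - expected) ** 2 / expected for rock, o in observed.items()}
chi2_star = sum(contributions.values())

critical = 7.82                    # alpha = 0.05, df = 3 (from the answer above)
reject_h0 = chi2_star > critical

for rock, value in contributions.items():
    print(f"{rock:<16}{value:.2f}")
print(f"chi-square* = {chi2_star:.2f}, reject H0: {reject_h0}")
```

The statistic stays below the critical value, so H0 is not rejected.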
 

Two sample KS test

Critical values:
At α = 0.05, critical D = 1.36 * sqrt((210 + 205) / (210 * 205)) = 0.13

Test statistic:
D* is the largest value in the D column.

Age Class   Port Francis   CFA   CRFA   Pebble Beach   CFB   CRFB   D
0-10        59             59    0.28   31             31    0.15   0.13
11-18       53             112   0.53   37             68    0.33   0.20
19-44       48             160   0.76   45             113   0.55   0.21
45-64       32             192   0.91   57             170   0.83   0.08
65+         18             210   1.00   35             205   1.00   0.00
Total       210                         205

Decision:
Rule: reject H0 if D* > critical D
From above, D* (0.21) is greater than the critical D value (0.13), so we can reject H0 (we conclude with 95% confidence that the two samples are drawn from significantly different underlying populations).
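
The table and decision above can be reproduced with a short script. It assumes the critical D at α = 0.05 is 1.36 * sqrt((nA + nB) / (nA * nB)); the variable names are illustrative.

```python
import math
from itertools import accumulate

port_francis = [59, 53, 48, 32, 18]   # sample A, youngest to oldest class
pebble_beach = [31, 37, 45, 57, 35]   # sample B

n_a, n_b = sum(port_francis), sum(pebble_beach)
crf_a = [cf / n_a for cf in accumulate(port_francis)]
crf_b = [cf / n_b for cf in accumulate(pebble_beach)]

# D* is the largest absolute gap between the two CRF columns.
differences = [abs(a - b) for a, b in zip(crf_a, crf_b)]
d_star = max(differences)

critical = 1.36 * math.sqrt((n_a + n_b) / (n_a * n_b))

print(f"D* = {d_star:.2f}, critical D = {critical:.2f}, reject H0: {d_star > critical}")
```

The largest gap occurs in the 19-44 age class, where a clearly larger share of the Port Francis sample has already been accumulated.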

One sample KS test with Poisson frequencies

Critical value:
At α = 0.05 and degrees of freedom (n = 365), we use the formulas at the bottom of the critical D table to compute the critical value. Critical D = 0.071.

Test statistic:
From the lightning strike example, we found that D* was 0.006.

Decision:
Rule: reject H0 if D* > critical D
From above, D* (0.006) is less than the critical D value (0.071), so we cannot reject H0 (we conclude with 95% confidence that the sample is drawn from a population that is not different from expected - the observed distribution follows the Poisson distribution).



© University of Victoria 2000-2001     Geography 226 - Lab 6
Developed by S. Adams and M. Flaherty     Updated: September 30, 2001

New files:
Shrimp2.dat Forest2.dat