Geography 226

Lab 6

Non-Parametric and Goodness of Fit

If you follow reason far enough,

it always leads to conclusions that are contrary to reason

- Samuel Butler -

1. Airline safety analysts are reviewing the frequency of crashes over a 34-year period. Airline crashes can be treated as rare events in time because we never know when they will occur. The analysts want to use the Poisson distribution to model the frequency of crashes for scenario planning.

Crashes per year    Frequency (years)
0                   22
1                    9
2                    2
3                    1
Total               34

  1. Is the Poisson distribution appropriate for modeling the number of airline crashes over time? Explain your rationale.
     
  2. Assess whether the observed frequencies in the table above follow an expected Poisson frequency distribution using a Kolmogorov-Smirnov test.
     
    Note: calculate the Poisson frequencies and perform the KS test by hand.
    You will not be able to include p-values in your analysis. Refer to the
    lab manual for assistance with Poisson frequencies.
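A hand calculation for part 2 can be checked with a short script. The sketch below is only one possible check, not the lab manual's procedure: it assumes the Poisson mean is estimated from the sample mean of the table, and that the Kolmogorov-Smirnov statistic D is the largest absolute gap between the cumulative observed and cumulative expected relative frequencies. The names `poisson_pmf` and `ks_statistic` are illustrative. The critical value for D should still come from the table in the lab manual.

```python
import math

# Observed data from the table: crashes per year -> number of years.
observed = {0: 22, 1: 9, 2: 2, 3: 1}

def poisson_pmf(k, lam):
    """Poisson probability of exactly k events when the mean is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def ks_statistic(counts):
    """Return (lam, D) for grouped counts compared with a fitted Poisson.

    Assumes the classes are consecutive integers starting at 0, so the
    running sum of Poisson probabilities is the expected cumulative
    relative frequency for each class.
    """
    n = sum(counts.values())
    lam = sum(k * f for k, f in counts.items()) / n  # sample mean
    cum_obs = 0.0
    cum_exp = 0.0
    d_max = 0.0
    for k in sorted(counts):
        cum_obs += counts[k] / n
        cum_exp += poisson_pmf(k, lam)
        d_max = max(d_max, abs(cum_obs - cum_exp))
    return lam, d_max

lam, d = ks_statistic(observed)
print(f"estimated mean crashes per year = {lam:.4f}")
print(f"KS statistic D = {d:.4f}")
```

The expected frequencies in years, if needed for a table, are each Poisson probability multiplied by 34.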

 

2. Shrimp farming systems in Thailand are classified based on the stocking density of the culture ponds (animals per square metre). The following table records the stocking density for a random sample of 140 farms in the province of Samut Sakhon. Note: Very high stocking densities can lead to many environmental problems.

Density           Description                Frequency
Extensive         up to 4 animals/m²         43
Semi-intensive    5-24 animals/m²            59
Intensive         25 to 59 animals/m²        31
Very intensive    more than 60 animals/m²     7

  1. What level of measurement are these data?
     
  2. Conduct a chi-square test to determine if the observed frequencies are uniformly distributed.
     
    Calculate the test statistic by hand.
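The hand-calculated chi-square statistic can be verified with a few lines of Python. This is a minimal sketch, assuming that "uniformly distributed" means every density class has the same expected frequency (the sample size divided by the number of classes). The function name `chi_square_uniform` is illustrative.

```python
# Observed farm counts from the table, in class order.
observed = {
    "Extensive": 43,
    "Semi-intensive": 59,
    "Intensive": 31,
    "Very intensive": 7,
}

def chi_square_uniform(counts):
    """Return (expected, statistic, df) under a uniform null hypothesis."""
    n = sum(counts)
    k = len(counts)
    expected = n / k  # same expected frequency for every class
    statistic = sum((o - expected) ** 2 / expected for o in counts)
    return expected, statistic, k - 1

expected, chi2, df = chi_square_uniform(list(observed.values()))
print(f"expected frequency per class = {expected:.1f}")
print(f"chi-square = {chi2:.3f} with {df} degrees of freedom")
```

The statistic is then compared with the tabled chi-square critical value for the reported degrees of freedom at the chosen significance level.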

 

3. The UNDP has hired you to perform some analyses on its Human Development database (HUMAN_DEV.SAV). Your analyses will focus on the life expectancy, literacy rate, and GDP variables for middle and less developed (MD and LD) countries.

Note: Before you start this question, create a working copy of the HUMAN_DEV.SAV file and delete the highly developed (HD) countries so that your file contains only the data you need. (You should have 110 cases in your working file.)

  1. Prepare a histogram (using SPSS) for each variable and superimpose the normal curve. For each histogram, describe the shape of the distribution and any patterns in the data (i.e. what does the histogram show about LD and MD countries).
     
  2. Name the appropriate goodness-of-fit test for deciding which group of tests (parametric or non-parametric) could be used for the three variables. Use SPSS to conduct this test for each variable (life expectancy, literacy rate and GDP). What do you infer?
     
    Note: A table would be a good way to summarize the information that is identical for each test. Include the p-values calculated by SPSS in your analysis.

 

4. The table below refers to 68,694 passengers in autos and light trucks involved in accidents in the state of Maine in 1991. The table classifies passengers by whether they were wearing a seatbelt and by whether they were injured or killed. Is injury independent of seatbelt use? Solve by hand.

                     Injury
Seatbelt use    Yes        No
Yes             2,409      35,383
No              3,865      27,037
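The hand solution can be cross-checked with the sketch below. It assumes the intended method is a chi-square test of independence on the 2 x 2 table, with expected counts computed as (row total x column total) / grand total and no continuity correction. The function name `chi_square_independence` is illustrative.

```python
# Rows: seatbelt use (Yes, No); columns: injury (Yes, No).
table = [
    [2409, 35383],
    [3865, 27037],
]

def chi_square_independence(cells):
    """Return (statistic, df) for a test of independence on a count table."""
    row_totals = [sum(row) for row in cells]
    col_totals = [sum(col) for col in zip(*cells)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(cells):
        for j, obs in enumerate(row):
            # Expected count if the row and column variables are independent.
            exp = row_totals[i] * col_totals[j] / n
            statistic += (obs - exp) ** 2 / exp
    df = (len(cells) - 1) * (len(cells[0]) - 1)
    return statistic, df

chi2_ind, df_ind = chi_square_independence(table)
print(f"chi-square = {chi2_ind:.2f} with {df_ind} degree(s) of freedom")
```

As with the hand calculation, the result is compared with the tabled chi-square critical value for the reported degrees of freedom.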


 

 

 

Marking Guide (Total = 30)

Question    Mark
Q1 a        3
Q1 b        7
Q2 a        1
Q2 b        6
Q3 a        3
Q3 b        6
Q4          4