Lab 0: Getting to Know SPSS and The SPSS Help Manual

Click the links below for SPSS Help

I. Basic Operations: Starting SPSS, SPSS Layout, Open existing file, Open new file, Save data, Save results
II. Data Operations: Insert a variable, Enter data, Insert a record, Sort data, Create calculated variable, Create charts, Edit charts
III. Statistical Tools: Descriptive statistics, Random Sampling, Non-parametric tests, Parametric tests, Correlation, Regression

If you have detailed questions about any operation or test, check the HELP function in SPSS. The Help Manual assumes that you know how to work in a Windows environment. If you need assistance, please ask your TA.

Go to Lab 0: SPSS Practice Worksheet

I. BASIC OPERATIONS

Starting SPSS

Go to the Start button (lower left corner of screen) to open the Start menu.
Move your mouse up to Programs to expand the list of software.
Navigate to the SPSS icon and click to start.

SPSS Layout

SPSS has three main features:

Data Editor: displays your dataset. You can add, sort, remove variables or records, and compute new variables. The Data Editor has two tabs within the main window (at the bottom of the screen):

Data View

Variable View

In a dataset, each column is a variable; each row is a record (or observation or case).

	Variable 1	Variable 2	. . .	Variable X
Record 1	data	data	data	data
Record 2	data	data	data	data
. . .	. . .	. . .	. . .	. . .
Record X	data	data	data	data

Output: displays the results of tests or charts. You can add/delete text or numbers and edit charts. The Output window has a navigation bar at the side (similar to Windows Explorer) so you can move around within your results. You can minimize, expand or delete test results.
Main Menu: The SPSS functions are accessed from the main menu at the top of the SPSS window. Click on the menu category and scroll down to the item using your mouse.

Open an existing data file

Under File, select Open Data.
Navigate through the directories to the data file.
Click OK.

Open a new file

Under File, select New Data; a blank Data Editor window will open.

Note: You can only have one dataset open at a time. If you open another dataset or a new data window, SPSS will ask you to save your old data before closing the file.

Saving data

From the Data Editor, click Save and navigate to the appropriate location (e.g. your disk).
To work on the data in another application, click Save As and specify dBase IV (.dbf) as the file type. You can import .dbf files into MS Excel.

Saving results

From the Output window, click Save and navigate to the appropriate location. Your file will be saved as SPSS output (.spo) which requires SPSS to run.
You can also copy and paste your results and charts into MS Word for later editing.

II. DATA OPERATIONS

Insert a variable

In the Data Editor, under Data, select Insert Variable. A generic variable name (VAR00001) will appear at the top of a column.
Put your cursor in the first cell of this column.
Click the Variable View tab to define the variable.
Specify:
- Name: use 8 characters or fewer
- Type: numeric, date, string (text)
- Width: maximum number of characters in this variable (default is usually 8)
- Decimal places: maximum decimal places required (the default is 2 decimal places).

Enter data

Use Enter to move down the column.
Use Tab to move along the row.

Insert a record

Put your cursor on the row below where you want the new case.
Under Data, select Insert Case.

Sort data

Under Data, select Sort.
Choose the variable and the sort order (ascending or descending).

Create a calculated variable

Insert a new variable
Specify the appropriate name and format for the new variable.
Under Transform, select Compute.
Type the name of the new variable in the top left box.
In the large box, create your formula. You can move variable names from the list at the lower left and use the function buttons or the function list under the formula box.
Click OK.
SPSS will ask to overwrite the newly created variable - click OK.
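
For readers who want to see the idea outside SPSS, a computed variable is simply a formula applied to every record. The following is a minimal Python sketch, not SPSS syntax; the records and the variable names (rainfall_mm, rainfall_cm) are hypothetical.

```python
# Hypothetical records; each dict is one case (row) in the dataset.
records = [
    {"site": "A", "rainfall_mm": 1200.0},
    {"site": "B", "rainfall_mm": 950.0},
]

# Apply the formula to every record, as Transform > Compute would.
for rec in records:
    rec["rainfall_cm"] = rec["rainfall_mm"] / 10.0

print(records)
```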

Create charts

SPSS has a range of graph and chart options. Your chart will be displayed in the Output window.

Bar charts

Choose the appropriate bar chart type (simple, clustered or stacked).
Specify that data in chart are Summaries for groups of cases.
Specify what the bars represent (number of cases, cumulative cases, other functions, etc.).
Under Category Axis, select the variable you want to graph.

Pie charts

Specify that data in chart are Summaries for groups of cases and click Define.
Specify what the slices represent (number of cases, cumulative cases, other functions, etc.).
Under Define Slices by, select the variable you want to graph.

Scatter plots

Choose Simple scatterplot.
Move your variables to the X and Y axis.
Under Label Cases, select the variable you want to label cases by on the graph. See steps under Chart Editor for more instructions.

Histogram

Select your variable.
Click the check box (lower left) to display the normal curve over the histogram.

Note: you cannot specify the number of intervals in the histogram.

Edit charts

From the Output window, double click on your chart to enter the Chart Edit mode. Some important functions are listed below. Some of the options vary depending on the chart type.

Format axis - for changing the value range or increments on each axis

Under Chart, select Axis.
Select the appropriate axis (X or category, Y or scale).
Specify the data range to be shown if necessary.
Specify the division markers and increments.

Add data labels - for labelling points on scatterplots

Under Chart, select Options.
For scatter plots, click Case Labels ON to label each point by the variable specified for scatter plots.

Add reference lines - for adding reference lines on scatterplots, residual plots or bar charts

Under Chart, select Reference Lines.
Select the appropriate axis (X or Y).
Type -2 in the upper box and click ADD to move the number down into the second box.
Type 0 and click ADD again.
Repeat as needed.

To return to the Output window, click the close button in the top right corner of the Chart Editor.

III. USING THE STATISTICS TOOLS

SPSS can perform many different statistical procedures. The test results are displayed in the Output window.

Descriptive Statistics

This function calculates the standard set of descriptive statistics: mean, s, min, max, n.

Under Analyze, select Descriptive Statistics, then select Descriptives or Frequencies. Click OK.
In Descriptives, you can specify other descriptive statistics using the Options button (lower right corner of dialogue box).
In Frequencies, you can specify statistics using the Statistics button.
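
To check the SPSS output by hand, the same statistics can be computed with Python's standard library. This is a sketch only; it uses the Damage values from the Squid example given later in this manual, and assumes s is the sample standard deviation (n - 1 denominator).

```python
import statistics

# Damage values from the Squid example dataset in this manual.
damage = [220, 353, 415, 279, 337, 380]

n = len(damage)
mean = statistics.mean(damage)    # sample mean
s = statistics.stdev(damage)      # sample standard deviation (n - 1 denominator)

print(f"mean={mean:.2f} s={s:.2f} min={min(damage)} max={max(damage)} n={n}")
```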

Conduct random sampling

This function allows you to randomly select observations from your dataset. In Lab 3, Question 2d asks you to select 3 different random samples of life expectancy data (where n = 5, 20 and 100). Follow steps 1 to 6 for each sample. Once the three samples are chosen, go to step 7.

For each sample:

1. Under Data, select Select Cases.
2. Choose Random Sample of Cases and click the Sample button.
3. Specify the sample size as follows: "Exactly {your desired sample size} cases from the first {the total number of observations} cases."
4. Click the check box below to ensure that unselected cases are Filtered (not Deleted) and click OK.
5. A new variable (Filter) will be added to the Data Editor, where 1 = case is selected in sample, 0 = case is not selected.
6. Highlight the filter column and copy/paste it into a new column. Rename this column (e.g. Fil_5 for "filter to select 5 observations").

Note: If you do not copy and paste the filter into a new column, SPSS will overwrite the information when you choose your next sample.

Create the next two samples using the same procedures (steps 1 to 6). You should have three filter variables labelled Fil_5, Fil_20 and Fil_100.
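
The logic of the three filter variables can be sketched in Python as follows. This is an illustration, not what SPSS runs internally; the dataset size (150 cases) and the fixed seed are assumptions made only so the sketch is repeatable.

```python
import random

random.seed(0)    # fixed seed only so this sketch gives the same samples each run

n_cases = 150     # hypothetical total number of observations
filters = {}
for size in (5, 20, 100):
    # "Exactly {size} cases from the first {n_cases} cases"
    chosen = set(random.sample(range(n_cases), size))
    # 1 = case selected in sample, 0 = case not selected, like the SPSS filter
    filters[f"Fil_{size}"] = [1 if i in chosen else 0 for i in range(n_cases)]

for name, column in filters.items():
    print(name, sum(column))
```

Each filter is kept under its own name, which mirrors the copy/paste step: nothing is overwritten when the next sample is drawn.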

Non-parametric Tests (Goodness of Fit)

The non-parametric sub-menu lists several tests including:

one sample chi-square
one sample KS
two sample chi-square

a. One sample Chi-square

Under Statistics Non-Parametric, select Chi-square.
Move your variable into the Test Variable box
Under Expected Range, keep the default selection Get from Data
Under Expected Values, click All categories equal
Click OK.
In the Output window, you will see two tables. The first table contains the observed and expected frequencies used in the Chi-square test. The residual column is the actual difference between observed and expected - not (Obs-Exp)²/Exp. The second table gives you the chi-square (χ²*) and degrees of freedom (df).
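
The difference between the residual column and the chi-square terms can be seen in a short Python sketch. The observed counts below are hypothetical, and the expected values follow the All categories equal option.

```python
# Hypothetical observed counts for four categories.
observed = [18, 22, 30, 10]

k = len(observed)
expected = [sum(observed) / k] * k     # "All categories equal"

# The residual column: plain difference between observed and expected.
residuals = [o - e for o, e in zip(observed, expected)]

# The test statistic sums (Obs - Exp)^2 / Exp over the categories.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = k - 1

print(residuals, chi_square, df)
```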

b. One sample KS test

Under Statistics Non-Parametric, select 1 sample KS.
Move your variable(s) into the Test Variable box. (You can move multiple variables into the Test Variable box. SPSS will run a separate test for each variable)
Under Test Distribution, click on the appropriate expected frequency distribution (normal, uniform, poisson or exponential)
Click OK
In the Output window, the results of the KS test are shown in one table. In the hypothesis test, use the absolute difference under Most Extreme Differences for D*, not the KS Z value. The p-value is listed as Asymp Sig. for each variable. p-value is

c. Two or more sample Chi-square

Under Statistics Descriptive Statistics, select Crosstabs.
Move your variables of interest (govt, laws, manage and title) into the Row box
Move the variable that distinguishes your samples (sex) into the Column box
Click on the Statistics button; select Chi-square (located in the top left corner) and click Continue.
Click on the Cells button; under Counts, select Expected and Observed and click Continue
Click OK to run the test
In the Output window, you will see two tables for each variable you tested.
- First table: This is a summary of the observed and expected frequencies calculated by SPSS. Notice that one variable may have 4 categories of answers, whereas another may have 3 categories of answers. Use the appropriate number of categories when calculating the number of degrees of freedom.
- Second table: This table contains the calculated χ²*. It is called Pearson's chi-square and is listed in the column Value. The p-value is listed in the column Asymp Sig.
- If you forgot to specify 'chi-square' under the Statistics button, you will have no test results.
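
To see where the expected frequencies and the degrees of freedom come from, the calculation can be sketched in Python. The counts are hypothetical (three answer categories in the rows, two sexes in the columns); the expected count for a cell is row total x column total / grand total.

```python
# Hypothetical observed counts: rows = answer categories, columns = sex.
observed = [
    [20, 30],
    [25, 25],
    [15, 35],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count = row total * column total / grand total.
expected = [[rt * ct / total for ct in col_totals] for rt in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom = (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected, chi_square, df)
```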

Parametric Tests (Difference of Means)

The compare means sub-menu lists several tests including:

one sample t-test
independent sample t-test
paired sample t-test

a. One sample t-test

Under Statistics Compare Means, select One sample t-test.
Select variable for test in right hand dialogue box.
Move to Test Variable box using arrow.
Enter the Test Value (known or hypothesized population value).
Click OK
In the Output window, you will see two tables. The first table contains descriptive statistics for the variable. The second table contains the test statistic in the column 't'. The degrees of freedom for the test are listed under the column 'df'.
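
The value in the 't' column can be reproduced by hand. The sketch below uses the six Damage values from the Squid example as a single sample, purely for illustration, and a hypothetical Test Value of 300.

```python
import math
import statistics

sample = [220, 353, 415, 279, 337, 380]   # Damage values from the Squid example
test_value = 300                           # hypothetical population value

n = len(sample)
mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(n)   # standard error of the mean

t_stat = (mean - test_value) / se
df = n - 1

print(f"t={t_stat:.3f} df={df}")
```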

b. Independent sample t-test

Under Statistics Compare Means, select Independent Sample t-test
Move the variable of interest into the Test Variable box.
To separate the two samples within the test variable, move the variable with the grouping criteria into the Grouping box (see example below)

Click Define Groups and fill in the text that differentiates your samples.

Example: For the Squid dataset:

Province	Damage
CHA	220
CHA	353
CHA	415
RAY	279
RAY	337
RAY	380

To conduct the 2 sample test on the variable Damage:

- move Damage into the Test Variable box
- move Province into the Grouping box
- click Define Groups and type CHA in Group 1 and RAY in Group 2 to differentiate your groups
- SPSS will separate the two groups and treat them as separate samples
In the Output window, you will see two tables. The first table contains the descriptive statistics for each group. The second table contains the two t* statistics in column 't'.
- DO NOT USE LEVENE's F*. Use the variance ratio test outlined in the lab manual.
- If the F test indicates that the variances are equal, use the top t*.
- If the variances are not equal, use the lower t* for the decision rule.
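
The two t* values and a variance ratio can be sketched in Python for the Squid example. This is an illustration under two assumptions: the variance ratio is taken as the larger sample variance over the smaller (the lab manual's convention may differ), and the top and lower rows correspond to the pooled-variance and separate-variance formulas.

```python
import math
import statistics

cha = [220, 353, 415]   # Damage for Province CHA (Squid example)
ray = [279, 337, 380]   # Damage for Province RAY (Squid example)

n1, n2 = len(cha), len(ray)
m1, m2 = statistics.mean(cha), statistics.mean(ray)
v1, v2 = statistics.variance(cha), statistics.variance(ray)

# Variance ratio: larger sample variance over the smaller one (assumed convention).
f_ratio = max(v1, v2) / min(v1, v2)

# Top row: equal variances assumed (pooled variance).
pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_equal = (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))

# Lower row: equal variances not assumed (separate variances).
t_unequal = (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)

print(f"F={f_ratio:.3f} t_equal={t_equal:.4f} t_unequal={t_unequal:.4f}")
```

Because the two groups here have the same size, the two t* values coincide; they differ only in their degrees of freedom. With unequal group sizes the values themselves will usually differ.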

c. Paired sample t-test

Under Statistics Compare Means, select Paired t-test.
Click on the first variable of your pair - it appears in the Current Selections box as Variable 1
Click on the second variable - it appears in the Current Selections box as Variable 2
Click the arrow to move these variables into the Paired Variable box
Click OK
In the Output window, you will see three tables. The first table contains the descriptive statistics for each variable. The second table contains correlations. The third table contains the mean and standard deviation of the differences and the t* statistic in column 't'. The degrees of freedom are listed under column 'df'.
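
The third table is based on the differences within each pair. A minimal Python sketch of that calculation, using hypothetical before and after measurements:

```python
import math
import statistics

before = [12.0, 15.0, 11.0, 14.0, 13.0]   # hypothetical first variable of the pair
after = [14.0, 15.0, 13.0, 17.0, 14.0]    # hypothetical second variable of the pair

# One difference per case; the test is a one sample t-test on these differences.
diffs = [a - b for a, b in zip(after, before)]

mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)
t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
df = len(diffs) - 1

print(f"mean diff={mean_diff:.2f} sd={sd_diff:.3f} t={t_stat:.3f} df={df}")
```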

Correlation

The Correlate sub-menu has two correlation functions:

bivariate correlation
partial correlation

a. Bivariate correlation (Pearson's r and Spearman's rho)

Under Statistics Correlate, select Bivariate.
Move the variables of interest into the Variables box (you can move multiple variables).
Select the test type: Pearson's r or Spearman's rho
Under Test of Significance, select the direction of the test (one tailed or two tailed). SPSS will calculate the appropriate p-values.
Select Flag significant correlations
Click OK
In the Output window, you will see one table (matrix) containing the correlation coefficients, number of observations and p-values for the selected variables.

	Variable A	Variable B	Variable C
Variable A	1.000 -- 80	.854** .000 80	-.562** .016 80
Variable B	.854** .000 80	1.000 -- 80	.254 .302 80
Variable C	-.562** .016 80	.254 .302 80	1.000 -- 80

Notes:

SPSS will show a perfect r (1.000) for the correlation between the same variable (A and A, B and B, etc)
SPSS flags (**) correlations that are 'significant' based on its p-value calculation. If the p-values are very small (p=0.000...), SPSS will indicate that the variables are significant at the 0.01 level (the flagged level changes from 0.05 to 0.01).
The correlation matrix is symmetric: the correlations in the top right corner are the same as the correlations in the bottom left.
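
Both coefficients can be sketched in a few lines of Python. The data are hypothetical, the helper names (pearson_r, ranks) are introduced here for illustration, and the ranking helper assumes there are no tied values. Spearman's coefficient is computed as Pearson's r on the ranks.

```python
import math

def pearson_r(x, y):
    """Pearson's r for two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest); assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical variable A
y = [2.0, 1.0, 4.0, 3.0, 6.0]   # hypothetical variable B

r = pearson_r(x, y)
rho = pearson_r(ranks(x), ranks(y))   # Spearman = Pearson on the ranks

print(f"r={r:.3f} rho={rho:.3f}")
```

Calling pearson_r(x, x) gives 1, and pearson_r(x, y) equals pearson_r(y, x), which is why the matrix has 1.000 on the diagonal and repeats itself across it.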

b. Partial correlation

Under Statistics Correlate, select Partial.
Move the variables of interest into the Variables box.
Move the controlled variable into the Controlling for box.
Under Test of Significance, select the direction of the test (one tailed or two tailed). SPSS will calculate the appropriate p-values.
Click OK
In the Output window, you will see a small correlation matrix. This matrix shows the coefficient and p-value for the correlation between the two variables of interest, once the influence of the third variable (your controlled variable) has been removed.

Note: the degrees of freedom are n-3 for the partial correlation.
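
With one controlled variable, the partial correlation can be computed from the three pairwise correlations. The sketch below uses the standard first-order formula; the function name, the correlation values and the sample size are hypothetical.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical pairwise correlations and sample size.
r_partial = partial_r(r_xy=0.60, r_xz=0.50, r_yz=0.40)
n = 30
df = n - 3   # degrees of freedom for one controlled variable

print(f"partial r={r_partial:.3f} df={df}")
```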

Linear Regression

Under Statistics Regression, select Linear.
Before starting the analysis, you must identify your dependent and independent variables
Move the appropriate variables into the Dependent and Independent boxes
Click on the Statistics button. On the right, click Model Fit and Descriptives. Click Continue.
Click on the Plots button. Select *ZRESID for Y window and DEPENDNT for X window. Click Continue.
Click on the Save button. Under Residuals, click Standardized and click Continue.
Click OK
In the Output window, you will see several tables. The important tables are listed below.
1. Descriptive Statistics: Presents the mean, standard deviation and n for each variable
2. Correlations: Presents the Pearson's r correlations (and p-value) between variables
3. Model Summary: Presents r, r² and the standard error of the estimate (this number is used for confidence intervals).

Model	R	R Square	Adjusted R Square	Std. Error of the Estimate
1	0.909	0.828	0.800	152.94

4. ANOVA (Analysis of Variance): Presents the regression, residual and total sum of squares, the degrees of freedom and the calculated F value. The Sig. column shows the p-value for F*.

Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	1459528.290	1	1459528.290	62.396	.000
	Residual	304089.443	13	23391.496
	Total	1763617.733	14

5. Coefficients: Presents the regression coefficients: the constant (intercept) and the slope for each independent variable, both in column B.

Model		Unstandardized B	Std. Error	Standardized Beta	t	Sig.
1	(Constant)	-1415.231	273.742		-5.170	.000
	RAINFALL	1.264	.160	.910	7.899	.000

In the Data window, you will see an extra column called ZRE_1. This column contains the standardized residuals (the standardized difference between each observation and the line of best fit).
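
The numbers in these tables are linked, and a quick Python check makes the links explicit. The sketch below copies the sums of squares and degrees of freedom from the ANOVA table above and recomputes the F value, R Square and the Std. Error of the Estimate; small differences in the last digit come from rounding in the printed tables.

```python
import math

# Values copied from the ANOVA table above.
ss_regression, df_regression = 1459528.290, 1
ss_residual, df_residual = 304089.443, 13

ss_total = ss_regression + ss_residual
ms_regression = ss_regression / df_regression   # Mean Square, Regression row
ms_residual = ss_residual / df_residual         # Mean Square, Residual row

f_value = ms_regression / ms_residual           # F column
r_square = ss_regression / ss_total             # R Square in the Model Summary
std_error_est = math.sqrt(ms_residual)          # Std. Error of the Estimate

print(round(f_value, 3), round(r_square, 3), round(std_error_est, 2))
```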

To prepare a residual plot if you haven't requested a plot of *ZRESID and DEPENDNT:

- Create a scatterplot with the dependent variable along the X axis and the residuals along the Y axis (if you forgot to choose 'standardized residuals' under Save when you initiated the regression function, you will not have a variable called ZRE_1).
- Add reference lines at 2, 0 and -2 to the residual plot. See Chart Editor for information about adding reference lines to residual plots.
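
What the residual plot shows can be sketched numerically in Python. The data are hypothetical, and the sketch assumes a standardized residual is the raw residual divided by the standard error of the estimate; cases beyond the -2 and 2 reference lines are the ones to look at.

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]       # hypothetical independent variable
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]     # hypothetical dependent variable

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Least-squares line of best fit.
slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
intercept = my - slope * mx

# Raw residuals: observed value minus the value predicted by the line.
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

# Standard error of the estimate: sqrt(residual sum of squares / (n - 2)).
see = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
z_resid = [e / see for e in residuals]

# Cases that fall outside the -2 and 2 reference lines.
flagged = [i for i, z in enumerate(z_resid) if abs(z) > 2]

print([round(z, 2) for z in z_resid], flagged)
```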