Lab 2


1. Review

Here's a quick review of what you learned last week. If you are like me, it always helps to see things a second (or third, or fourth) time.

To call up the data set with which we are working, double click on the SPSS icon on the desktop, pull down the File menu and click on Open. The data set for our exercise is called "setups96.sav" and you double click on that to open it.

Run a crosstabulation of two variables to test the hypothesis that support of Bush in 1992 (measured by V2 as presidential vote) grew as one's personal financial situation (measured by V46, change in personal financial condition) improved.

You can run this crosstabulation by pulling down the Statistics menu to Summarize and clicking on Crosstabs. V2 is your dependent variable, so place it in the row box; V46 is your independent variable, so place it in the column box. Click on the Cells box at the bottom; then click the box for Column under Percentages; click on Continue. Now, for a slight addition, click on the Statistics box at the bottom; under Ordinal Data, click on the box next to Kendall's tau-c. This is a statistical measure of the strength of the relationship between the two variables in which you are interested. SPSS will compute the value without any more work from you. Click on Continue.

Finally, paste all of this into your syntax window by clicking on Paste.

In the syntax window, highlight everything from the word Crosstabs to the period at the end. Click on the Run icon at the top to run this command.
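For reference, the command that Paste places in the syntax window should look roughly like the sketch below. The exact layout and any extra subcommands vary by SPSS version, so treat this as an approximation of what you will see rather than an exact copy.

```spss
* Crosstab of vote (rows) by change in personal finances (columns).
* CTAU requests Kendall's tau-c and COLUMN requests column percentages.
CROSSTABS
  /TABLES=v2 BY v46
  /STATISTICS=CTAU
  /CELLS=COUNT COLUMN.
```

The period at the end marks the end of the command, which is why you highlight all the way down to it before clicking Run.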

Now, you should see results in your output window.

How would you interpret these results? What does each of the cell percentages mean? What do the percentages on the side and bottom of the table mean?

2. RECODING variables

Notice that this cross tabulation is a little bit confusing, because it gives you more information than you really want. You are only interested in the relationship between change in personal finances and support for Bush. You really don't care (from the perspective of this hypothesis) whether people supported Perot or Clinton. You only care that they did not support Bush. In this section, we will RECODE the dependent variable so that it is categorized as support for Bush or not for Bush.

  1. Return to the data window by pulling down the Window menu and clicking on our dataset.
  1. To recode a variable, pull down the Transform menu at the top of the Data window, then click on Recode and finally click on "Into Same Variables."

  2. At this point, a "Recode Same Variables" dialog box will appear. On the left-hand side of the box is the list of variables in our data set. Click on V2, which will then become highlighted. Click on the arrow next to the Variables box in the center of the dialog box, and V2 will now move into the Variables box, ready to be recoded.
  3. Click on "Old and New Values" at the bottom of the dialog box. Another dialog box will open. In the upper left-hand corner is a box labeled "Value" under the "Old Value" heading. Enter the number "3" (without the quotations). Click on the "Value" box under the "New Value" heading. Enter the number "2" (without the quotations). Now click on the "Add" box directly under the Value box under the New Value heading. Your transformation (3 --> 2) should appear in the window to the right. Finally, click on Continue.

    At this point, you have been returned to the "Recode Same Variables" dialog box. Click on Paste and then click on the "Syntax" window to check out all the cool commands you have just given the computer.

    Highlight the command lines from RECODE down to the period after EXECUTE. Click on the Run icon at the top of the "Syntax" window.

  4. Nothing happened? All we just told the computer to do was to change the values for the vote variable. Now we need to tell the computer to show us these new values. Virtually every time you recode measures, it's a good idea to check that what you have created is in fact what you want. You do this by running a frequency distribution of the variable you have recoded. Recall that Frequencies is under the Statistics > Summarize menu in the "Data" window, just as the Crosstabs command is.
  5. Run a frequency of V2 to see what your new values look like. Review Lab 1 if you do not recall how to do this.

  6. We have a problem. By recoding V2, we have included all of the Perot voters in 1992 with Clinton voters. Although this is EXACTLY what we told the computer to do, our presentation could be misleading. We need to change the value label for V2 so that our recoded measure is accurately described by the corresponding value labels.
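The recode and the follow-up frequency check from the steps above can be sketched in syntax as follows. This is an approximation of what Paste produces; the spacing and line breaks in your own syntax window may differ by SPSS version.

```spss
* Fold Perot voters (3) into category 2 of the vote variable.
RECODE v2 (3=2).
EXECUTE.

* Check the result by running a frequency distribution of v2.
FREQUENCIES VARIABLES=v2.
```

After this runs, category 2 contains both Clinton and Perot voters, but it still carries its old value label, which is the problem described in step 6.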

3. Value labels

  1. We can change the value labels, or the labels for the numeric categories in V2, by highlighting the column in the Data window that represents V2. After highlighting the V2 column in the Data window, pull down the "Data" menu and click on "Define Variables."
  2. In the bottom left-hand corner of the "Define Variables" dialog box, click on the "Labels. . ." box.
  3. Under the "Value Labels" heading, enter the number 2 in the "Value" box (we wish to change the label for the value of 2, which now covers Clinton or Perot voters). Then enter the new label into the Value Label box. Pick your own new label (e.g., "not Bush," "challengers," etc.). Click on the "Change" box, making sure your new labels appear in the box on the right. (If the "Change" box is not highlighted, click on "Add" and then respond to the dialog box that you want to change the value label. This is an example of a glitch that seems to happen at times for no apparent reason.) Click on Continue, and then click on OK in the Define Variables dialog box. Run a frequency of V2 to see what your new value labels look like.

  4. Now, to see the results in context, re-run the cross tabulation of Vote and Change in financial condition. (Hint: An easy way to do this is to go directly to your syntax window, highlight the syntax for the crosstab you used before, and run it.) This provides a much clearer picture of the relationship between support for Bush and personal financial condition.
  5. Notice that there was no "Paste" option in the Define Variables screen, so you could not see the syntax that SPSS used to change the value labels. However, when creating a data set or changing many value labels, it is much faster to work in the syntax window. You can do this with the following command: value labels v2 1 'Bush' 2 'Not Bush'. (Put each label in quotation marks, and do not forget the period!!!) Note that the general form of the syntax is value labels <<variable names>> <<value>> <<label>> . (Note: you can use more than one variable name if the variables have the same labels.)
  6. WARNING TIME! PLEASE BE CAREFUL WHEN YOU RECODE! By changing the values of this variable in the working ("Data") window, we have changed these data for the rest of the session, and the change becomes permanent if the file is saved. That is why when you are finished working and want to Quit, you do NOT, repeat NOT, save the DATA window. By not saving it, you revert to the original dataset.
  7. The only way to return to the original categories while you are working is to quit the program, not saving the data window, and then to reopen the complete data set again and start from scratch. Consequently, it's usually a good idea to only RECODE something that you want permanently changed for one working session. The alternative is to RECODE "... Into New Variables" (strongly recommended as long as you document how you got these new variables and adequately label the new categories).

    If you make changes that you want to save, you must save the data set on your own disk under a different name.

    An alternative way of recoding variables is to create a new variable column to put the recoded values into. Go ahead and close SPSS, not saving the data window, of course. Re-open SPSS and the setups96.sav file. If you run a frequency on V2, you will notice that the changes you made to this variable have been obliterated.

4. RECODING into different variables

This time, let's recode V2 into a measure of major party vote. We can create a new variable that measures the vote for the two major parties by just including the votes for Bush and Clinton.

  1. Transform>Recode>Into Different Variables (I think you are ready for that code now; it means pull down the "Transform" menu and then click on Recode and then click on "Into Different Variables . . .") will get you the "Recode into Different Variables" dialog box.

  2. Select V2 from the variable list on the left; enter PTYVOTE in the "Output Variable Name" box; enter the label for this new variable (two-party presidential vote); click on Change to apply these names; click on the "Old and New Values" box at the bottom of the dialog box and enter the appropriate values for your new measure (1 --> 1; 2 --> 2); click on Continue; once back in the dialog box, click on Paste to enter these commands in your Syntax window. Go into the "Syntax Window" and Run your commands.
  3. Change the value labels for your new variable, PTYVOTE, so that 1 = Bush-Rep; and 2 = Clinton-Dem.

  4. Now, let's see if an individual's evaluation of the national economic conditions affects vote choice. Run crosstabs of V2 and PTYVOTE by V48, the national economic evaluations measure. Be sure to click on the Cells box to get the appropriate column percentages; while you're there, also click on Statistics... and select Kendall's tau-c and Correlations.
  5. Recoding is only one of several ways to use only the Bush and Clinton vote information in V2. The next two sections show alternate ways of getting the same result.
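The steps in this section can be sketched in syntax as shown below. This is an approximation of the pasted commands, not an exact copy; the variable label text is whatever you typed in the dialog box.

```spss
* Copy only the Bush (1) and Clinton (2) votes into a new variable.
* Values not listed, such as Perot (3), are left system-missing in ptyvote.
RECODE v2 (1=1) (2=2) INTO ptyvote.
VARIABLE LABELS ptyvote 'Two-party presidential vote'.
EXECUTE.

VALUE LABELS ptyvote 1 'Bush-Rep' 2 'Clinton-Dem'.

* Crosstabs of both vote measures by national economic evaluations.
CROSSTABS
  /TABLES=v2 ptyvote BY v48
  /STATISTICS=CTAU CORR
  /CELLS=COUNT COLUMN.
```

Because the original V2 is untouched, you can compare the three-candidate table and the two-party table side by side in the output.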

4. Select Cases

The Select Cases command allows you to restrict your analysis to a specific subgroup of your data set. Essentially, the logic behind this command in SPSS is that you create a conditional "filter" variable which you select. When your "filter" is on, your data transformations, computations, etc., will only be performed on your selected subset. This is a useful, but potentially troublesome tool. Always be aware of whether you are working with the entire data set or your selected subset; a box at the bottom of your Data Window will indicate when your filter has been activated.

1. An easy example of Select Cases: Using Select Values of a Variable

We can use Select Cases to essentially replicate the process of creating PTYVOTE (above) by selecting only those respondents who did not vote for Perot (that is, who voted for either Bush or Clinton).


Data > Select Cases will get you into the "Select Cases" dialog box. Notice that currently, "All cases" is selected. Click on the "If condition is satisfied" option; click on the "If. . ." box; highlight variable V2 (the vote variable) and move it (using the arrow box) to the large empty box at the top and right of the "Select Cases: If" dialog box; click on the "~=" (not equal to) button on the calculator pad and click on "3" on the calculator pad.

The large box should now read V2 ~= 3. This will select those cases where this statement is true, that is, all respondents who reported a vote (V2) for someone other than Perot (3). Click on Continue in the "Select Cases: If" dialog box; click on Paste in the "Select Cases" dialog box; move to your "Syntax Window" and Run this command.

Look at the data window. You will note that there are lines through some of the case numbers; these are the cases that have been filtered out.

Run a crosstab of V2 by V48 and compare this with your results from PTYVOTE by V48 (above). (Remember, you need not go through all the windows for this. You can type in the syntax directly or copy it from the previous command in your syntax file.)
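As a rough guide, the filter that Select Cases pastes for "respondents who did not vote for Perot" (V2 not equal to 3) looks something like the sketch below. SPSS normally names the temporary filter variable filter_$ and may paste a few extra labeling lines; those are omitted here.

```spss
* Build a 0/1 filter variable that is 1 for everyone except Perot voters.
USE ALL.
COMPUTE filter_$ = (v2 ~= 3).
FILTER BY filter_$.
EXECUTE.

* This crosstab now uses only the selected cases.
CROSSTABS
  /TABLES=v2 BY v48
  /STATISTICS=CTAU CORR
  /CELLS=COUNT COLUMN.
```

The filter stays on for every later command until you switch it off, so check the indicator at the bottom of the Data Window before running anything else.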

2. A (slightly) more complicated example of Select Cases: Testing for Spuriousness

1. Remove the previous Select Cases filter. You do this by returning to the "Select Cases" dialog box and deactivating your filter by clicking on the "All Cases" selection at the top of the dialog box. Click on Paste and run this command from your "Syntax Window." (You have probably figured out by now that instead of pasting to the syntax window and running these commands, you could simply click on "OK." That will work, but you will not have a record of what you have done. Believe me, as one who has made numerous mistakes: careful documentation pays off, even if it seems to slow you down.)

Select Cases can be useful in testing (in preliminary fashion) for potential spuriousness. In simplest terms, spuriousness means that the relationship you see is not the real one you are looking for. Rather some other factor is influencing both variables. For example, I suspect that the partisanship of the respondent (V8) leads them to evaluate national economic conditions differently. Specifically, I suspect that Republican identifiers will evaluate national economic conditions more negatively (because the presidential incumbent in 1996 was a Democrat) while Democratic identifiers will evaluate national economic conditions more positively (in an effort to give the Democratic president a "boost"). Furthermore, I am aware of the strong impact of partisanship on vote choice. Therefore, I suspect that the relationship between economic conditions and vote (V48 and V2, respectively) is affected by the spurious variable of partisanship.

2. We can start by creating a somewhat simplified version of the party identification variable, V8. Using Recode into different variables, create a measure called PID that collapses the values of V8 into Democratic identifiers, Independents and independent leaners, and Republican identifiers. Remember to add value labels and run a frequency to check the outcome.

3. Select those respondents who do not identify with a major party (Independents) using the Select Cases command. Run a Crosstab of PTYVOTE (two-party presidential vote) by V48 (national economic evaluations), remembering to click on Kendall's tau-c and correlations under Statistics.

4. Now run the Select Cases procedure, this time selecting only those respondents who identified with the Democratic Party. Run a Crosstab of economic conditions and PTYVOTE. Finally, run a Select Cases procedure to select out those respondents who identified with the Republican Party. Run a Crosstab of economic conditions with PTYVOTE. What do your results suggest to you? What do you think they say about the spurious variable?
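Steps 1 and 3 can be sketched in syntax as follows. The code assumes that your new PID variable uses 1 for Democratic identifiers, 2 for Independents, and 3 for Republican identifiers; that coding is a hypothetical choice for illustration, so substitute whatever values you assigned in step 2.

```spss
* Step 1: turn off the earlier filter so that all cases are used again.
FILTER OFF.
USE ALL.
EXECUTE.

* Step 3: select Independents only, assuming pid = 2 is that group.
USE ALL.
COMPUTE filter_$ = (pid = 2).
FILTER BY filter_$.
EXECUTE.

CROSSTABS
  /TABLES=ptyvote BY v48
  /STATISTICS=CTAU CORR
  /CELLS=COUNT COLUMN.
```

For step 4, repeat the second and third commands with the value you assigned to Democratic identifiers, and then again with the value for Republican identifiers.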

5. Crosstabs (again)

As an alternative to running all of the above Select Cases commands, we can essentially replicate what we have done by controlling for a third variable within the Crosstabs procedure itself.

To replicate the above Select Cases analysis (controlling for partisanship), run a Crosstab on the original relationship. Make sure that you are working with the entire data set (no "filter on" signal in your "Data Window").

Before you Paste the commands for your crosstabulation into your "Syntax Window," click on the variable that you wish to control for (here it is partisanship, PID); move PID into the lowest box in the "Crosstabs" dialog box (labeled "Layer 1 of 1"). Paste these commands into your "Syntax Window" and Run these commands. Check the "Output Window" for your results. As you will see, it is infinitely easier to control for this third measure within Crosstabs than to do so individually with the Select Cases command. Also, you can control for multiple measures by adding them to the bottom box.
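In syntax, a layer (control) variable is simply added with a second BY. The sketch below assumes the PID variable you created earlier and is an approximation of what Paste will produce.

```spss
* Make sure no filter is active before running the controlled crosstab.
FILTER OFF.
USE ALL.
EXECUTE.

* Vote by economic evaluations, with a separate table for each pid category.
CROSSTABS
  /TABLES=ptyvote BY v48 BY pid
  /STATISTICS=CTAU CORR
  /CELLS=COUNT COLUMN.
```

The output contains one sub-table and one set of statistics for each category of PID, which matches the three separate Select Cases runs in the previous section.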

However, we did not go through all of these exercises for nothing. There will be many times when you will want to do a good many separate analyses with subsets of your data. In those cases, the Select Cases command is very useful.

Take a deep breath. If you have kept up with all of this material, you are way ahead of me. It took me three weeks to relearn all of this when I first started using SPSS on my Windows machine at Brookings and a good deal of time when I reverted to my Mac at Colby. But if you can do the homework assignment, you will know that you have these skills mastered.