Here's a quick review of what you learned last week. If you are like me, it always helps to see things a second (or third, or fourth) time.
To call up the data set we are working with, double-click the SPSS icon on the desktop, pull down the File menu, and click on Open. The data set for our exercise is called "setups96.sav"; double-click it to open it.
Run a crosstabulation of two variables to test the hypothesis that support for Bush in 1992 (measured by V2, presidential vote) grew as one's personal financial situation (measured by V46, change in personal financial condition) improved.
You can run this crosstabulation by pulling down the Statistics menu to Summarize and clicking on Crosstabs. V2 is your dependent variable, so place it in the row box; V46 is your independent variable, so place it in the column box. Click on the Cells box at the bottom; then click the box for Column under Percentages; click on Continue. Now, for a slight addition, click on the Statistics box at the bottom; under Ordinal Data, click on the box next to Kendall's tau-c. This is a statistical measure of the strength of the relationship between the two variables in which you are interested. SPSS will compute the value without any more work from you. Click on Continue.
Finally, paste all of this into your syntax window by clicking on Paste.
In the syntax window, highlight everything from the word Crosstabs to the period at the end. Click on the Run icon at the top to run this command.
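For reference, the command that Paste places in the syntax window should look roughly like the sketch below. The exact subcommands SPSS writes (for example, an extra /FORMAT line) depend on your version, so treat this as an approximation rather than a literal copy of your window.

```spss
* Presidential vote (rows) by change in personal finances (columns).
CROSSTABS
  /TABLES=V2 BY V46
  /STATISTICS=CTAU
  /CELLS=COUNT COLUMN.
```

The row variable comes before BY and the column variable after it. CTAU requests Kendall's tau-c, and COLUMN requests column percentages in each cell.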
Now, you should see results in your output window.
How would you interpret these results? What does each of the cell percentages mean? What do the percentages on the side and bottom of the table mean?
2. RECODING variables
Notice that this crosstabulation is a little bit confusing, because it gives you more information than you really want. You are only interested in the relationship between change in personal finances and support for Bush. You really don't care (from the perspective of this hypothesis) whether people supported Perot or Clinton. You only care that they did not support Bush. In this section, we will RECODE the dependent variable so that it is categorized as support for Bush or not for Bush.
Click on "Old and New Values" at the bottom of the dialog box. Another dialog box will open. In the upper left-hand corner is a box labeled "Value" under the "Old Value" heading. Enter the number "3" (without the quotation marks). Click on the "Value" box under the "New Value" heading. Enter the number "2" (without the quotation marks). Now click on the "Add" box directly under the Value box under the New Value heading. Your transformation (3 --> 2) should appear in the window to the right. Finally, click on Continue.
At this point, you have been returned to the "Recode Same Variables" dialog box. Click on Paste and then click on the "Syntax" window to check out all the cool commands you have just given the computer.
Highlight the command lines from RECODE down to the period after EXECUTE. Click on the Run icon at the top of the "Syntax" window.
Run a frequency of V2 to see what your new values look like. Review Lab 1 if you do not recall how to do this.
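The recode and the frequency check can be sketched in syntax as follows. This assumes, as the steps above imply, that V2 is coded 1 for Bush, 2 for Clinton, and 3 for Perot; check your codebook before relying on those codes.

```spss
* Fold code 3 (assumed to be Perot) into code 2, changing V2 in place.
RECODE V2 (3=2).
EXECUTE.

* Check the new distribution of V2.
FREQUENCIES VARIABLES=V2.
```

After this runs, V2 should show only the values 1 and 2 (plus any missing-value codes), with the former Perot voters counted under 2.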
3. Value labels
Under the "Value Labels" heading, enter the number 2 in the "Value" box (we wish to change the label for the value of 2, Clinton or Perot voters). Then enter the new label into the "Value Label" box; pick your own new label (e.g., "not Bush," "challengers," etc.). Click on the "Change" box, making sure your new label appears in the box on the right. (If the "Change" box is not highlighted, click on "Add" and then confirm in the dialog box that you want to change the value label. This is an example of a glitch that seems to happen at times for no apparent reason.) Click on Continue, and then click on OK in the Define Variables dialog box. Run a frequency of V2 to see what your new value labels look like.
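If you prefer to type the label change rather than use the dialog box, a minimal sketch looks like this. The label text 'Not Bush' is only an example; substitute whatever label you picked.

```spss
* Relabel only value 2 of V2 and leave the other value labels untouched.
ADD VALUE LABELS V2 2 'Not Bush'.

FREQUENCIES VARIABLES=V2.
```

ADD VALUE LABELS is used here instead of VALUE LABELS because VALUE LABELS discards any existing labels for the variable that you do not list again.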
The only way to return to the original categories while you are working is to quit the program, not saving the data window, and then to reopen the complete data set again and start from scratch. Consequently, it's usually a good idea to only RECODE something that you want permanently changed for one working session. The alternative is to RECODE "... Into New Variables" (strongly recommended as long as you document how you got these new variables and adequately label the new categories).
If you make changes that you want to save, you must save the data set on your own disk under a different name.
An alternative way of recoding variables is to create a new variable column to put the recoded values into. Go ahead and close SPSS, not saving the data window, of course. Re-open SPSS and the setups96.sav file. If you run a frequency on V2, you will notice that the changes you made to this variable have been obliterated.
4. RECODING into different variables
This time, let's recode V2 into a measure of major party vote. We can create a new variable that measures the vote for the two major parties by just including the votes for Bush and Clinton.
Change the value labels for your new variable, PTYVOTE, so that 1 = Bush-Rep; and 2 = Clinton-Dem.
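Recoding into a new variable and labeling it can be sketched in syntax as below. It assumes the original V2 codes are 1 for Bush and 2 for Clinton (consistent with the labels requested above) and that you reopened the data set so V2 is unchanged.

```spss
* Copy Bush (assumed code 1) and Clinton (assumed code 2) into a new variable.
* Every other value of V2, including Perot, becomes system-missing in PTYVOTE.
RECODE V2 (1=1) (2=2) (ELSE=SYSMIS) INTO PTYVOTE.
EXECUTE.

VARIABLE LABELS PTYVOTE 'Two-party presidential vote'.
VALUE LABELS PTYVOTE 1 'Bush-Rep' 2 'Clinton-Dem'.

FREQUENCIES VARIABLES=PTYVOTE.
```

Because the result goes INTO a new variable, V2 itself keeps all of its original categories.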
Recoding is only one of several ways to use only the Bush and Clinton vote information in V2. The next two sections show alternate ways of getting the same result.
5. Select Cases
The Select Cases command allows you to divide up your data set into a specific subgroup for analysis. Essentially, the logic behind this command in SPSS is that you create a conditional "filter" variable which you select. When your "filter" is on, your data transformations, computations, etc., will only be performed on your selected subset. This is a useful, but potentially troublesome tool. Always be aware of whether or not you are working with the entire data set or your selected subset--a box at the bottom of your Data Window will indicate when your filter has been activated.
1. An easy example of Select Cases: Using Select Values of a Variable
We can use Select Cases to essentially replicate the process of creating PTYVOTE (above) by simply selecting only those respondents who did not vote for Perot (that is, who voted for either Bush or Clinton).
Data > Select Cases will get you into the "Select Cases" dialog box. Notice that currently, "All cases" is selected. Click on the "If condition is satisfied" option; click on the "If..." box; highlight variable V2 (the vote variable) and move it (using the arrow box) to the large empty box at the top right of the "Select Cases: If" dialog box; click on the "~=" (not equal) button on the calculator pad and click on "3" on the calculator pad.
The large box should now read V2 ~= 3. This will select those cases where this statement is true, that is, all respondents who reported a vote (V2) that was not for Perot (3). Click on Continue in the "Select Cases: If" dialog box; click on Paste in the "Select Cases" dialog box; move to your "Syntax Window" and Run this command.
Look at the data window. You will note that there are lines through some of the cases; these are the cases that have been filtered out.
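The pasted Select Cases command should resemble the sketch below. To keep everyone who did not vote for Perot, the condition has to be "not equal to 3" (again assuming 3 is the Perot code). SPSS normally names the filter variable filter_$ and also pastes a few labeling lines, which are omitted here.

```spss
* Keep only respondents whose vote is not code 3 (assumed to be Perot).
USE ALL.
COMPUTE filter_$ = (V2 ~= 3).
FILTER BY filter_$.
EXECUTE.
```

Cases for which the condition is false or missing are not deleted; they are simply left out of later procedures until the filter is turned off.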
Run a crosstab of V2 by V48 and compare this with your results from PTYVOTE and V48 (above). (Remember, you need not go through all the windows for this. You can type in the syntax directly or copy it from the previous command in your syntax file.)
1. Remove the previous Select Cases filter. You do this by returning to the "Select Cases" dialog box and deactivating your filter by clicking on the "All Cases" selection at the top of the dialog box. Click on Paste and run this command from your "Syntax Window." (You have probably figured out by now that instead of pasting to the syntax window and running these commands, you could simply click on "OK." That will work, but you will not have a record of what you have done. Believe me, as one who has made numerous mistakes: careful documentation pays off, even if it seems to slow you down.)
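Turning the filter off should paste something close to the following; the exact lines may differ slightly by version.

```spss
* Turn the filter off so that later commands use every case again.
FILTER OFF.
USE ALL.
EXECUTE.
```

Once this has run, the "filter on" indicator in the data window should disappear.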
Select Cases can be useful in testing (in preliminary fashion) for potential spuriousness. In simplest terms, spuriousness means that the relationship you see is not the real one you are looking for. Rather, some other factor is influencing both variables. For example, I suspect that the partisanship of respondents (V8) leads them to evaluate national economic conditions differently. Specifically, I suspect that Republican identifiers will evaluate national economic conditions more negatively (because the presidential incumbent in 1996 was a Democrat) while Democratic identifiers will evaluate national economic conditions more positively (in an effort to give the Democratic president a "boost"). Furthermore, I am aware of the strong impact of partisanship on vote choice. Therefore, I suspect that the relationship between economic conditions and vote (V48 and V2, respectively) is affected by the spurious variable of partisanship.
2. We can start by creating a somewhat simplified version of the party identification variable, V8. Using Recode into different variables, create a measure called PID that collapses the values of V8 into Democratic identifiers, Independents and independent leaners, and Republican identifiers. Remember to add value labels and run a frequency to check the outcome.
3. Select those respondents who do not identify with a major party (Independents) using the Select Cases command. Run a Crosstab of PTYVOTE (two-party presidential vote) by V48 (national economic evaluations), remembering to click on Kendall's tau-c and correlations under Statistics.
4. Now run the Select Cases procedure, this time selecting only those respondents who identified with the Democratic Party. Run a Crosstab of economic conditions and PTYVOTE. Finally, run a Select Cases procedure to select out those respondents who identified with the Republican Party. Run a Crosstab of economic conditions with PTYVOTE. What do your results suggest to you? What do you think they say about the spurious variable?
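One round of this subgroup analysis can be sketched as follows. The PID codes used here (1 = Democrat, 2 = Independent, 3 = Republican) are hypothetical; use whatever codes you assigned when you created PID.

```spss
* Independents only (PID = 2 is a hypothetical coding).
USE ALL.
COMPUTE filter_$ = (PID = 2).
FILTER BY filter_$.
EXECUTE.

* Two-party vote by national economic evaluations, within the selected group.
CROSSTABS
  /TABLES=PTYVOTE BY V48
  /STATISTICS=CTAU CORR
  /CELLS=COUNT COLUMN.

* Repeat with PID = 1 and then PID = 3, and finish with FILTER OFF.
```

Comparing tau-c across the three tables gives a first indication of whether the relationship holds up within each partisan group.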
6. Crosstabs (again)
As an alternative to running all of the above Select Cases commands, we can essentially replicate what we have done by controlling for the third variable within the Crosstabs procedure itself.
To replicate the above Select Cases analysis (controlling for partisanship), run a Crosstab on the original relationship; make sure that you are working with the entire data set (no "filter on" signal in your "Data Window").
Before you Paste the commands for your crosstabulation into your "Syntax Window," click on the variable that you wish to control for (here it is partisanship, PID); move PID into the lowest box in the "Crosstabs" dialog box (labeled "Layer 1 of 1"). Paste these commands into your "Syntax Window" and Run these commands. Check the "Output Window" for your results. As you will see, it is infinitely easier to control for this third measure within Crosstabs than to do so individually with the Select Cases command. Also, you can control for multiple measures by adding them to the bottom box.
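The layered crosstabulation can be sketched in syntax as below, assuming you are using PTYVOTE as the row variable and V48 as the column variable as in the Select Cases exercise. The third variable after the second BY is the layer (control) variable.

```spss
* Make sure no filter is active before running the full analysis.
FILTER OFF.
USE ALL.

* One vote-by-economy table is produced for each category of PID.
CROSSTABS
  /TABLES=PTYVOTE BY V48 BY PID
  /STATISTICS=CTAU CORR
  /CELLS=COUNT COLUMN.
```

The requested statistics are reported separately for each category of PID, which lets you compare the subgroups in a single run.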
However, we did not go through all of these exercises for nothing. There will be many times when you will want to do a good many separate analyses with subsets of your data. In those cases, the Select Cases command is very useful.
Take a deep breath. If you have kept up with all of this material, you are way ahead of me. It took me three weeks to relearn all of this when I first started using SPSS on my Windows machine at Brookings and a good deal of time when I reverted to my Mac at Colby. But if you can do the homework assignment, you will know that you have these skills mastered.