Pallant 2001 (review)

This is the full text of my original review. A much-shortened version appeared in SRA News (newsletter of the Social Research Association) in Nov 2002 pp. 10-11

[NB: There is also a completely different review of the 2nd edition on this site.   I have also replicated some of her examples using syntax rather than drop-down menus: these appear in the full text of Old Dog, Old Tricks  and in the 5th accompanying slide show: Exercises from Julie Pallant SPSS Survival Manual elsewhere on this site]

Julie Pallant [1]
SPSS Survival Manual: a Step by Step Guide to Data Analysis Using SPSS for Windows (Version 10 and 11)
Open University Press, March 2001, £17.99, 304pp., Spiral bound ISBN 0 335 20890 8

Reviewed by John F Hall [2]

This book is clearly based on Julie Pallant’s own experiences in planning, conducting and supervising research projects and on her teaching. It is sympathetically written and full of helpful tips, but as a guide to SPSS for Windows, it may not be quite suitable for absolute beginners as it assumes a working knowledge of Windows and some familiarity with basic statistical and psychological ideas.

She occasionally uses cookery analogies to explain the research process: whether this is appropriate or not, it certainly lends charm and credibility as well as helping to gain students’ confidence and overcome fear of anything with numbers in. The large diameter spiral binding allows the book to be left open at any page and fully folded back without ripping the edges out, an important consideration for much-thumbed manuals, as this will surely be.

Although publicised as “an excellent introduction to using SPSS for data analysis” and as throwing “a lifeline to students and researchers grappling with SPSS”, this manual might more accurately be described as being very useful to (at least sophomore) students and researchers in psychology, psychometrics, health sciences and the like (who tend to use data sets consisting of large numbers of scale measurements and whose analysis is largely confined to multivariate inferential statistics), who are doing their own projects with little or no supervision or support.

However, students and researchers in other areas such as sociology, political science, planning or market research (who tend to use data sets with fewer scales and simpler questions, and whose analysis tends to be by demographic groups using percentages and means) are likely to find the book at least frustrating and at worst irrelevant: they might be better off starting with the SPSS Guide to Data Analysis by Maria Norusis (£42!!) or similar introduction.

Julie compresses problem formulation, research design, data collection, cleaning, management and preliminary analysis (whilst in the main sound enough, but again heavily oriented to psychology) into the first 48 pages.

There are several sites worldwide with down-loadable course notes which cover these topics in much more detail (the most impressive example being the SPSS Tutorial comprising 11 chapters and 3 appendices from computing support services at www.boun.edu.tr/bucc/support/spss [3]) Most students should then be ready to tackle Julie’s further 230 pages devoted to specific statistical techniques and exercises in developing and testing various psychological scales taken from her own research projects.

Frequency and contingency tables appear far too briefly (the latter only once), and then only to find outliers, produce a chart or obtain a statistic such as arithmetic mean or chi-square. Researchers can tell a lot from percentages in frequency counts and contingency tables, but would be hard pushed to find any analysis based on percentages rather than inferential statistics: there is none. Where percentages do appear, they are sometimes incidental to the table presented (and often superfluous, as in an example on p258 where several sets of percentages appear in a large table illustrating chi-square, for which they are totally irrelevant).

Experienced survey researchers and research supervisors will also find this early section a tad naïve: for instance, she quite happily suggests piloting questionnaires on the population to be studied (prisoners, unemployed youth) without querying the legitimacy of letting inexperienced students loose on the said population without adequate prior training and close supervision or informed consent. There is also a tendency to accept SPSS output in its entirety without editing which makes for quite cluttered presentation at times. This is partly due to the design of SPSS, but tighter editing and ruthless pruning could often make the key features of the analysis much clearer.

Many surveys contain questions to which more than one response is possible as well as sets of items answered in “Yes or No” format. Sometimes these data are received in multipunched (column binary) format and need to be expanded into matrix format for analysis. Other surveys often contain questions for which the data may vary in quantity between cases (eg detailed information about each person in a household or each job held will vary according to the number of people or the number of jobs). Julie makes no mention of column binary or hierarchical data input, both of which can be handled by SPSS, or the MULT RESPONSE feature for producing frequency counts and contingency tables.

Although she gives examples of the questionnaire items used for the data sets in the exercises, it is not clear whether these are facsimiles of the originals or tidied up typeset versions: this makes it difficult to assess what respondents were faced with when completing them. Some of these are copyright and can only be used in the original format, but this begs the question of the quality of such questionnaires, many of which do not seem to have benefited from the vast literature on the format and effectiveness of self-completion questionnaires. Some of the items are referred to in the appendix, but not listed in full. This researcher would like to have seen the actual items used for Life Satisfaction, Fear of Statistics, Confidence in Coping with Statistics and Depression. The two data sets for the many detailed exercises can be accessed from www.openup.co.uk/spss/data but are not fully labelled.

Data entry is assumed to be by the researcher, but Julie admits this could be a problem for large data sets. This is a dubious practice for very large data sets in any case, as errors are easy to make and not always easy to spot, particularly if codes entered are within range but incorrect, as can happen with long sequences of items in attitude scales. More preferable (and usual in survey research) is for data entry and verification to be done by experienced others, either directly using CAPI technology or indirectly via Excel, SPSS or as a raw data matrix in good old 80-column line format. The latter may nowadays be regarded as old technology, but, because it displays whole data lines on the screen, it does lend itself to visual clues as to misalignment of data or missing or duplicate lines or cases, whereas Excel or SPSS formats are less easy to verify visually. Julie covers simple checks on ranges for codes and suggests ways of correcting errors, but experienced SPSS users will have quicker ways of doing this via the syntax editor rather than by innumerable points and clicks from the dialog box. A major omission here is any mention of logical checks on coding anomalies (ie 16 year old girl married for 20 years; male with recent hospital admission for abortion) for which SPSS procedure COUNT in combination with LIST (neither of which is mentioned in the book) can come in useful (after a CROSSTABS run has revealed initial anomalies)

We now come to the vexed question of conventions for naming of files and variables in SPSS. Julie rightly recommends keeping a full and accurate data codebook and day-to-day log of all SPSS runs. However, whilst researchers doing their own studies, particularly over an extended period for a doctoral thesis, may well be able to remember what they have called their variables, where they are in the file, how they were derived and which of many SPSS runs covered which analysis, this reviewer much prefers a more systematic approach which enables other researchers and users easily to find their way around someone else’s data with a minimum of documentation.   This applies particularly when you are dealing with several surveys at once or dozens over a period of time.

Files called survey.sav or experim.sav could be anything, but NUS82.SYS, CRIME82.SYS or BSA89.SYS at least indicate something (SPSS system files for National Union of Students Undergraduate Income and Expenditure 1982, First British Crime Survey 1982 and British Social Attitudes 1989 in this case) preferably kept in separate directories (or folders in Windowspeak) called NUS82, CRIME82 and BSA89.

This means a different approach to variable naming which follows positions in the raw data set and which ideally should be related to the original questionnaire. Most good questionnaires indicate data coding positions (traditionally in 80-column line format) in the margins, so that a question response coded in (the field beginning in) column 34 of line 2 could, in one preferred convention, be called V234 and a variable called V2261 would be the questionnaire response coded in (the field beginning in) col 61 of line 22. This system allows immediate interpretation from questionnaire to computer output and vice-versa, but perhaps with new data capture systems this method is used less and less?

With her emphasis on code books, log books and record keeping, Julie rather surprisingly omits any mention of DISPLAY with its range of useful features for listing contents of SPSS files from summaries to full listings of formats and variable and value labels. This, together with an annotated copy of the original questionnaire and an unweighted frequency count of all variables (except those with decimal places or hundreds of values) should form an essential part of every survey researcher’s arsenal.

Likewise Julie’s naming of SPSS command files by the date they were run could be confusing even with a comprehensive log-book. What’s wrong with calling command files after the procedure used? (eg Frequency runs could be freq1.sps, freq2.sps, tabulations tab1.sps or even sextab1.sps, agetab1.sps etc., through means1.sps, anova1.sps or whatever.) This way there is less dependency on a logbook: just go to the syntax editor or double click the SPSS syntax file. As for dates and names, all you need is a directory listing of the *.sps command files.

For all procedures in the book Julie gives a full example of point and click, however tedious this may seem, but she does twice concede that sometimes it is easier and quicker to go to the syntax editor. Indeed at one point she refers to “the good old days” (presumably before Windows).

Nowhere is this clearer than in an 8-step example (p40) to select men for analysis: it’s much easier to go to the editor and write SELECT IF SEX = 1) On pp75-76 her 7 steps (one very repetitive) to reverse the codes on 3 items in a scale could much more easily done in the syntax editor using RECODE or DO REPEAT….COMPUTE: on p77 no less than 11 steps are used to calculate a score across 6 scale items (range 1 –5) when a simple COMPUTE command would suffice. The latter example yields a score in the range 6-30, but surely it would be better to subtract 6 (the number of items in the scale) to yield a ratio scale with a true zero point and a range of 0-24? Thus: compute opscore = sum.6 (op1 to op6) – 6. is much simpler.

This issue of a true zero point applies to all scales calculated in the book: quite possibly the scales are replications of others in the literature, but to this reviewer at least it seems to be bad practice.

Like many psychologists, especially those who write statistically oriented textbooks, Julie can’t wait to get to descriptive and inferential statistics and the normal distribution. This tendency can sometimes blind a researcher to the true nature of the data in question.

For instance Julie advises against frequency counts for variables with many values (she seems not to know about condensed format or truncation) and produces a scale of stress which is presented as a histogram. On the face of it the distribution is bimodal, even when produced as a line graph. This usually indicates two or more sub-populations or a problem with grouping criteria. In fact, with this data, SPSS cannot get a finer resolution for the chart, but the actual frequency count shows that the distribution is in fact unimodal. Also, the more items are added together to obtain a score, the more the distribution of the resultant score approaches normal: this should have been mentioned in the book.

At the risk of being pedantic, age as measured here is not a continuous variable (p74), but discrete (ie measured in whole years, not fractions). As such, a mean calculated on age last birthday will be approximately 6 months short of the true mean which ought really to be calculated using the mid-points of 12-month intervals. Thus age should actually be calculated as age + 0.5. This is a common error, but worth noting nonetheless, especially in a text book.

She rightly prefers actual age to be recorded as this allows for different groupings to be generated for comparison with other research, but what on earth is the point of a line graph for three age groups? When creating age groups she gives a 14 step point and click example for RECODE from age into a new 3-category variable agegp3 (pp83-84): whilst this would work (and drive most users crazy), it would be much easier to do it from the syntax editor by writing:

RECODE age
(18 thru 29 = 1)(30 thru 44 = 2)
          (45 thru 82 = 3)(else = sysmis)
into agegp3.
VAR LAB agegp3 ‘Agegroup of Respondent’.
VAL LAB agegp3 1 ’18-29’ 2 ’30-44’ 3 ’45 & over’.

Although she uses the keywords LOWEST and HIGHEST in her own RECODE statements, this could be dangerous if codes such as –1 or 99 or 999 have been used for missing answers, but not previously been declared as missing. In situations like this the use of (ELSE = SYSMIS) in the REC ODE statement can help to avoid problems.

Where Julie does present examples of SPSS syntax they appear in full in block capitals (except for variable names). Granted this is what the PASTE button does in SPSS, but many users know that SPSS is case insensitive and only needs the first four characters of commands and the first three of subcommands, so instead of writing:

FREQUENCIES VARIABLES = AGE
            /STATISTICS MEAN MEDIAN
            /FORMAT = CONDENSE
            /HISTOGRAM

Busy researchers can write instead:

freq var age /sta mea med /for con /his.

This saves time and money (and tears) and helps to avoid RSI! Unfortunately it doesn’t save paper or trees consumed when SPSS output is printed up unthinkingly. It is probably better to close .spo output files without saving them if they are full of error messages.

The one syntax example Julie writes herself (actually only a small modification, inserting the keyword with into the syntax from the PASTE facility)

CORRELATIONS
/VARIABLES =   tposaff tnegaff tlifesat with tmast tslfest
/PRINT = TWOTAIL NOSIG
/MISSING = PAIRWISE.

could more easily be written as:

corr
/var tposaff to tlifesat with tmast tslfest
/pri two nos
/mis pai.

SPSS does not need the = signs either: so once students have got the hang of all this after a few sessions, they are up and away.

Julie rightly advises frequent saving of work during sessions, especially when entering data. Most of us have at some time experienced a catastrophic loss of (hopefully not too much) work through computer failure, power cuts or accidental exit/deletion. For this reason it is probably better to keep copies of raw (ASCII) data files separate from SPSS and read them in using DATA LIST (not covered in this book) rather than rely on SPSS .sav files. There are technical (and intellectual) risks attached to relying on SPSS for everything!

Once she gets into data analysis proper (albeit with descriptive and inferential statistics rather than percentages), there are well written, thorough and sound explanations and advice for rookie researchers, including the use of graphic causal (path) models and blank tables to illustrate hypotheses to be tested about the relationships, if any, between two or more variables, before running the actual analysis. Perhaps the blank tables for entering statistics (in this case means) for the dependent variable within cells defined by the categories of the independent variable(s) should be expanded to include statistics for the whole sample and sub-samples as well, but SPSS will produce these by default when the analysis is actually run.

This section could well benefit from expansion in future editions to include examples using percentages or proportions (elaboration) since the logic of analysis and the research question (What happens to a zero-order statistic for the relationship between a dependent and an independent variable when controlling for the effect of one or more test variables?) are identical.

There is a full discussion of a very large range of appropriate parametric and non-parametric statistics with a particularly useful table on pp106-7. Although she claims that there is no non-parametric alternative to multiple regression, this is not quite true as statisticians have been working on the problem for several years, and SPSS already has GLM and LOGLINEAR in the Advanced Statistics module. Although she mentions DISCRIMINANT for predicting a categorical variable from several numeric variables, she does not mention CLUSTER or QUICK CLUSTER which can be very useful when sifting through sets of correlation matrices derived from items in attitude scales to find groups of items which seem to hang together. The latter is sometimes quicker and simpler and will produce similar or identical results to endless factor analyses.

UK readers will be unfamiliar with most of the bibliographic references, except for Oppenheim and possibly Everitt, Crowne & Marlowe or Robinson and Shaver. For statistics they are much more likely to come across Loether and McTavish, Blalock or Rowntree, but for research design, data analysis, statistics and SPSS in a single volume, there is nothing to compare with Norusis. For examples using measures of life satisfaction and positive and negative affect there is surprisingly no mention of the pioneering work on subjective social indicators in the early 1970’s by Bradburn or Campbell and Converse in the USA or Abrams and Hall in the UK or McKennell’s comparisons of both (the data from which are freely available via the Essex Data Archive). Perhaps there also dangers in an over-reliance on a limited range of journals!

This is clearly a first attempt that fills a gap in the market and will run to more editions: it has already been reprinted twice. Well done, Julie. Now, how about a second edition or a companion volume, this time with percentages in?

[1]  In 2001 Dr Julie Pallant was Lecturer in Psychology in the School of Mathematical Sciences at Swinburne University of Technology (Australia) She is now Associate Professor and Director of the Rural Health Academic Centre at the University of Melbourne

]2] From 1970 to 1976 John was Senior Research Fellow in the Survey Unit of the then Social Science Research Council and from 1976 until his (early) retirement in 1992 Principal Lecturer in Sociology and Director of the Survey Research Unit at the then Polytechnic of North London. He was involved at senior level in the design, management, analysis, reporting and documentation of dozens of surveys ranging from major national and international studies to small projects with local and voluntary groups, all using SPSS (as well as giving advice and assistance to hundreds of students and clients). Although fluent in SPSS (having worked with it on a variety of mainframes from its first appearance in the UK) he was, until undertaking to review this manual, a relative novice in Windows and the PC versions of SPSS. He is now much improved!

He is grateful to Major Lester and Herve Mignot of SPSS Inc for making available an evaluation copy of SPSS11 for Windows (Please can I keep it, just for a little while longer?) which was invaluable in assessing Julie’s book. He is also grateful to Dr Jane Fielding of Surrey University for making available copyright materials from one of her courses and information on her forthcoming book (with Prof Nigel Gilbert) and to the UK Data Archive at Essex University for copying backup files from a set of 9 year old Vax magnetic tapes on to CD-ROM’s to enable conversion and testing of research and teaching materials with SPSS11 for Windows (based principally on the 1986 and 1989 waves of the British Social Attitudes series and a 1981 survey of fifth formers in a North London comprehensive school , which incidentally replicated some of the items in one of Julie’s sample data sets).

At least for this reviewer, JP’s book prompted much post-poned conversion and editing (from WordStar4 to MS-Word) of all the teaching materials from his highly popular and effective part-time post-graduate evening course Survey Analysis Workshop which can now be made available to a wider audience, although some editing of SPSS examples and rewriting of alternative worksheets and homework exercises is still needed to take account of differences between previous Vax mainframe and current PC versions of SPSS syntax. It also prompted the re-creation of the main data sets and files used in his course which now exist in SPSS11 for Windows format.    All of this material has been made available to Julie. It can be now made available to others, provided they respect his copyright.

[3]  Page no longer available in 2006, but recommended similar tutorials are: http://www.indiana.edu/~statmath/support/bydoc/ (getting started) and http://www.utexas.edu/cc/stat/tutorials/ (statistics analysis and more advanced help topics). A useful hub for links to other SPSS users, tutorials and topics is http://www.spsstools.net/ See also pages SPSS Intros and Tutorials and SPSS Textbooks on this site.

Journeys in Survey Research

Pallant 2001 (review)