
Compendium 2000 on Effective Correctional Programming


CHAPTER 25

Cumulating Knowledge: How Meta-Analysis can Serve the Needs of Correctional Clinicians and Policy-Makers

PAUL GENDREAU, CLAIRE GOGGIN, and PAULA SMITH1


Correctional policy-makers and clinicians are bombarded annually with growing offender treatment and prediction literatures, the findings of which are often contradictory. Meta-analysis offers a means of overcoming the innate biases of narrative or box score review techniques by standardizing the review process and shifting the focus away from traditional significance testing by making use of point estimates (i.e., means) and confidence intervals. Thus, clinicians and policy-makers can place greater credence in the conclusions emanating from quantitative summaries and, in turn, incorporate them in the decision-making process. In this way, one can ensure that forthcoming correctional policies are empirically, rather than ideologically, derived.

One of the most daunting tasks facing correctional policy-makers and clinicians is to make sense of the vast amounts of treatment and prediction data generated annually. This is not, of course, a problem unique to corrections. Hunt (1997) convincingly portrays a social science and medical research world whose landscape appears, at times, to be “pervaded by a relentless crossfire in which the findings of new studies not only differ from previously established truths but disagree with one another, often vehemently” (p. 1). In corrections, for example, there are conflicting data on the efficacy of treatment versus “get tough” strategies (e.g., boot camps) or the putative cruelty versus utility of prisons, to highlight just a few topics.

Is it any wonder, then, that when legislators, in collaboration with clinicians and policy-makers, attempt to generate cogent policies on issues of offender management, a perplexed look crosses their faces when confronted with the “data” (see Hunter & Schmidt, 1996)?

SOURCES OF CONFUSION

In our view, the genesis of this confusion has multiple sources (Gendreau, Goggin, & Smith, 2000). Two of the more crucial, ideology and the traditional methods of literature review and knowledge cumulation, will be the focus of this chapter.

Ideology

During the 1950s and 1960s, there was a naïve, idealistic belief amongst social scientists trained in North America that we were an experimenting society (see Campbell, 1969; see also Gendreau & Ross, 1987). In other words, respect for evidence generated from soundly conceptualized and conducted evaluations would, more or less literally, be translated into public policy. This, however, has not turned out to be the case, particularly in corrections, where contextual factors, such as political and professional ideologies, have frequently hijacked policy (Cullen & Gendreau, 2000; Gendreau, 1999; Gendreau & Ross, 1987).

The popularity of the “get tough” movement in United States corrections illustrates this point. It coincided with the ascendancy of conservative values in the socio-political arena (Cullen & Gendreau, 1989) and the resulting ideologically-driven policies -- greater use of prisons (such as boot camps, longer sentences) and community sanctions (such as electronic monitoring, drug testing) -- were presumed to effectively deter criminal behaviour despite being bereft of empirical support. Such initiatives totally ignored the thousands of studies in the psychological punishment and social psychological literatures which would have predicted the folly of such strategies (Gendreau, 1996a). Indeed, several of the programs or policies that have emanated from the “get tough” ideology, such as cross-dressing humiliation therapy, John T.V., and reintroducing the whip into prisons (see Gendreau, Goggin, & Smith, 2000), defy credulity.

Political ideologues are not unique in this predilection for simplistic, common-sense notions of how the world works. Academics have been known to jump on ideological bandwagons as well. Andrews and Bonta (1998) have documented a plethora of instances where the offender personality and treatment literatures were dismissed by a number of criminologists as being of no consequence when a wealth of data spoke to the contrary. Disparities in interpretations of a literature by academics exist, in part, due to competition amongst various disciplines for academic pre-eminence and the attendant perks, access to external funding, and blatant careerism (Gendreau, Goggin, & Smith, 2000; Gendreau & Ross, 1979; Hunt, 1997). In fact, there is a class of “academic” who is particularly skilled at disguising his/her ideology. Included therein are the policy entrepreneurs and combat intellectuals who are adept at maintaining the pretence of being rational empiricists all the while serving their own or a special interest-funded ideological agenda (see Krugman, 1994; Starobin, 1997).

Pity, then, the average policy-maker or clinician who is faced with this unseemly and contradictory brouhaha. For example, most clinicians have little time to conduct extensive literature reviews; as we detail later, most applied research literatures are also “huge” and ever more “technical”. Policy-makers, on the other hand, face a somewhat different challenge. In the “old days”, it was commonplace to find senior-level policy-makers who had specialized in the fields for which they were directly responsible, and who remained in their portfolios long enough to appreciate all of the “ins and outs” of the theories and evidence that informed their decision-making (Granatstein, 1982; Osbaldeston, 1989). Currently, the tenure of most policy-makers in a single portfolio is very short-lived (less than 3 years on average) and their background training is often generic in nature (Fulford, 1995). Increased political control over the bureaucracy (Savoie, 1999) likely militates against non-ideological discourse while reinforcing the development of quick-fix panaceas in response to pressing and often complex problems.

We are not suggesting that ideology is so insidious as to paralyze attempts at cumulating knowledge, nor that some aspects of ideological positions may not, in fact, be fairly “accurate” insofar as they are based on research findings. Indeed, no matter what the ideological barometer in a given culture, social scientists have always endeavoured to make some sense of research literatures. The traditional means by which this has been done, however, poses a major problem, particularly within large datasets.

INFORMATION OVERLOAD AND THE NARRATIVE REVIEW

Given the voluminous amount of information now available, some of which may be contrary, it is not surprising that clinicians and policy-makers entertain disparate notions of what works in corrections.2 Part of the problem lies in how a literature is reviewed, as the method of cumulating knowledge clearly influences one's conclusions and, thus, policy development. Traditionally, policy-makers have relied upon narrative reviews to make decisions regarding policy. These summaries generally have been qualitative in nature and involved the following process: typically, the writer formulates an opinion by reading a few influential theoretical articles, examines the available evidence, and then selects the results that substantiate his/her position.

Although narrative reviews may be appropriate when a literature base is relatively small (e.g., 5 to 10 studies) or purely qualitative in nature, critics of this approach have noted several limitations (Glass, McGaw, & Smith, 1981; Redondo, Sanchez-Meca, & Garrido, 1999; Rosenthal, 1991). Perhaps the narrative's most troublesome shortcoming is its tendency to omit key data. As such, the scope of a literature review is often limited by the prejudices of the reviewer. Of equal concern, narrative reviews are virtually impossible to replicate. In addition, essential concepts are often poorly operationalized. Redondo et al. (1999) have also pointed out that the mind has a limited capacity for the systematic processing of a multitude of methodologies, outcomes, study characteristics, and potential moderators. One can appreciate, then, what an onerous task it can be to summarize large numbers of studies (i.e., 30-200). Typically what occurs (Gendreau & Ross, 1987) is that a reviewer favours a small subset of studies that he/she “likes” or can “handle” in order to generate conclusions about large and sometimes complex literatures. Glass et al. (1981) provide one of the most compelling examples of this phenomenon. When five leading scholars conducted narrative reviews of the same literature (the effectiveness of psychotherapy versus drug therapy) they differed as to which studies qualified for the review, disagreed as to which studies should be placed in the treatment categories, and disputed the consistency and magnitude of the results. In short, narrative reviews are on occasion distressingly imprecise.

A slightly more formal approach to cumulating knowledge is the box score analysis. In essence, this method tabulates the frequency of statistically significant versus non-significant effects within a given body of studies, the “winner” being the condition with the greater frequency. Although this technique appears straightforward, the issue becomes complicated when the values of some statistically significant results are larger than others, or, worse, when the values of some non-significant results are larger than those of results designated as significant (determination of significance, of course, being inherently wedded to sample size)!

Several authors have concluded that both narrative reviews and box score analyses are of limited utility given their reliance on significance testing, and, further, that this has served to hinder the process of cumulating knowledge (Schmidt, 1996). Schmidt has cited several common misinterpretations accruing from statistical testing, among them: (a) a result that is statistically significant indicates whether the findings are reliable and can be replicated, (b) the significance level provides an estimate of the importance of the effect (i.e., p<0.01 is better than p<0.05), and (c) if one fails to reject the null hypothesis (p>0.05), then the results are due to chance alone and are likely zero. Each of these statements is incorrect, and can lead to gross misinterpretations about the nature of a given literature.

What is an effect size?

Effect size, a term now ubiquitous in the meta-analytic literature, simply refers to the size of the result obtained in a prediction or treatment study. In other words, it is an estimate of the magnitude of the correlation between a risk measure and outcome, or of the difference in outcome between a treatment group and a control group. There are several ways to calculate an effect size but the one most favoured for ease of use and comprehension is the Pearson correlation coefficient (r) (Rosenthal, 1991). Unless the database is extreme in some way (has very high or low base rates, small sample sizes) the r value, or effect size, can safely be interpreted at face value (see Cullen & Gendreau, 2000). So, for example, if a cognitive behavioural intervention for offenders produces a recidivism rate of 10% versus the control group's rate of 30%, then the r value will be 0.20 (a difference of 20% between the two groups) or very close to it. Similarly, in the case of a prediction study, the fact that the LSI-R predicts recidivism at r = 0.38 simply means, assuming a 50% base rate, that offenders who score high (above a designated cut-off score) recidivate at a rate of 69% versus 31% (i.e., a 38% difference) for those who score low on the measure.
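To make the arithmetic above concrete, the following short sketch (in Python; the function names are ours, purely for illustration) applies the same face-value logic, often described as the binomial effect size display: under a 50% base rate, an effect size r corresponds to outcome rates of 50% plus and minus half of r in the two groups.

# A minimal sketch of the interpretation described above, assuming a 50% base
# rate; the function names are ours, for illustration only.

def rates_from_r(r: float) -> tuple[float, float]:
    """Given an effect size r, return the implied outcome rates (in %) for the
    two groups under a 50% base rate: 50 +/- 100 * r / 2."""
    return 50 + 100 * r / 2, 50 - 100 * r / 2

def r_from_rates(rate_a: float, rate_b: float) -> float:
    """Recover r as the simple difference between two outcome rates (in %)."""
    return (rate_a - rate_b) / 100

# Treatment example from the text: 30% vs. 10% recidivism -> r of roughly 0.20.
print(r_from_rates(30, 10))     # 0.2

# Prediction example from the text: the LSI-R at r = 0.38 with a 50% base rate
# implies recidivism rates of 69% (high scorers) vs. 31% (low scorers).
print(rates_from_r(0.38))       # (69.0, 31.0)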

By way of illustration, assume we have five studies, all with fairly small sample sizes of 30, 40, 80, 20, and 60. The treatment is a specific type of cognitive behavioural intervention used with high-risk offenders in different settings by different staff. In each study, the researcher records the reduction in recidivism and generates a correlation coefficient (r) to reflect it: 0.34, 0.30, 0.21, 0.40, and 0.23, respectively. The mean effect size across all treatment programs is r = 0.30. Clearly, the results consistently point to an effective treatment. In consulting a table of values of r for different levels of significance, however, one finds that none of the individual r values is significant at the α = 0.05 level. A narrative review of these results would inevitably conclude that the intervention is not effective and suggest to a policy-maker that such a program be discontinued or fail to receive inaugural endorsement.3 In fact, we have seen instances in the literature where it has been demonstrated that different risk measures predicted recidivism equally well but, since some correlations were “significant” and others were not, the latter were dismissed as being of little use, the only notable difference amongst the studies being minor variations in sample size.4
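The point is easy to verify. A brief sketch (Python with SciPy; the sample sizes and correlations are simply those of the hypothetical studies above) computes the conventional two-tailed p-value for each correlation alongside the unweighted mean effect size.

# Sketch reproducing the hypothetical five-study example: none of the
# individual correlations reaches p < 0.05, yet the mean effect size is r = 0.30.
from math import sqrt
from scipy import stats

studies = [(30, 0.34), (40, 0.30), (80, 0.21), (20, 0.40), (60, 0.23)]  # (n, r)

for n, r in studies:
    t = r * sqrt(n - 2) / sqrt(1 - r ** 2)           # t statistic for a correlation
    p = 2 * stats.t.sf(t, df=n - 2)                  # two-tailed p-value
    print(f"n = {n:3d}  r = {r:.2f}  p = {p:.3f}")   # every p exceeds 0.05

mean_r = sum(r for _, r in studies) / len(studies)
print(f"mean effect size r = {mean_r:.2f}")          # 0.30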

Making better sense of research literatures: Quantitative research synthesis

How can this problem be resolved? One must standardize the review process and shift the focus of data analysis away from traditional significance testing. For example, a quiet revolution has been ongoing in medicine and psychology for about two decades whereby scholars have begun to synthesize research literatures in a more precise and quantitative fashion using a methodological process known as meta-analysis. Indeed, quantitative summary techniques have been used in the “hard” sciences for years (Hedges, 1987). One should note, however, that our goal in this chapter is not to train readers how to do a meta-analysis (they can, at times, be quite complicated and excessively time-consuming undertakings; see Cooper, 1997; Shadish, 1996), but, rather, to give them a better understanding of the process.5 Fortunately, when it comes to the needs of clinicians and policy-makers, most elementary meta-analyses will suffice (Rosenthal, 1995). In corrections, as in most applied fields, one is rarely concerned with the subtle effect of higher-order interactions, the meanings of which are often problematic. Rather, the development of sound policy on important issues, such as which type of treatment is more effective in reducing recidivism or which risk measure is the more accurate in predicting reoffending, is best predicated on empirical conclusions (see Gendreau, Little, & Goggin, 1996).6

What does a meta-analysis look like? Let's assume one wants to examine the factors that best predict academic performance among first year university students. A representative sample of 100 undergraduates is assessed. In the case of each student, one records his/her grade point average (GPA). One also notes the student's gender, age, family socio-economic status, intellectual aptitude, study habits, types of courses, grading methods, etc. One can readily see that it would be difficult to reach anything remotely resembling a precise general conclusion regarding the predictability of the GPA on the basis of just one student's data. For example, if the student had a relatively high GPA (i.e., 4.0) and came from a “good” socio-economic family background, it would be tempting to conclude that the correlation between the two conditions was necessarily important. On the other hand, one might surmise from a study of his/her transcript that GPA magnitude was unduly influenced by the student's selection of “easy” courses. After collating the results of the above factors for all students (n = 100), however, a much clearer picture emerges as some factors will likely produce larger correlations with GPA than do others. Further statistical analyses can then sort out which of the more robust correlations are among the most important. Essentially, this is what meta-analysis does, albeit with the single study, rather than the individual, as “subject”. Meta-analysis typically (a) groups studies and the variables of concern along certain specified dimensions;7 (b) expresses the outcomes of interest (i.e., recidivism) from these studies in a common metric known as an effect size, most often Pearson r; (c) averages the effect sizes obtained; and (d) statistically analyzes these effect sizes to determine if variations in the magnitude of effect size are correlated with the type of variable under investigation or study characteristics. In this way, inconsistencies in a set of seemingly variant studies are uncovered and one can pinpoint the characteristics of studies producing apparently discrepant results.

Table 25.1 depicts what a meta-analytic database “looks like” in its most elementary form. In this very simple display a wealth of information is revealed. Data from six treatment studies are detailed and, for the sake of brevity, we report on two very important moderators (at least vis-à-vis corrections): offender risk level and quality of research design.

The studies vary considerably as to sample size (N = 30 to 180) and effect size (r = -0.09 to 0.34). Recall our discussion about box-score summaries and significance testing. Only two of the six studies in Table 25.1 (#3 and #4) produce a statistically significant effect on recidivism, yet the 95% confidence intervals (CIr) of all six studies overlap, indicating that they are sampling from the same population parameter. Contrast these results with the conclusion one would reach using a box score tabulation of significant effects (i.e., treatment is ineffective).

TABLE 25.1 The relationship between treatment and recidivism across a sample of studies

Study No.   Risk   Quality     N        r       CIr
1           L      L          52    -0.09    -0.36 to 0.18
2           L      L         180     0.02    -0.13 to 0.17
3           H      H          42     0.34*    0.07 to 0.61
4           H      L          82     0.22*    0.01 to 0.43
5           H      H          30     0.29    -0.04 to 0.62
6           L      H          68     0.06    -0.18 to 0.30
Total                        454     0.14     0.05 to 0.23

Note. Risk = offender risk level; Quality = study design quality; N = study sample size; r = correlation coefficient (or effect size) between treatment and recidivism; CIr = confidence interval about r.
* p < 0.05.

The use of the CI in meta-analysis is crucial. As Schmidt (1996) has pointed out, many people erroneously think that null hypothesis significance testing equally limits the probability of Type I (incorrectly concluding there is an effect) and Type II errors (incorrectly concluding there is no effect). Rather, what happens with significance testing is that, while Type I errors may be held at the 5% level (i.e., α = 0.05), no equivalent control of the Type II error rate can be assumed. The rate may commonly, in fact, be very high, often in the 50% range (Cohen, 1988), especially among studies with low power due to small sample sizes. Confidence estimates, on the other hand, provide a great advantage to cumulating knowledge in that they hold the overall error rate at 5% (Schmidt, 1996). That is, in only 5% of confidence intervals would one not expect to find the population parameter, or “true” effect size.
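As an illustration of how such intervals are obtained, the brief sketch below reproduces the intervals in Table 25.1 under one common large-sample approximation for the standard error of r, namely (1 - r²)/√(n - 1); this particular formula is our assumption (it matches the tabled values closely), and alternatives such as the Fisher z transformation yield very similar intervals.

# Sketch of the 95% confidence intervals reported in Table 25.1, assuming the
# large-sample approximation SE(r) = (1 - r^2) / sqrt(n - 1); other
# approximations (e.g., Fisher's z) give very similar results.
from math import sqrt

table_25_1 = [(52, -0.09), (180, 0.02), (42, 0.34),
              (82, 0.22), (30, 0.29), (68, 0.06)]    # (N, r) for studies 1-6

for i, (n, r) in enumerate(table_25_1, start=1):
    se = (1 - r ** 2) / sqrt(n - 1)
    lo, hi = r - 1.96 * se, r + 1.96 * se
    print(f"study {i}: r = {r:5.2f}, 95% CI {lo:5.2f} to {hi:5.2f}")

# Only studies 3 and 4 have intervals that exclude zero, yet all six intervals
# overlap one another, consistent with a single underlying population effect.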

Thus, besides quantitatively demonstrating the degree of agreement within a given body of literature, meta-analysis also provides an estimate of certainty about a given effect. When the CI is very wide it tells the policy-maker to be cautious, that conclusions about a given relationship should be regarded as tentative; more research is required. When the interval is very narrow, as in recent studies on the lack of effectiveness of time spent in prison and intermediate sanctions on recidivism (Gendreau, Goggin, & Fulton, 2000; Gendreau, Goggin, & Cullen, 1999), the policy-maker can place much more confidence in a reviewer's conclusions and, therefore, in their recommended course of action.

Returning to Table 25.1, we note the average effect size is r = 0.14, or a 14% reduction in recidivism, with an associated CI of 0.05 to 0.23. Furthermore, following a useful procedure developed by Hedges and Olkin (1985), the effect sizes can be weighted by sample size, taking into account the number of effect sizes involved, which, in this case, produces a mean value of 0.10 with a CI bounded by 0.01 and 0.19.

Now, we have a more precise notion of the utility of the cognitive treatments in our example. Furthermore, we can examine moderators of interest within the database (i.e., offender risk or quality of research design), and repeat the procedures noted above to determine if these produce differential effects on recidivism. For example, in this case, risk level appears to be an important moderator (i.e., r = 0.28 for the high-risk studies vs. r = -0.003 for the low-risk studies), in that our hypothetical treatment results in a 28% decrease in recidivism among high risk offenders versus essentially no change in recidivism among the low risk group.
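A sketch of the aggregation just described, assuming the Hedges and Olkin (1985) approach of averaging Fisher z-transformed effect sizes weighted by sample size (n - 3); the text does not spell out the exact computation behind the figures above, so treat this as one plausible reconstruction that lands very close to them.

# Sketch of the sample-size-weighted mean effect size and the risk-level
# moderator breakdown for the Table 25.1 data, assuming the Hedges & Olkin
# (1985) procedure: Fisher z-transform each r, weight by (n - 3), average,
# then back-transform. The results come out close to those reported above.
from math import atanh, tanh, sqrt

# (N, r, risk level) for studies 1-6.
studies = [(52, -0.09, "L"), (180, 0.02, "L"), (42, 0.34, "H"),
           (82, 0.22, "H"), (30, 0.29, "H"), (68, 0.06, "L")]

def weighted_mean_r(data):
    """Sample-size-weighted mean r with its 95% CI (Fisher z method)."""
    weights = [n - 3 for n, _, _ in data]
    z_bar = sum(w * atanh(r) for w, (_, r, _) in zip(weights, data)) / sum(weights)
    se = 1 / sqrt(sum(weights))
    return tanh(z_bar), tanh(z_bar - 1.96 * se), tanh(z_bar + 1.96 * se)

mean, lo, hi = weighted_mean_r(studies)
print(f"weighted mean r = {mean:.2f}, 95% CI {lo:.2f} to {hi:.2f}")  # about 0.10, CI roughly 0.00 to 0.19

# Unweighted moderator breakdown by offender risk level.
for level in ("H", "L"):
    rs = [r for _, r, risk in studies if risk == level]
    print(level, round(sum(rs) / len(rs), 3))   # H: ~0.28, L: ~-0.003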

Recently, “new” statistics have appeared that are welcome additions to the meta-analyst's armamentarium. One group includes the fail-safe indicators (Gendreau, Smith, & Goggin, 1999; Orwin, 1987; Rosenthal, 1991) which assist in determining the degree of confidence one can attribute to the mean effect of a given set of studies. That is, they specify how many additional studies averaging null effects, be they retrievable or unretrievable, would be required to counter the conclusions of a given meta-analysis.
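By way of a deliberately simple illustration, Orwin's (1987) fail-safe N asks how many additional studies averaging a null effect would be needed to drag the observed mean effect size down to some negligible criterion value; the 0.05 criterion below is our own choice, purely for illustration.

# Sketch of Orwin's fail-safe N: the number of additional null-effect studies
# required to pull the observed mean effect size down to a chosen "negligible"
# criterion. The 0.05 criterion is our own choice, for illustration only.

def orwin_fail_safe_n(k: int, mean_es: float, criterion: float) -> float:
    """Fail-safe N, assuming the missing studies average an effect size of zero."""
    return k * (mean_es - criterion) / criterion

# Table 25.1 example: six studies with a mean effect size of r = 0.14.
print(orwin_fail_safe_n(k=6, mean_es=0.14, criterion=0.05))   # about 11 studies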

We also favour the common language (CL) effect size indicator (McGraw & Wong, 1992). For example, in a forthcoming meta-analysis we report on which of two risk measures is the most useful for predicting offender recidivism, an issue dear to the hearts of many prison and parole officials. We found that, while both instruments were better than chance alone in predicting recidivism, one of the two produced significantly greater predictive validities (p<0.05). Clearly, a statement of statistical significance is not particularly helpful to the policy-maker or clinician in this regard. The CL indicator, on the other hand, is both an easily calculable and comprehensible statistic that can be of immediate utility to administrators. It provides them with a probabilistic statement of the relative performance of each of a pair of variables with outcome. For example, in the aforementioned meta-analysis, the CL indicated that one of the two risk measures produced higher correlations with recidivism 78% of the time (Gendreau, Goggin, & Smith, 1999). This is an example of the limitations inherent in significance testing and the benefits of somewhat more practical information in making informed decisions.
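For readers curious about the mechanics, the CL statistic for two (approximately normal) sets of scores is simply the probability that a value sampled from the first distribution exceeds a value sampled from the second; the brief sketch below uses invented means and standard deviations purely for illustration, not the actual values from the meta-analysis cited above.

# Sketch of McGraw & Wong's (1992) common language (CL) effect size: the
# probability that a score drawn from one distribution exceeds a score drawn
# from another, assuming both are approximately normal. The means and SDs
# below are invented for illustration only.
from math import sqrt
from scipy.stats import norm

def common_language_es(mean_1: float, sd_1: float, mean_2: float, sd_2: float) -> float:
    """P(score from distribution 1 > score from distribution 2)."""
    z = (mean_1 - mean_2) / sqrt(sd_1 ** 2 + sd_2 ** 2)
    return norm.cdf(z)

# e.g., two sets of validity coefficients with means 0.35 and 0.25 (SD = 0.09 each):
print(round(common_language_es(0.35, 0.09, 0.25, 0.09), 2))   # about 0.78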

FUTURE OF META-ANALYSIS

Meta-analysis has now become the review method of choice and has led to significant advances in knowledge on issues in a variety of fields (Hunt, 1997) including criminal justice (Gendreau, Goggin, & Smith, 2000). Indeed, in a quantitative comparison of the results of narrative versus meta-analytic reviews, Beaman (1991) found that meta-analyses out-performed narrative reviews by about 50% on average in their description of myriad study characteristics including the nature and conditions of the literature under review, the direction and magnitude of the effect size in question, as well as the relationship between the results and specific moderators.

Narrative summaries also tend to underestimate the magnitude of an effect (Cooper & Rosenthal, 1980). This may be due to the fact that those conducting such reviews are unduly cautious in their conclusions, lacking as they do the collaborative support of exact quantitative effect size estimates.

Admittedly, meta-analytic procedures are no panacea. Anyone who has conducted one knows full well that the meta-analyst faces a number of complex, subjective decisions regarding study coding and type of analysis. Also, there are some meta-analytic issues that, as Cooper (1997, p. 179) has noted, “often baffle even sophisticated data analysts.” The meta-analytic review is sometimes portrayed as the definitive answer but, in our experience, after having meta-analyzed several correctional literatures, we have concluded that the studies in some of these literatures were so lacking in essential details that additional primary research is still needed (Gendreau et al., 1996; Gendreau, Goggin, & Smith, 1999) before one could furnish clinicians and policy-makers with more definitive conclusions. In addition, there are literatures that have so few quantitative studies that a narrative review must suffice for the moment.

These caveats notwithstanding, in our view there is no avoiding the use of quantitative research syntheses to foster much needed respect for evidence in the field of corrections. As noted elsewhere (Gendreau, 1999), we would consider it a victory if even 20% to 40% of our policies were derived from meta-analytic approaches.


1 University of New Brunswick, Centre for Criminal Justice Studies

2 When the first author began working in corrections 40 years ago, the literature was minuscule by today's standards (Gendreau, 1996b). There were fewer than a handful of requisite books or journals one needed to consult to remain informed.

3 A marginal increase of only five to ten offenders in each of the five samples, while maintaining the same effect sizes, produces markedly different results: each of the correlations is now statistically significant although the mean effect size remains unchanged (r = 0.30).

4 We are mindful of the fact that some researchers, as do we, choose to weight studies by sample size. We argue elsewhere that this is not necessarily an axiomatic procedure; studies with large sample sizes may be of lower methodological quality (Gendreau, Goggin, & Smith, 2000, p. 56).

5 A more detailed discussion can be found in the reader-friendly, how-to “cookbooks” on meta-analysis by Durlak (1995) and Wolf (1986).

6 Notwithstanding the need for standardized policies, exceptions to an overarching policy can easily be made if circumstances warrant doing so (i.e., a given risk measure is found to be superior among a small sub-sample of offenders or for a particular type of outcome).

7 Important study characteristics that are routinely coded include: study context -- country of study, author's discipline, source of funding, and year and type of publication; sample characteristics -- age, gender, race, and offender risk level; variables specific to treatment studies -- type of treatment, treatment dosage, “who” administers the treatment, treatment setting, program sponsorship, age of program, theoretical foundation of program, and role of evaluator; method -- comparability of treatment-control groups, rate of attrition, type of outcome, and length of follow-up.


REFERENCES

Andrews, D. A., & Bonta, J. (1998). The psychology of criminal conduct, 2nd edition. Cincinnati, OH: Anderson Press.

Beaman, A. L. (1991). An empirical comparison of meta-analytic and traditional reviews. Personality and Social Psychology Bulletin, 17, 252-257.

Campbell, D. T. (1969). Reforms as experiments. American Psychologist, 24, 409-428.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum.

Cooper, H. (1997). Some finer points in meta-analysis. In M. Hunt (Ed.), How science takes stock: The story of meta-analysis (pp. 169-181). New York, NY: Russell Sage Foundation.

Cooper, H. M., & Rosenthal, R. (1980). Statistical versus traditional procedures for summarizing research findings. Psychological Bulletin, 87, 442-449.

Cullen, F. T., & Gendreau, P. (1989). The effectiveness of correctional treatment: Reconsidering the “nothing works” debate. In L. Goodstein & D. L. MacKenzie (Eds.), The American prison: Issues in research and policy (pp. 23-44). New York, NY: Plenum.

Cullen, F. T., & Gendreau, P. (2000). Assessing correctional rehabilitation: Policy, practice, and prospects. In J. Horney (Ed.), National Institute of Justice criminal justice 2000: Changes in decision making and discretion in the criminal justice system. (pp. 109-175). Washington, DC: Department of Justice, National Institute of Justice.

Durlak, J. A. (1995). Understanding meta-analysis. In L. G. Grimm & P. R. Yarnold (Eds.), Reading and understanding multivariate statistics (pp. 219-252). Washington, DC: American Psychological Association.

Fulford, R. (1995, October). Regarding Henry. Report on Business Magazine, 91, 67-74.

Gendreau, P. (1996a). The principles of effective intervention with offenders. In A. T. Harland (Ed.), Choosing correctional interventions that work: Defining the demand and evaluating the supply (pp. 117-130). Newbury Park, CA: Sage.

Gendreau, P. (1996b). Offender rehabilitation: What we know and what needs to be done. Criminal Justice and Behavior, 23, 144-161.

Gendreau, P. (1999). Rational policies for reforming offenders. The ICCA Journal of Community Corrections, 9, 16-20.

Gendreau, P., Goggin, C., & Cullen, F. (1999). The effects of prison sentences on recidivism. Ottawa, ON: Solicitor General Canada.

Gendreau, P., Goggin, C., & Fulton, B. (2000). Intensive supervision in probation and parole. In C. R. Hollin (Ed.), Handbook of offender assessment and treatment (pp. 195-204). Chichester, UK: John Wiley.

Gendreau, P., Goggin, C., & Smith, P. (1999, May). Predicting recidivism: LSI-R vs. PCL-R. Canadian Psychology Abstracts, p. 40, 2a.

Gendreau, P., Goggin, C., & Smith, P. (2000). Generating rational correctional policies: An introduction to advances in cumulating knowledge. Corrections Management Quarterly, 4, 52-60.

Gendreau, P., Little, T., & Goggin, C. (1996). A meta-analysis of adult offender recidivism: What works! Criminology, 34, 575-607.

Gendreau, P., & Ross, R. R. (1979). Effective correctional treatment: Bibliotherapy for cynics. Crime and Delinquency, 25, 463-489.

Gendreau, P., & Ross, R. R. (1987). Revivification of rehabilitation: Evidence from the 1980s. Justice Quarterly, 4, 349-407.

Gendreau, P., Smith, P., & Goggin, C. (1999). Catching up is hard to do: A fail-safe statistic for policy-makers. Unpublished manuscript, Centre for Criminal Justice Studies, University of New Brunswick at Saint John, NB.

Glass, G., McGaw, B., & Smith, M. L. (1981). Meta-analysis in social research. Beverly Hills, CA: Sage.

Granatstein, J. L. (1982). The Ottawa men: The civil service mandarins, 1937-1957. Toronto, ON: Oxford Press.

Hedges, L. V. (1987). How hard is hard science, how soft is soft science: The empirical cumulations of research. American Psychologist, 42, 443-455.

Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. San Diego, CA: Academic Press.

Hunt, M. (1997). How science takes stock: The story of meta-analysis. New York, NY: Russell Sage Foundation.

Hunter, J. E., & Schmidt, F. L. (1996). Cumulative research knowledge and social policy formulation: The critical role of meta-analysis. Psychology, Public Policy, and Law, 2, 324-347.

Krugman, P. (1994). Peddling prosperity: Economic sense and nonsense in the age of diminished expectations. New York, NY: Norton.

McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological Bulletin, 111, 361-365.

Orwin, R. G. (1987). A fail-safe N for effect size in meta-analysis. Journal of Educational Statistics, 8, 157-159.

Osbaldeston, G. (1989). Keeping deputy ministers accountable. Whitby, ON: McGraw-Hill.

Redondo, S., Sanchez-Meca, J., & Garrido, V. (1999). The influence of treatment programmes on the recidivism of juvenile and adult offenders: A European meta-analytic review. Psychology, Crime and Law, 5, 251-278.

Rosenthal, R. (1991). Meta-analytic procedures for social research. Beverly Hills, CA: Sage.

Rosenthal, R. (1995). Writing meta-analytic reviews. Psychological Bulletin, 118, 183-192.

Savoie, D. J. (1999). Governing from the centre: The concentration of power in Canadian politics. Toronto, ON: University of Toronto Press.

Schmidt, F. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers. Psychological Methods, 1, 115-129.

Shadish, W. R. (1996). Meta-analysis and the exploration of causal mediating processes: A primer of examples, methods, and issues. Psychological Methods, 1, 47-65.

Starobin, P. (1997, July). Word warriors. The Washingtonian, 32, 48-51, 101-103.

Wolf, F. M. (1986). Meta-analysis: Quantitative methods for research synthesis. Newbury Park, CA: Sage.
