Correctional Service Canada

Compendium 2000 on Effective Correctional Programming


Program Evaluation: Intermediate Measures of Treatment Success


The past decade has seen significant gains in our understanding of correctional programming. This has led to a substantial increase in the range and number of correctional programs provided to offenders, both those incarcerated in prisons and those under community supervision. The most prominent theoretical orientation of these programs, perhaps because of its demonstrated efficacy, has been cognitive-behavioural (Andrews, Dowden, & Gendreau, 2000). Further, this proliferation of correctional programming has been driven by administrators' concerns about increased inmate populations and the perceived need to provide better risk management (Motiuk & Serin, 1998). This chapter addresses the selection of appropriate measures for determining the effectiveness of a correctional program.

Regarding program evaluation, there are a number of key elements that contribute to the overall effectiveness of a particular intervention. Van Voorhis and her colleagues have outlined many of these elements to assist practitioners and policymakers to make informed decisions about interventions for offenders (Van Voorhis, Cullen, & Applegate, 1995). In their review of violent offender programs, they illustrate the importance of such factors as the consideration of program climate and support, the development of offender selection criteria based on the treatment targets of the intervention, the assurance of program integrity, and the determination of measures of success. They provide an extended discussion of each of these factors, but this chapter will limit its focus to the factor they termed intermediate measures of program success. It must, however, be emphasized that quality of program delivery and implementation concerns such as staff selection and training are also critical to program evaluation (Serin & Preston, 2001).

The coining of the term intermediate measures is an effort by researchers to better link program targets, program objectives, and outcome measures (Van Voorhis et al., 1995). In terms of measuring the impact of a particular intervention, it is recommended that assessment be multi-method and multi-modal (Palmer, 1996). Further, the assessment protocol should not rely solely upon offender self-report because of the numerous difficulties inherent in this approach (Serin & Preston, 2001). Related to this, attempts should be made to control for social desirability. Alternative forms of assessment include structured interviews, vignette (in situ) assessments, and behavioural observations. Staff ratings, particularly of motivation, treatment readiness, and treatment performance, should also be included (Kennedy & Serin, 1999). Lastly, a literature review, consultation with colleagues and researchers, and the availability of appropriate norms should guide the final selection of assessment tools.


Pre- and post-treatment testing is one aspect related to assessment of target problems (Goldstein & Keller, 1987; Serin, 1995). Offenders should complete a comprehensive, multi-method assessment battery before and after a planned intervention. The assessment battery should assess domains that are reasonable treatment targets and that are determined on an a priori basis. The basis for this determination could include literature reviews, theoretical models, and demonstrated need. This will allow for the identification of individual treatment needs and provide a basis from which to gauge treatment gain. It should then be possible to link specific intervention strategies to particular treatment needs. Ideally, these treatment needs will also be determined on the basis of their relationship to criminal behaviour (Andrews & Bonta, 1998). Importantly, the assessment of criminogenic needs may include structured ratings (LSI-R, Andrews & Bonta, 1995; Motiuk, 1997), self-reports (Serin & Mailloux, 2001), and functional analyses (McDougall, Clark, & Fisher, 1994).

While pre- and post-tests provide an indication of change as a condition of treatment, process measures can help determine which aspects of the program are responsible for producing change. In effect, process measures assess the impact of program content on knowledge and skills acquisition (Marques, Day, Nelson, & West, 1994). Obviously, process measures must be specific to the content of each module of the program, and offenders should complete them before and after the delivery of each particular module. Interim and outcome evaluations of the program can then examine the extent to which the process measures are useful in measuring change and in predicting outcome.


Intermediate measures of treatment success should include behavioural ratings in addition to the more common offender self-reports. Systematic ratings of behaviour offer a means of assessing treatment gain in addition to pre- and post-tests (and change scores) and process measures. By profiling change over time, these ratings can identify the point in the program at which gains became evident. Behaviours such as attendance, participation, attentiveness, comprehension, and skill implementation are just some to consider. To maximize reliability, a Likert scale could be used with explicit behavioural anchors. As well, staff could complete these by consensus such that each rating reflects an average of two raters. Subsequent analyses can then determine the relationship between staff ratings and offender reports and behaviour change, as well as the extent to which each predicts outcome.
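
The rating procedure described above can be illustrated with a minimal sketch. It assumes a 1 to 5 Likert scale, two raters per session, and a one-point gain over the first session as the marker of change; the function names, the data, and the one-point criterion are illustrative assumptions, not part of any established protocol.

```python
from statistics import mean

# Hypothetical 1-5 Likert ratings from two staff raters, one pair per session.
ratings = {
    "attendance":    [(2, 3), (3, 3), (4, 5)],
    "participation": [(1, 2), (2, 2), (4, 4)],
}

def consensus_profile(pairs):
    """Average the two raters' scores for each session, in session order."""
    return [mean(pair) for pair in pairs]

def first_gain_session(profile, min_gain=1.0):
    """Return the 1-based session at which the rating first exceeds the
    first-session rating by at least min_gain, or None if it never does."""
    baseline = profile[0]
    for session, score in enumerate(profile[1:], start=2):
        if score - baseline >= min_gain:
            return session
    return None

# One averaged profile per rated behaviour.
profiles = {behaviour: consensus_profile(pairs) for behaviour, pairs in ratings.items()}

for behaviour, profile in profiles.items():
    print(behaviour, profile, "first gain at session", first_gain_session(profile))
```

Keeping one profile per behaviour, rather than a single total, makes it possible to see whether gains in, say, participation precede gains in skill implementation.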

A clear description of the treatment targets and program objectives is critical to the development of intermediate measures of success. Essentially there are three distinct questions. Does change occur in the areas targeted by the correctional intervention? Is this change in the predicted or hypothesized direction? Are these changes related to other indices of treatment performance? A related question is whether these changes correlate with other dependent variables such as recidivism. In this context the first two questions answer whether change has occurred, but the prediction of recidivism is an investigation of the generalization of these treatment gains to other situations.

In terms of evaluating treatment performance it may be helpful to consider intermediate objectives (see Table 24.1). For instance, for violent offenders it is reasonable to investigate whether “successful” program participation yields reductions in institutional infractions in terms of fights or arguments. For sex offenders, intermediate objectives might be decreases in inappropriate comments to women staff or reductions in contacts with identified victim types (e.g., viewing children in catalogues and on television). For substance abusers, an intermediate objective might be reductions in institutional infractions relating to possession or use of illicit substances. For each of these objectives, it is also possible to consider reductions in either frequency or severity, relative to some prescribed period of time prior to the initiation of treatment. Other intermediate objectives could include reductions in the number of days spent in segregation for disciplinary reasons before and after the program. Also, tabulating rates of granting conditional release and rates of referral for special conditions or residential requirements might be instructive. Depending on the nature of the program and the needs of the offenders, examining employment rates and subsequent program participation may be helpful. Further, consideration of transfers to reduced security (or increased security in the case of program failures) might be a manner of determining program success prior to collecting recidivism data. Finally, it is important to consider refusal rates, program completion rates, and reasons for non-completion. High refusal rates and high program attrition will ultimately limit the generalizability of the program and raise legitimate questions regarding its efficacy.

TABLE 24.1 Intermediate Indices of Program Effectiveness

Type of Offender (Primary need): Violent Offender
  Offence Specific
    1. Reduced institutional charges
    2. Fewer verbal confrontations with staff
  Intermediate Outcome Measures
    1. Transfers to reduced security
    2. Program performance and compliance
    3. Program completion

Type of Offender (Primary need): Sex Offender
  Offence Specific
    1. Reduced inappropriate interactions with staff
    2. Decreased victim interest (viewing children on T.V.)
  Intermediate Outcome Measures
    1. Positive release decisions
    2. Fewer days served post-treatment
    3. Decreased evidence of sexually predatory behaviour against other offenders

Type of Offender (Primary need): Substance Abuser
  Offence Specific
    1. Fewer institutional incidents relating to drugs and debts (possession/under the influence)
    2. Negative urinalysis testing
  Intermediate Outcome Measures
    1. Post-treatment program compliance
    2. Positive urinalysis for less addictive drugs
    3. Changed peer associations
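
Several of these indices reduce to simple rates compared across a pre-treatment and a post-treatment window. The sketch below is one possible way to tabulate them, assuming incident counts and observation days are available per offender and that referral, refusal, and completion counts are available per program; all names and figures are hypothetical.

```python
def rate_per_100_days(incidents, days_observed):
    """Incidents per 100 days, so unequal observation windows are comparable."""
    if days_observed <= 0:
        raise ValueError("days_observed must be positive")
    return 100.0 * incidents / days_observed

def program_flow(referred, refused, completed):
    """Refusal rate among referrals and completion rate among those who started."""
    started = referred - refused
    return {
        "refusal_rate": refused / referred,
        "completion_rate": completed / started if started else 0.0,
    }

# Hypothetical figures: institutional charges for one offender in the
# 180 days before treatment and the 120 days after it.
before = rate_per_100_days(incidents=6, days_observed=180)
after = rate_per_100_days(incidents=2, days_observed=120)

# Hypothetical figures for one program cohort.
flow = program_flow(referred=50, refused=10, completed=30)

print(f"charges per 100 days: {before:.2f} before, {after:.2f} after")
print(flow)
```

Expressing infractions as a rate rather than a raw count matters because the pre- and post-treatment windows are rarely the same length.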



Another form of evaluating intermediate measures of program success relates to consumer satisfaction. Surveys that consider the content of the program, its duration and other time issues, the process by which the program was delivered, and the skill components would all be important. An indication of the best and worst aspects of a program is also sometimes illuminating. It is worth noting, however, that such surveys often simply provide a forum for offenders to try to garner support by extolling the merits of a particular program and its staff. Therefore, in addition to having offenders complete a confidential post-treatment evaluation of staff and the program, it is important to consider other consumers. This could include conducting a survey to determine the utility of post-treatment reports to various decision-makers.


The literature regarding the predictive validity of change scores is relatively ambiguous. In the area of sex offender treatment, the best predictors of sexual recidivism appear to be static risk factors and pre-treatment phallometric indices of sexual deviance (Hanson & Bussière, 1998; Quinsey, Harris, Rice, & Cormier, 1998). Changes in phallometric levels appear less predictive than baseline levels. Also, changes on questionnaires relating to knowledge of relapse prevention principles in sex offender treatment appear unrelated to outcome; however, for some groups of sex offenders, gains in skills may be related to outcome (Marques, personal communication, December 1999). In other areas, such as violent offenders, changes are often contaminated by social desirability, or offenders' self-reports on psychological scales reflect increased problems at post-treatment (Serin & Kuriychuk, 1994). Anecdotally, it appears that such anomalous findings have been explained by proposing that the intervention has had an effect in that the offenders now recognize their behaviour to be problematic. Typically, most studies report changes in scores between pre- and post-treatment testing, but often the relationship to social desirability is alarmingly high (Blanchette, Robinson, Alksnis, & Serin, 1998). Importantly, these changes do not consistently relate to reductions in recidivism, necessitating longitudinal studies. In the area of substance abuse there is some indication that change scores are related to improved outcome (Reintegration Programs, 1999).
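
One simple check on the contamination described above is to correlate change scores with scores on a social desirability scale. The sketch below computes a Pearson correlation from first principles; the data are invented for illustration, and the choice of a simple bivariate correlation (rather than, say, a partial correlation) is an assumption made here, not a procedure taken from the studies cited.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)

# Hypothetical data: post-minus-pre change scores and social desirability
# scale scores for the same four offenders.
change_scores = [2, 4, 6, 8]
desirability = [1, 3, 2, 5]

r = pearson_r(change_scores, desirability)
print(f"r = {r:.2f}")  # about 0.83 for these invented data
```

A high positive value would suggest that apparent treatment gains track impression management, which is the concern raised by Blanchette et al. (1998).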

This very brief overview of intermediate measures of treatment success highlights four specific program evaluation issues. The first is the necessity of measuring social desirability as part of the assessment battery. The second is the need to ensure that the measures are theoretically and empirically related to treatment targets and program objectives (Van Voorhis et al., 1995). The third is the need to distinguish between knowledge and skills. The latter may be best assessed by performance-based measures that are situation-specific, such as hypothetical vignettes (Dodge & Frame, 1982; Serin, 1991). The fourth is the potential advantage in distinguishing between change scores and threshold scores in the prediction of recidivism. Change scores reflect the degree of change on a test between evaluations completed prior to and after treatment. It is possible that an offender may have very low pre-treatment scores because of high needs or low skills. Such offenders may make significant gains and have marked change scores, but still fall well below the levels attained by other offenders. It is also possible that sustained behaviour change across different situations requires greater knowledge or skills. That is, a higher threshold score is necessary, and it is this final score, not the change score, that may prove to be a better predictor of outcome.
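
The distinction between change scores and threshold scores can be made concrete with a short sketch. The scores and the cut-off of 35 are invented for illustration; in practice a threshold would have to be established empirically.

```python
from dataclasses import dataclass

@dataclass
class SkillScore:
    """Pre- and post-treatment scores on one measure (higher = more skill)."""
    offender_id: str
    pre: float
    post: float

    @property
    def change(self) -> float:
        """Change score: post-treatment minus pre-treatment."""
        return self.post - self.pre

    def meets_threshold(self, threshold: float) -> bool:
        """Threshold score: is the final level high enough, regardless of gain?"""
        return self.post >= threshold

THRESHOLD = 35  # hypothetical cut-off, not an established norm

results = [
    SkillScore("A", pre=10, post=25),  # large gain, still below threshold
    SkillScore("B", pre=30, post=36),  # small gain, above threshold
]

for r in results:
    print(r.offender_id, "change:", r.change,
          "meets threshold:", r.meets_threshold(THRESHOLD))
```

Offender A shows the larger change score but finishes below the cut-off, while offender B shows little change but finishes above it, which is the situation in which the two indices would make opposite predictions.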


In deference to the risk principle, which states that higher risk offenders require higher intensity intervention (Andrews & Bonta, 1998), most programs consider risk measures within the program delivery procedure. In some cases this is part of the selection criteria (Reintegration Programs, 1999), or the risk estimate is used for post-treatment comparisons regarding differential treatment response (Dowden, Blanchette, & Serin, 1999). Since most of the popular risk assessment strategies reflect static factors, an important issue is how best to incorporate treatment change into re-appraisals of risk and post-treatment risk management strategies (Serin, 1998).


Presently the correctional program evaluation literature is principally concerned with determining treatment effectiveness. As noted earlier, restricting definitions of effectiveness to only the issue of recidivism is considered potentially limiting.2 Various intermediate indices of program success exist and should be investigated. Equally limiting is the apparent belief that offenders are a homogeneous group who will respond similarly to the same program experience. This belief is reflected in the practice of choosing to investigate treatment outcome between groups (treated versus untreated; treated versus dropouts), although this is in contrast to the literature regarding treatment responsivity (Bonta, 1995; Kennedy & Serin, 1999). Even in the area of sex offenders where distinct groups exist because of victim characteristics, programs typically include different types of offenders within the same program and collapse across groups for the purposes of program evaluation. Equally disconcerting is the tendency to develop a program for a particular target, for example, violence, and then fail to consider that the targets within a sample of violent offenders may differ (Serin & Preston, 2001). In fact, it may be that failing to match the offender with the appropriate intervention may actually result in treatment failures (Rice, Harris, & Cormier, 1992; Serin & Preston, 2001). Paying closer attention to the development of treatment targets, intermediate measures and program objectives (Serin & Preston, in press; Van Voorhis et al., 1995) might assist clinicians to more carefully consider treatment responsivity factors (Kennedy & Serin, 1999).


The final issue to address is the reliance on recidivism as the raison d'être for correctional programs. It has been argued that recidivism may not be the preferred index of treatment effectiveness (Elliott, 1980). Specific to recidivism, there are several considerations. For instance, the length of follow-up time will affect base rates. Also, there is debate regarding the “best” definition (Phipps, Korinek, Aos, & Lieb, 1999). Alarmingly, this absence of a standard makes comparisons across programs problematic. For violent offenders it seems most probable that reductions in violent reoffending would be viewed as the most desirable outcome, yet even this could be debated because violence defined by conviction is a poor proxy for actual behaviour. Also, if a violent offender commits another violent crime, but relative to their history it involves a less serious incident, less victim injury, or a longer time to reoffence, is this a clear indication of program failure? Defining outcome only dichotomously, then, limits our understanding of program effectiveness. The use of survival analyses, consideration of prediction analyses as well as comparisons of group differences, and the relative utility of change scores and thresholds in determining program effectiveness are all recommended. Lastly, the consideration of intermediate measures should contribute to increased fidelity of determinations of program effectiveness.
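
Survival analysis addresses both the follow-up problem and the limits of a dichotomous outcome, because it uses time to reoffence and accommodates offenders whose follow-up ended without a reoffence (censored cases). The following is a minimal sketch of the Kaplan-Meier product-limit estimator in plain Python; the follow-up data are hypothetical, and a real evaluation would more likely rely on an established statistical package.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier survival estimates.

    durations: months of follow-up for each offender.
    events: True if the offender reoffended at that time, False if
            follow-up ended without a reoffence (censored).
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        leaving = 0
        # Group everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            failures += int(data[i][1])
            leaving += 1
            i += 1
        if failures:
            survival *= 1 - failures / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up data for five released offenders.
months = [6, 12, 12, 18, 24]
reoffended = [True, False, True, True, False]

curve = kaplan_meier(months, reoffended)
for t, s in curve:
    print(f"month {t}: estimated proportion not yet reoffended = {s:.2f}")
```

Curves estimated separately for treated and comparison groups can then be compared, which conveys whether treatment delays reoffending even when eventual recidivism rates are similar.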

1 Correctional Service of Canada

2 Sex offender therapists appear not to tolerate “lapses” because of the victimization issues.


Andrews, D. A., & Bonta, J. (1995). The Level of Service Inventory-Revised. Toronto, ON: Multi-Health Systems.

Andrews, D. A., & Bonta, J. (1998). The psychology of criminal conduct (2nd ed.). Cincinnati, OH: Anderson Publishing.

Andrews, D. A., Dowden, C., & Gendreau, P. (2000). Clinically relevant and psychologically informed approaches to reduced reoffending: A meta-analytic study of human service, risk, need, responsivity and other concerns in a justice context. Manuscript submitted for publication.

Blanchette, K., Robinson, D., Alksnis, C., & Serin, R. C. (1998). Assessing treatment change among family violent offenders: Reliability, and validity of a family violence treatment assessment battery. Research Report R-72. Ottawa, ON: Correctional Service of Canada.

Bonta, J. (1995). The responsivity principle and offender rehabilitation. Forum on Corrections Research, 7(3), 34-37.

Dodge, K. A., & Frame, C. L. (1982). Social cognitive biases and deficits in aggressive boys. Child Development, 53, 620-635.

Dowden, C., Blanchette, K., & Serin, R. C. (1999). Anger management programming for federal male inmates: An effective intervention. Research Report R-82. Ottawa, ON: Correctional Service of Canada.

Elliott, D. S. (1980). Recurring issues in the evaluation of delinquency prevention and treatment programs. In D. Schichor & D. Kelly (Eds.), Critical issues in juvenile delinquency, (pp. 237-262). Lexington, MA: D.C. Heath.

Goldstein, A. P., & Keller, H. (1987). Aggressive behavior: Assessment and intervention. New York, NY: Pergamon.

Hanson, R. K., & Bussière, M. T. (1998). Predicting relapse: A meta-analysis of sexual offender recidivism studies. Journal of Consulting and Clinical Psychology, 66, 348-362.

Kennedy, S., & Serin, R. (1999). Examining offender readiness to change and the impact on treatment outcome. In P. M. Harris (Ed.), Research to results: Effective community corrections, (pp. 215-230). Lanham, MD: American Correctional Association.

Marques, J. K. (1999). Personal communication, December.

Marques, J. K., Day, D. M., Nelson, C., & West, M. A. (1994). Effects of cognitive-behavioral treatment on sex offender recidivism: Preliminary results of a longitudinal study. Criminal Justice and Behavior, 21, 28-54.

McDougall, C., Clark, D., & Fisher, M. (1994). The assessment of violent behaviour. In M. McMurran & J. Hodge (Eds.), The assessment of criminal behaviour of clients in secure settings, (pp. 68-93). London, UK: Jessica Kingsley Publishers.

Motiuk, L. L. (1997). Classification for correctional programming: The Offender Intake Assessment (OIA) process. Forum on Corrections Research, 9(1), 18-22.

Motiuk, L. L., & Serin, R. C. (1998). Situating risk assessment in the reintegration potential framework. Forum on Corrections Research, 10(1), 19-22.

Palmer, T. (1996). Programmatic and non-programmatic aspects of successful interventions. In A.T. Harland (Ed.), Choosing correctional options that work: Defining the demand and evaluating the supply (pp. 131-182). Thousand Oaks, CA: Sage.

Phipps, P., Korinek, K., Aos, S., & Lieb, R. (1999). Research findings on adult corrections programs: A review 1999. Olympia, WA: Washington State Institute for Public Policy.

Quinsey, V. L., Harris, G. T., Rice, M. E., & Cormier, C. A. (1998). Violent offenders: Appraising and managing risk. Washington, DC: American Psychological Association.

Reintegration Programs. (1999). An outcome evaluation of CSC substance abuse programs: OSAPP, ALTO and Choices. Ottawa, ON: Public Works and Government Services Canada.

Rice, M. E., Harris, G. T., & Cormier, C. A. (1992). An evaluation of a maximum-security therapeutic community for psychopaths and other mentally disordered offenders. Law and Human Behavior, 16, 399-412.

Serin, R. C. (1991). Psychopathy and violence in criminals. Journal of Interpersonal Violence, 6, 423-431.

Serin, R. C. (1995). Psychological intervention in corrections. In T. A. Leis, L. L. Motiuk, & J. R. P. Ogloff (Eds.), Forensic psychology: Policy and practice in corrections, (pp. 36-40). Ottawa, ON: Ministry of Supply and Services Canada.

Serin, R. C. (1998). Treatment responsivity, intervention, and reintegration: A conceptual model. Forum on Corrections Research, 10(1), 29-32.

Serin, R. C., & Kuriychuk, M. (1994). Social and cognitive processing deficits in violent offenders: Implications for treatment. International Journal of Law and Psychiatry, 17, 431-441.

Serin, R. C., & Mailloux, D. L. (2001). Development of a Reliable Self-Report Instrument for the Assessment of Criminogenic Needs. Research Report R-96. Ottawa, ON: Correctional Service of Canada.

Serin, R. C., & Preston, D. L. (2001). Designing, implementing and managing treatment programs for violent offenders. In G. A. Bernfeld, D. P. Farrington, & A. W. Leschied (Eds.), Offender rehabilitation in practice: Implementing and evaluating effective programs, (pp. 205-221). West Sussex, UK: John Wiley & Sons.

Serin, R. C., & Preston, D. L. (2001). Managing and treating violent offenders. In J. B. Ashford, B. D. Sales, & W. Reid (Eds.), Treating adult and juvenile offenders with special needs, (pp. 249-272). Washington, DC: American Psychological Association.

Van Voorhis, P., Cullen, F. T., & Applegate, B. (1995). Evaluating interventions with violent offenders: A guide for practitioners and policymakers. Federal Probation, 59, 17-28.

