Program Evaluation
Policy Studies 195.611
Fall 2011
Larry L. Orr (301) 467-1234
This course will introduce the student to the fundamental principles and practices involved in the design, implementation, and analysis of program evaluations. We will study both the evaluation of ongoing programs and tests of new interventions that are being considered for broader adoption. The course begins by considering why it is so difficult to tell whether programs are “working” – i.e., achieving their objectives in terms of effects on their participants or beneficiaries. We then review the basic statistical principles involved in designing an evaluation based on random assignment – the simplest, clearest, and most reliable evaluation model – as well as the principal non-experimental methods that are available when random assignment is not feasible, along with the assumptions they require if they are to provide unbiased estimates of program effects. (The course assumes some grounding in quantitative methods, but the statistical concepts involved will be presented as part of the course.) Students will then study the procedures involved in implementing an evaluation in the field, including some potential pitfalls and ways of dealing with them, and procedures for collecting and analyzing evaluation data. The course concludes with overviews of benefit-cost analysis, process analysis, and the role of evaluation results in the policy process. The principal text for the course, Social Experiments: Evaluating Public Programs with Experimental Methods, will be supplemented with readings from the literature and actual evaluations. All required readings except the text will be available in electronic form.
Program Evaluation
Policy Studies 195.611
Course Schedule
All classes are from 4:00 – 6:20 p.m.
100 Shaffer Hall
Date Lecture Topics
Aug. 30 1. Introduction to Course
The Fundamental Problem of Evaluation
Sept. 5 2. Defining and Interpreting the Treatment/Control Contrast
Alternative Designs and the Policy Questions They Address
Guarding Against Chance Differences
Sept. 12 3. Basic Evaluation Design Issues
Designing an Experimental Evaluation
Sept. 19 4. Adjusting the Impact Estimate for Nonparticipation
Sample Size and the Power of the Design
Nonparticipation and Power
Sept. 26 5. Nonexperimental Methods I: Taking Account of Measured Differences
Oct. 3 6. Nonexperimental Methods II: Taking Account of Unmeasured Differences
Oct. 17 7. Allocation of the Sample
Site Selection
Implementation of the Experiment
Oct. 24 8. Implementing Random Assignment
What Can Go Wrong – and What to Do When It Does
Nov. 1 9. Data Collection
Nov. 8 10. Analyzing the Data
Interpreting the Impact Estimates
Nov. 15 11. Benefit-Cost Analysis
Nov. 22 12. Process Analysis
Nov. 29 13. Use of Evaluation Results in the Policy Process
Program Evaluation
Policy Studies 195.611
Provisional Reading List
Note: All readings will be available electronically except the text and other books.
Numbers refer to the corresponding lecture.
Required Text: Larry L. Orr. Social Experiments: Evaluating Public Programs with Experimental Methods. Thousand Oaks, CA: Sage Publications.
1. The Fundamental Problem of Evaluation
Orr, Social Experiments, Chapter 1: Rationale and History of Social Experiments
Angrist, Joshua D. (2004), “American Education Research Changes Tack”, Oxford Review of Economic Policy, Vol. 20, No. 2. A very readable account of how and why the U.S. Dept. of Education shifted its research almost entirely to randomized trials, and why randomized trials are considered the “gold standard” of evaluation research.
“Study Disputes Wait-and-See Approach to Prostate Cancer”, Washington Post, Dec. 13, 2006. Description of a study that is (unfortunately) typical of much epidemiological evaluation of medical procedures. Think about whether you believe the findings – especially in light of the last paragraph.
Orr, Larry L., Stephen H. Bell, and Jacob A. Klerman. “Designing Reliable Impact Evaluations”, in The Workforce Investment Act: Implementation Experiences and Evaluation Findings, Douglas J. Besharov and Phoebe Cottingham, eds. (Kalamazoo, MI: W. E. Upjohn Institute for Employment Research, 2011). Lessons from the American experience, in a volume commissioned to help the European Commission evaluate its workforce development programs.
A good book to read in your copious spare time:
Marks, Harry M. The Progress of Experiment: Science and Therapeutic Reform in the United States, 1900-1990 (Cambridge University Press, 1997). How medicine came to adopt randomized trials as the method of choice for testing new treatments.
2. Defining and Interpreting the Treatment/Control Contrast
Guarding Against Chance Differences
Alternative Designs and the Policy Questions They Address
Orr, Social Experiments, Chapter 2: Basic Concepts and Principles
Read the following one-page overviews of economic development evaluations conducted by the MIT Poverty Action Lab; this is an area that was essentially devoid of rigorous evaluation until a few years ago:
· http://www.povertyactionlab.org/evaluation/police-performance-and-public-perception-rajasthan-india
· http://www.povertyactionlab.org/evaluation/measuring-impact-microfinance-hyderabad-india
For more on the Poverty Action Lab, see: http://www.povertyactionlab.org/about-j-pal
For a long list of similar projects, see: http://www.povertyactionlab.org/duflo
Spare time reading:
Salsburg, David. The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century (Henry Holt, 2001). An account for the layperson of the development of modern statistics, including experimental analysis, told through a series of bio-sketches of great statisticians. Biography, not statistics. Fascinating reading!
3. Basic Evaluation Design Issues
Designing an Experimental Evaluation
Orr, Social Experiments, Chapter 3: Alternative Random Assignment Models.
Esther Duflo, “Evaluating the Impact of Development Aid Programs: The Role of Randomized Evaluations”, in Development Aid: Why and How? Towards strategies for effectiveness. Proceedings of the AFD-EUDN Conference, 2004, pp. 207-244. Some good examples of randomized evaluations in developing countries, as well as lessons learned.
4. Adjusting the Impact Estimate for Nonparticipation
Sample Size and the Power of the Design
Nonparticipation and Power
Orr, Social Experiments, pp. 62-64 and 107-121.
Optional:
Bloom, Howard S. The Core Analytics of Randomized Experiments for Social Research, MDRC Working Papers on Research Methodology, August 2006. Covers much of the territory discussed today plus group random assignment. Uses different terminology and some alternative formulas, but is concise and well-written.
5. Nonexperimental Methods I: Taking Account of Measured Differences
Gary Burtless, "The Case for Randomized Field Trials in Economic and Policy Research," Journal of Economic Perspectives 9 (1995): 63‑84.
James J. Heckman and Jeffrey A. Smith, "Assessing the Case for Social Experiments," Journal of Economic Perspectives 9 (1995): 85‑110.
Jeffrey A. Smith and Petra E. Todd (2005). “Does Matching Overcome LaLonde’s Critique of Nonexperimental Estimators?” Journal of Econometrics 125, 305-353. The reply by Dehejia and the rejoinder by Smith and Todd (pp. 355-375) are also recommended.
Howard Bloom, Charles Michalopoulos, Carolyn Hill, and Ying Lei (2002). Can Nonexperimental Comparison Group Methods Match the Findings from a Random Assignment Evaluation of Mandatory Welfare-to-Work Programs? New York: Manpower Demonstration Research Corporation. Chapters 1, 2, and 4 required; the rest is optional.
Elizabeth Ty Wilde and Robinson Hollister. “How Close Is Close Enough? Evaluating Propensity Score Matching Using Data from a Class Size Reduction Experiment.” Journal of Policy Analysis and Management, Summer 2007, pp. 455-477. (Skim, but note the conclusions.)
6. Nonexperimental Methods II: Taking Account of Unmeasured Differences
Joshua D. Angrist and Alan B. Krueger, “Instrumental Variables and the Search for Identification”, Journal of Economic Perspectives, Fall 2001. Excellent overview of the case for IV by two of its foremost proponents.
Shawn Bushway, Brian D. Johnson, and Lee Ann Slocum, “Is the Magic Still There? The Use of the Heckman Two-Step Correction for Selection Bias in Criminology”, Journal of Quantitative Criminology, June 2007. A critical, but balanced, review of the use and misuse of the Heckman correction in criminology.
Steven Glazerman, Dan Levy, and David Myers, “Nonexperimental versus Experimental Estimates of Earnings Impacts”, The Annals of the American Academy of Political and Social Science, 2003. Analysis of 12 “design replication studies”, in which researchers attempted to replicate experimental estimates with nonexperimental methods, using data from the same study.
7. Allocation of the Sample
Site Selection
Implementation of the Experiment
Orr, Social Experiments, Chapter 5: Implementation and Data Collection, pp. 139-153.
National Evaluation of Youth Corps: Training Manual for Participating Programs, by Carrie Markovitz, Ryoko Yamaguchi, and Rebecca Zarch. Good example of a program staff training manual.
National Evaluation of Youth Corps: A Facilitated Discussion, Attendee Questions. Summary of local program staff questions and research team answers from a training session with youth corps staff. Provides a good sense of local program staff concerns.
8. What Can Go Wrong – and What to Do When It Does
Orr, Social Experiments, Chapter 5: Implementation and Data Collection, pp. 154-168.
Proposal to Conduct a Random Assignment Evaluation of Youth Corps, August 29, 2005. This winning proposal is a good, if unusually short, example of a full design for an experimental evaluation. Shows how site selection/recruiting and data collection fit into the overall design.
9. Data Collection
Orr, Social Experiments, Chapter 5: Implementation and Data Collection, pp. 168-182.
Robert Kornfeld and Howard S. Bloom, “Measuring Program Impacts on Earnings and Employment: Do Unemployment Insurance Wage Reports from Employers Agree with Surveys of Individuals?”, Journal of Labor Economics, Vol. 17, No. 1 (Jan., 1999), pp. 168-197.
10. Analyzing the Data and Interpreting the Impact Estimates
Orr, Social Experiments, Chapter 6: Analysis.
Peter Z. Schochet. Guidelines for Multiple Testing in Impact Evaluations. Institute of Education Sciences Technical Methods Report, May 2008.
11. Benefit-Cost Analysis
Bloom, Howard S., Larry L. Orr, Stephen H. Bell, George Cave, Fred Doolittle, Winston Lin, and Johannes M. Bos. 1997. "The Benefits and Costs of JTPA Title II-A Programs: Key Findings from the National JTPA Study", Journal of Human Resources. Summer. pp. 549-576.
Bell, Stephen H., and Larry L. Orr. "Is Subsidized Employment Cost-Effective for Welfare Recipients? Empirical Evidence from Seven State Demonstrations," Journal of Human Resources, Winter 1994. pp. 42-61.
Burtless, Gary, and Larry L. Orr (1986), “Are classical experiments needed for manpower policy?” Journal of Human Resources, Vol. 21, No. 4 (Autumn), pp. 626-639. In which we estimate the social value of an experimental evaluation of JTPA to be approximately $0.5 billion. When the evaluation was conducted, that estimate turned out to be too low by a factor of about 5.
12. Process Analysis
Werner, Alan. A Guide to Implementation Research (Urban Institute Press, 2004).
Leonard Mlodinow. 2008. The Drunkard’s Walk, New York: Pantheon Books. Chapter 9. (Relevant to most qualitative analysis.)
13. Use of Evaluation Results in the Policy Process
Orr, Social Experiments, Chapter 7: Social Experimentation and the Policy Process.
Spare time reading:
Greenberg, David, et al. Social Experimentation and Public Policymaking (Urban Institute Press, 2003). How experimental evidence has been used in policy – or not. Some good stories.