All content on this blog is original work produced by Allison Primack. Do not republish or print without permission.

Sunday, March 25, 2012

Impact Evaluation Critique

Here is a paper I wrote right before spring break for my program evaluation class. The assignment was to analyze an existing impact evaluation, and critique what was wrong with it. The formatting got a little messed up in the copying/pasting of it, but you can still get the idea of what I wrote!


Critique of The Boston Gun Project: Impact Evaluation Findings

Description of the Focus and Findings
            This is a critique of the Boston Gun Project, “a problem-oriented policing initiative expressly aimed at taking on a serious, large-scale crime problem – homicide victimization among young people in Boston.”[1] This project, funded by the National Institute of Justice and Harvard University’s John F. Kennedy School of Government, conducted quantitative and qualitative research to find the root of the youth homicide issue in Boston, created and implemented an interagency intervention approach, and evaluated the program’s impact, particularly in the near term.
To achieve the interagency strategy, the following organizations committed their involvement to the Boston Gun Project “working group”: the Boston Police Department, the Massachusetts departments of probation and parole, the office of the Suffolk County District Attorney, the office of the United States Attorney, the Bureau of Alcohol, Tobacco, and Firearms, the Massachusetts Department of Youth Services (juvenile corrections), the Boston School Police, and gang outreach and prevention “street workers” attached to the Boston Community Centers program[2].
            The research of the Boston Gun Project yielded new insights about the youth homicide trend in the Boston area. These trends were the most prominent findings in the research:
·      The firearms used in these cases were most often relatively new semiautomatic pistols;
·      Small groups of “chronically offending gang-involved youth”, from 61 disorganized gangs with roughly 1300 members, were responsible for 60% of the youth homicides;
·      Chronic disputes between these gangs were the main cause of these youth homicides.[3]

As a result of these research findings on the Boston youth homicide rates, the Boston Gun Project’s working group created “Operation Ceasefire” in an attempt to lower youth violence and homicide rates. The following were the goals of the program:
·      “Expanding the focus of local, state, and federal authorities to include intrastate trafficking in Massachusetts-sourced guns, in addition to interstate trafficking;
·      Focusing enforcement attention on traffickers of those makes and calibers of guns most used by gang members;
·      Focusing enforcement attention on traffickers of those guns showing short time-to-crime, and thus most likely to have been trafficked. The Boston Field Division of ATF set up an in-house tracking system that flagged guns whose traces showed an 18-month or shorter time-to-crime;
·      Focusing enforcement attention on traffickers of guns used by the city’s most violent gangs;
·      Attempting restoration of obliterated serial numbers, and subsequent trafficking investigations based on those restorations;
·      Supporting these enforcement priorities through analysis of crime gun traces generated by the Boston Police Department’s comprehensive tracing of crime guns, and by developing leads through systematic debriefing of, especially, arrestees involved with gangs and/or involved in violent crime.”[4]

 In addition, the working group would employ a “pulling levers” strategy: direct outreach to gangs, warning them that violent behavior would not be tolerated and that, if violence occurred, the agencies would “pull every lever” legally available[5]. Because so many agencies were involved in the working group, this crackdown was a substantial threat. In conjunction, other social groups would offer services and other kinds of support for these chronic offenders. These messages were delivered both at a group level and on an individual basis. The designers of Operation Ceasefire recognized that these operations would not completely stop youth gang violence; the goal was to curb the number of homicides. Additionally, by making weapons harder to obtain, they hoped to create a “firebreak” that would jumpstart a trend in reduced

            Although the preliminary steps for the creation of Operation Ceasefire began in 1995, the program was evaluated in the 1996-1997 year, when the first of the full Operation Ceasefire interventions occurred. In the media it was quickly hailed as a “great success” because the number of youth homicides dropped dramatically in mid-1996; however, it was unclear whether this was due to the Boston Gun Project or to other factors.

Key Evaluation Questions Addressed
            The impact evaluation for Operation Ceasefire focused on four key questions:
1.     Were there significant reductions in youth homicides and other indicators of non-fatal serious gun violence associated with the implementation of Operation Ceasefire in Boston?
2.     Did the timing of Boston’s significant reduction in youth homicide coincide with the implementation of Operation Ceasefire?
3.     Were other factors responsible for Boston’s reduction in youth homicide?
4.     Was Boston’s significant youth homicide reduction distinct relative to youth homicide trends in other major U.S. and New England cities?[7]

Summary of the Research Design and Data Collection Methods Used
This study employed a basic one-group time series design, where the key outcome variable was the monthly number of homicide victims ages 24 and under. The study did not have a control group because it was not focusing on gang violence in particular, just youth violence as a whole. A control group would also be difficult to obtain because the intervention was a self-sustaining cycle, and the communications strategy was designed to encompass those who had not been convicted (prevention)[8]. Additionally, withholding the intervention from some groups would be against the values of the study, since that would potentially put more people in danger. The homicide statistics for 1991 through 1998 were obtained from the Boston Police Department’s Office of Research and Analysis to compare the pre-epidemic years to the epidemic, and then the post-intervention years. Other variables that were gathered and measured were the monthly numbers of “shots fired” citizen calls for service and citywide official gun assault incidents, to see if the program was successful in reducing gun violence. This data was only available from 1991 to 1997 and did not include the age of the victim, both of which were shortcomings.
            In addition, the researchers employed a non-randomized quasi-experiment that would compare the youth homicide trends to those in other large cities throughout the United States. The results of this part of the study were not fleshed out in complete detail.
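
To make the one-group time series design concrete, here is a minimal Python sketch of the simplest possible pre/post comparison on monthly counts. The counts and the function name `percent_change` are hypothetical and chosen only for illustration; they are not the Boston data, and this is not the regression approach the evaluators used. A raw comparison like this does not adjust for trends or seasonal variation.

```python
from statistics import mean


def percent_change(pre_counts, post_counts):
    """Percent change in the mean monthly count from the pre- to the post-intervention period."""
    pre_mean = mean(pre_counts)
    post_mean = mean(post_counts)
    return 100.0 * (post_mean - pre_mean) / pre_mean


# Hypothetical monthly victim counts (illustrative only, NOT the Boston data).
pre = [4, 5, 3, 4, 4, 4]    # months before the intervention, mean 4.0
post = [2, 1, 2, 1, 2, 1]   # months after the intervention, mean 1.5

change = percent_change(pre, post)
print(f"Change in mean monthly count: {change:.1f}%")
```

Because there is only one group, a drop computed this way cannot by itself distinguish a program effect from whatever else changed over the same months.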

Threats to the Evaluation
Several threats to validity arise in this evaluation, which should lead the reader to examine the findings critically. Overall, the reliability of the findings was questionable. The design of the study did not help this issue because time-series designs and quasi-experiments inherently have neither high internal nor high external validity. These designs also make the experiment susceptible to trends, seasonal variations, or random fluctuations[9]. The authors tried to compensate for this by running multiple regressions, which came up with the same results: a 63% decrease in the monthly number of youth homicides in Boston, a 32% decrease in the monthly number of citywide shots fired calls, a 25% decrease in the monthly number of citywide gun assault incidents, and a 44% decrease in the monthly number of District B-2 youth gun assault incidents[10]. The threats are discussed below, broken out by type: measurement validity, internal validity, external validity, and statistical conclusion validity.

Measurement Validity
            Compared to other validity problems found in this study, the measurement validity errors are relatively minor. One such measurement validity issue is mono-operation bias, in which running the treatment and not examining the full range of implications complicates the inference[11]. In this study, it is made clear that the goal of Operation Ceasefire is to decrease the number of youth homicides in Boston. However, other results and implications of the program are not discussed. For example, Operation Ceasefire could also have helped with other crimes involving guns, other homicides that are directly related to youth gangs, giving alternative activities to keep youth off of the streets, gang prevention, and more that is not captured in this study.
            In addition, it would be very difficult to replicate the results of the study. Because it is very specific to time and location, setting up a perfect second trial would be next to impossible. For this reason it is more difficult to accept the findings.

Internal Validity
            Because there is no control group in this study, internal validity automatically becomes a large issue. Without a comparison group, it is very difficult to measure the true magnitude of the program’s effect.
            A major threat to internal validity in this study is the “history threat”, which arises when other events occur at the same time as the treatment and could alter the participants’ behavior, thus creating an alternative explanation for the results of the treatment[12]. The study acknowledges this threat, noting that several other programs occurred at the same time as Operation Ceasefire. The report outlined that public health initiatives, Operation Night Light, Boston’s Ten Point Coalition, and firearm anti-trafficking initiatives all could have caused or meaningfully influenced the results of this study[13]. In an attempt to account for this issue, the study looked at the time series data around the dates that these different initiatives began and were fully implemented, and found that there was no significant change in the number of youth homicides. Additionally, for the anti-trafficking initiatives, the researchers ran regressions holding some variables constant, but openly admitted that the model was far from ideal.
            Another prominent internal validity threat in this study is “maturation”, which is that participants change over time naturally, regardless of the treatment[14]. In this study, a change in behavior could be attributed to Operation Ceasefire when in fact the youth naturally matured and chose not to engage in such activity. The authors did not treat this as a threat to their evaluation; instead, the researchers emphasize that there was a significant drop in the number of youth homicides in June 1996, the month that Operation Ceasefire was first in full swing.
            A final threat to internal validity in this study is a regression to the mean. During this time period Boston was experiencing an unusually high crime rate with the youth homicides – normally numbers were not that outrageously high. This threat is especially relevant in this case, because Operation Ceasefire is occurring at or near the crisis point. Because the evaluation only looks at the year when Operation Ceasefire was first fully implemented, we do not know if the reduction actually started before or after the commencement of the program. Even though there was a model run from month-to-month to see when there was a significant decrease, if at all, there is no extensive analysis preceding the attempts of Operation Ceasefire. Therefore the drops in crime could be partly attributed to the regression to the mean, although it is impossible to know for sure or to measure.
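
Regression to the mean can be illustrated with a short simulation. The sketch below assumes a purely hypothetical series of independent monthly values with a constant average and no intervention at all; the seed, average, and spread are arbitrary choices, not estimates from Boston. It picks out the unusually high months and checks what the following month looks like.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical series: independent monthly values around a constant average of 10.
# No program or intervention is simulated anywhere in this series.
values = [rng.gauss(10, 3) for _ in range(5000)]

# Treat the top 10% of months as "crisis" months.
threshold = sorted(values)[int(0.9 * len(values))]
crisis_months = [i for i in range(len(values) - 1) if values[i] >= threshold]

crisis_mean = mean(values[i] for i in crisis_months)
following_mean = mean(values[i + 1] for i in crisis_months)

print(f"Mean during crisis months:   {crisis_mean:.2f}")
print(f"Mean in the following month: {following_mean:.2f}")
```

In this toy setting the month after a crisis month is, on average, much closer to the long-run average even though nothing was done. A program launched at a peak would appear to "work" for the same reason, which is why the timing of Operation Ceasefire near the crisis point matters.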

External Validity
            Due to the nature of the study, overall Operation Ceasefire has poor external validity, or generalizability. First and foremost, the researchers recognize that geographic effects exist, in that the treatment is extremely specific to the trends represented in Boston[15]. Because there are clearly interactions of the causal relationship with the setting, it is difficult to obtain the same results from running this exact program in groups or contexts beyond this study. In order to try to improve generalizability, the project used the non-randomized quasi-experiment to look at other cities that utilized the “pulling levers” deterrence strategy in trying to influence the behavior and environment of the targeted youth at the core of the city’s violence problem. They noted that there were “encouraging preliminary results” of similar programs in a number of cities across the country, which have the same general framework, but different distinguishing characteristics[16].
            Additionally, as previously mentioned, there is the problem of the multiple treatment interference effect. Operation Ceasefire was far from the only program occurring in Boston during this time period in order to deter youth gang violence, so it is almost impossible to tell which program, or combination of programs, contributed to the effects. Because of this contamination, it is hard to generalize these results past this particular study.

Statistical Conclusion Validity
Unfortunately, statistical conclusion validity has the potential to be the biggest issue in this study due to measurement problems. The problem is mainly evident in the limited range of variables of interest. The authors of the study recognize this problem, and specifically note that the computerized incident data from the Boston Police Department is limiting. Because they are taking data from existing databases, the researchers did not have any control over the data collected in the incident reports. For example, the monthly counts of citywide “shots fired” citizen calls for service and citywide official gun assault incident reports could not be collected for as long a time span as the other data set because the police department had lags in data collection and preparation procedures. On top of that, none of these records capture the age of the victim, which is crucial since we are looking at youth in particular. In order to compensate for this lack of data, the researchers had to cross-check hard copies of gun assault incident reports during the given time period, and pull the ages of the victims from those documents. Because the coding and collection of this information was so time consuming, the data pool was narrowed further and only one district, B-2, was analyzed. While the authors assured us that this was an acceptable sample because it is a district with high police activity, home to 29 of the 61 youth gangs, it would have been possible to analyze the entire city if the databases were complete. This in itself could be identified as a selection threat to validity, because they believed this district provided the best chance of seeing the hypothesized effect[17].

Suggestions on How the Evaluation Could Be Improved
            Unfortunately, the impact of Operation Ceasefire is extremely difficult to measure, and because there is no way to turn it into a randomized control trial it is impossible to make it perfectly valid. However, there are a few ways that this evaluation could be altered in order to improve the findings.
            First, the lack of a control group is a major problem that could conceivably be solved. For the latter half of the experiment, data was collected from only one district within Boston. If there are clear district lines, it would be insightful to make one district a control and a different, but similar, district a treatment group, in order to make a more direct comparison. This would allow the program to take more credit for decreasing the crime rates in that district, as opposed to outside factors. If Operation Ceasefire proves to make an impact in the original district, then it could feasibly continue and begin work in other districts. This would ease the researchers’ fear of being unfair, ensuring that all of the youth who would benefit from Operation Ceasefire would eventually have the opportunity to participate, while still giving the evaluation more weight.
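
One way such a treatment/control district comparison could be summarized is a difference-in-differences estimate, which is a standard technique and not something the Boston evaluation reports. The sketch below uses hypothetical monthly counts for two made-up districts; the function name `diff_in_diff` and all numbers are assumptions for illustration only.

```python
from statistics import mean


def diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Change in the treated district's mean minus the change in the control district's mean."""
    treated_change = mean(treat_post) - mean(treat_pre)
    control_change = mean(ctrl_post) - mean(ctrl_pre)
    return treated_change - control_change


# Hypothetical monthly incident counts (illustrative only, NOT real district data).
treat_pre = [6, 7, 5, 6]    # treated district before the program, mean 6
treat_post = [3, 2, 4, 3]   # treated district after the program, mean 3
ctrl_pre = [5, 5, 4, 6]     # control district before, mean 5
ctrl_post = [4, 4, 3, 5]    # control district after, mean 4

estimate = diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post)
print(f"Difference-in-differences estimate: {estimate} incidents per month")
```

Here the control district also fell by one incident per month, so only two of the treated district's three-incident drop would be credited to the program. The estimate is only as good as the assumption that both districts would have followed the same trend without the program.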
            Additionally, including a lag period in results would be helpful in justifying the impact of the program. It seems hard to believe that the program would have such a strong impact immediately – more likely the effects would take at least a little bit of time to kick in. Even if the lag was measured in weeks as opposed to measured in months, this would bring more credibility to Operation Ceasefire, and make it look like less of a coincidence or regression to the mean.
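
As a rough sketch of what examining a lag could look like, the code below compares the pre-program average with the average from several candidate lags after the start date. The series, the start index, and the helper `drop_by_lag` are hypothetical; in this made-up series the effect is constructed to appear two months after the program begins.

```python
from statistics import mean


def drop_by_lag(counts, start, max_lag):
    """For each assumed lag in months, return the pre-program mean minus the
    mean of all months from `start + lag` onward."""
    pre_mean = mean(counts[:start])
    return {lag: pre_mean - mean(counts[start + lag:]) for lag in range(max_lag + 1)}


# Hypothetical monthly counts: the program starts at index 6,
# and the counts only fall from index 8 (a two-month lag by construction).
counts = [4, 4, 4, 4, 4, 4, 4, 4, 2, 2, 2, 2]

drops = drop_by_lag(counts, start=6, max_lag=3)
for lag, drop in drops.items():
    print(f"lag {lag} months: drop of {drop:.2f} per month")
```

With real data, the point of such a check would be to see whether the estimated drop lines up with a plausible delay after implementation, or whether it was already under way before the program could have had any effect.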
            Furthermore, no discussion occurred about the ongoing effects of this program. If Operation Ceasefire were to be considered a model for other cities to duplicate, it would be helpful to know the intended duration and long-term effects. It was clear that the program yielded lots of success initially, but is that effect expected to continue? After the initial drop, would the programs continue? Would they cease after homicides and gang violence were curbed? Would the program have the same distribution of social and law enforcement effort, or would it shift? These would all be valuable pieces of information.
            Finally, cost is never considered in this evaluation. This is an important aspect for public programs. It could be assumed that the groups involved are donating their time, but there could be overhead costs incurred on the community side by renting space and doing activities with the youth, in addition to the criminal enforcement side by increasing the sentencing for these offenses, and thus the need for more courts.

Policy Recommendations
            Due to the lack of external validity, it is hard to draw any obvious recommendations from this study. Because it is very specific to the city of Boston at a certain time period, it is hard to make bigger policy judgments without further investigation. The only conclusion that can be made with this study alone is that programs of this nature are best left to be administered and created at the local government level, because they need to be so highly specific to region. While they can receive funding from state and federal levels, it would not make sense to make an overriding policy that all states or cities must follow.
            However, other cities followed in the footsteps of the Boston Gun Project in the “pulling levers” approach, which helped add more evidence on creating effective programs to target youth violence across the United States. Unfortunately, these results were only briefly shared in this report, without the supporting data to draw further conclusions. For the purposes of this critique, an evaluation of Operation Ceasefire in Los Angeles, one of the cities mentioned with “encouraging preliminary results,” was located. In a 2003 study by RAND Public Safety and Justice and the Homeland Security Center within the RAND National Security Research Division, researchers wished to see if the same approach used in Operation Ceasefire in Boston would be successful in a city with larger gang problems. The Los Angeles study took three main components from the Boston Gun Project’s model:
·      Creation of a “working group” of community leaders, criminal justice professionals, clergy, and researchers to create and implement the intervention strategies;
·      An approach consisting of social services and tough punitive measures; and
·      Making the program dynamic, constantly shifting the balance of social services and punitive measures based on the current need[18].

With the help of the National Institute of Justice, the working group began testing this program in East Los Angeles neighborhoods. RAND expected the same core values from Boston to hold true to the Los Angeles case, but the intervention approaches to differ. This is because the social programs involved in the working group were more decentralized in nature, and more neighborhood specific than in Boston. In the end, they did not boast as much success as Boston’s Operation Ceasefire. While the coordinated community effect proved to help reduce juvenile violence and the working group successfully utilized data to design effective intervention strategies, no city or agency took ownership of the project, no opportunity arose to keep the program ongoing in the long term, and no financial support was secured.
The RAND report stated that the following would be necessary for the ongoing success of future projects. These insights provided more specific recommendations for future programs than the evaluation from Boston:
·      Crime data should be analyzed carefully when designing interventions;
·      Social services should balance law enforcement efforts to the extent possible;
·      City leaders should provide concrete forms of support to the municipal agencies involved;
·      City leaders should hold the municipal agencies involved accountable for the results;
·      Cost data should be collected to determine whether the projects merit continuation[19].
Overall, while there are many questionable aspects of the evaluation, it seems clear that some aspects of the Boston Gun Project’s Operation Ceasefire were successful, and worth repeating in other locations. The results of the Los Angeles study further suggest that any Operation Ceasefire program must be altered at the local level in order to obtain the best results. While Operation Ceasefire can create a general policy framework for other cities, there needs to be room to take individual city characteristics into account in the program’s design and execution.

[1] U.S. Department of Justice. (2000). The Boston Gun Project: Impact Evaluation. (NIJ publication No. #94-IJ-CX-0056). Boston, MA: Anthony Braga et el. Retrieved from page 2 of
[2] U.S. Department of Justice. (2000). The Boston Gun Project: Impact Evaluation. (NIJ publication No. #94-IJ-CX-0056). Boston, MA: Anthony Braga et el. Retrieved from page 3 of
[3] U.S. Department of Justice. (2000). The Boston Gun Project: Impact Evaluation. (NIJ publication No. #94-IJ-CX-0056). Boston, MA: Anthony Braga et el. Retrieved from page 3-4 of
[4] U.S. Department of Justice. (2000). The Boston Gun Project: Impact Evaluation. (NIJ publication No. #94-IJ-CX-0056). Boston, MA: Anthony Braga et el. Retrieved from page 5 of
[5] U.S. Department of Justice. (2000). The Boston Gun Project: Impact Evaluation. (NIJ publication No. #94-IJ-CX-0056). Boston, MA: Anthony Braga et el. Retrieved from page 5 of
[6] U.S. Department of Justice. (2000). The Boston Gun Project: Impact Evaluation. (NIJ publication No. #94-IJ-CX-0056). Boston, MA: Anthony Braga et el. Retrieved from page 7 of
[7] U.S. Department of Justice. (2000). The Boston Gun Project: Impact Evaluation. (NIJ publication No. #94-IJ-CX-0056). Boston, MA: Anthony Braga et el. Retrieved from page 8 of
[8] U.S. Department of Justice. (2000). The Boston Gun Project: Impact Evaluation. (NIJ publication No. #94-IJ-CX-0056). Boston, MA: Anthony Braga et el. Retrieved from page 9 of
[9] U.S. Department of Justice. (2000). The Boston Gun Project: Impact Evaluation. (NIJ publication No. #94-IJ-CX-0056). Boston, MA: Anthony Braga et el. Retrieved from page 11 of
[10] U.S. Department of Justice. (2000). The Boston Gun Project: Impact Evaluation. (NIJ publication No. #94-IJ-CX-0056). Boston, MA: Anthony Braga et el. Retrieved from page 11-12 of
[11] Newcomer, Kathy. (2011). “Strategies to Help Strengthen Validity and Reliability of Data.” Page 5.
[12] Newcomer, Kathy. (2011). “Strategies to Help Strengthen Validity and Reliability of Data.” Page 10.
[13] U.S. Department of Justice. (2000). The Boston Gun Project: Impact Evaluation. (NIJ publication No. #94-IJ-CX-0056). Boston, MA: Anthony Braga et al. Retrieved from page 14 of
[14] Newcomer, Kathy. (2011). “Strategies to Help Strengthen Validity and Reliability of Data.” Page 10.
[15] Newcomer, Kathy. (2011). “Strategies to Help Strengthen Validity and Reliability of Data.” Page 18.
[16] U.S. Department of Justice. (2000). The Boston Gun Project: Impact Evaluation. (NIJ publication No. #94-IJ-CX-0056). Boston, MA: Anthony Braga et al. Retrieved from page 19 of
[17] Newcomer, Kathy. (2011). “Strategies to Help Strengthen Validity and Reliability of Data.” Page 16.
[18] Tita, George et al. (2003). Unruly Turf: The Role of Interagency Collaborations in Reducing Gun Violence. RAND Review, Fall 2003.
[19] Tita, George et al. (2003). Unruly Turf: The Role of Interagency Collaborations in Reducing Gun Violence. RAND Review, Fall 2003.
