Chapter 1 Synthetic Control

Synthetic control is a generalization of the difference-in-differences design and one of the most important evaluation designs of recent years (Athey and Imbens, 2017). It is a quantitative comparative case study that focuses on a single treated unit, such as a state, school, county, or group. We optimize a set of weights over a donor pool of comparison states to construct a counterfactual for the treated state.

The Premise: Abadie and Gardeazabal (2003) model the counterfactual for a treated unit as a weighted average of units from a donor pool. This weighted average is the synthetic control.

1.1 The estimator

Our synthetic control estimator is the following:

\[ \hat{\delta}_{1t} = Y_{1t}-\sum^{J+1}_{j=2}w^*_jY_{jt} \]

Where

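  • \(j=1\) indexes the treated unit (Texas in our application)
  • \(j=2,\dots,J+1\) index the \(J\) units in the donor pool of comparison units
  • \(Y_{1t}\) is the outcome of the treated unit in period \(t\)
  • \(Y_{jt}\) is the outcome of donor \(j\) in period \(t\)
  • \(w^*_j\) is the optimized weight on donor \(j\); the weight vector \(W^*=(w^*_2,\dots,w^*_{J+1})\) depends on the covariates we choose to match on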

The synthetic control estimator estimates the effect of the program in each post-intervention period \(t > T_{0}\), where \(T_{0}\) is the last pre-intervention period.

It is simply the difference, in a given period, between the treated unit's outcome and the outcome of the weighted control group. The optimized weights, and therefore the synthetic control, depend on the set of covariates we choose.
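As a minimal sketch of the estimator, the Python below computes \(\hat{\delta}\) for a single period from the treated unit's outcome, the donor outcomes, and the donor weights. The function names and all numbers are hypothetical and do not come from the Texas data; in the analysis itself, synth chooses the weights for us.

```python
def synthetic_outcome(donor_outcomes, weights):
    """Weighted average of donor outcomes in one period: sum_j w_j * Y_jt."""
    if len(donor_outcomes) != len(weights):
        raise ValueError("need exactly one weight per donor")
    return sum(w * y for w, y in zip(weights, donor_outcomes))


def gap(treated_outcome, donor_outcomes, weights):
    """Estimated effect in one period: Y_1t minus the synthetic control outcome."""
    return treated_outcome - synthetic_outcome(donor_outcomes, weights)


# Hypothetical post-treatment year with three donors (not real data).
treated_y = 100.0
donor_y = [80.0, 90.0, 70.0]
donor_w = [0.5, 0.3, 0.2]  # non-negative and summing to one

# The synthetic outcome is 0.5*80 + 0.3*90 + 0.2*70 = 81, so this prints 19.0
print(gap(treated_y, donor_y, donor_w))
```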

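Matching variables: The matching variables \(X_1\) (the treated unit's pre-intervention characteristics) and \(X_0\) (the same characteristics for each donor unit) are chosen as predictors of post-intervention outcomes and must be unaffected by the program, treatment, or intervention.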
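To make the roles of \(X_1\) and \(X_0\) concrete, here is a toy Python sketch of the standard synthetic control objective of Abadie, Diamond, and Hainmueller: for a given diagonal predictor-weighting matrix \(V\), choose non-negative donor weights that sum to one and minimize the \(V\)-weighted distance between \(X_1\) and \(X_0W\). This is an illustration under stated assumptions, not synth's implementation: a brute-force grid search stands in for synth's optimizer, \(V\) is fixed by hand instead of being chosen from the data, and all function names and numbers are hypothetical.

```python
import itertools
import math


def v_distance(x1, x0_columns, weights, v_diag):
    """sqrt((X1 - X0 W)' V (X1 - X0 W)) for a diagonal V given as a list."""
    total = 0.0
    for k, (x1_k, v_k) in enumerate(zip(x1, v_diag)):
        # Synthetic value of predictor k: weighted average across donors.
        synthetic_k = sum(w * donor[k] for w, donor in zip(weights, x0_columns))
        total += v_k * (x1_k - synthetic_k) ** 2
    return math.sqrt(total)


def simplex_grid(n_donors, n_steps):
    """Yield every weight vector with entries k / n_steps (k >= 0) that sums to one."""
    for head in itertools.product(range(n_steps + 1), repeat=n_donors - 1):
        rest = n_steps - sum(head)
        if rest >= 0:
            yield [k / n_steps for k in (*head, rest)]


def best_weights(x1, x0_columns, v_diag, n_steps=100):
    """Brute-force search over the grid; returns (weights, distance)."""
    best_w, best_d = None, math.inf
    for w in simplex_grid(len(x0_columns), n_steps):
        d = v_distance(x1, x0_columns, w, v_diag)
        if d < best_d:
            best_w, best_d = w, d
    return best_w, best_d


# Hypothetical: two predictors for the treated unit and three donors.
x1 = [2.0, 5.0]
x0 = [[1.0, 4.0], [3.0, 6.0], [10.0, 0.0]]  # one list of predictors per donor
v = [1.0, 1.0]

w, d = best_weights(x1, x0, v)
print(w, d)  # [0.5, 0.5, 0.0] 0.0
```

Here an even mix of the first two donors reproduces the treated unit's predictors exactly, so the third donor gets zero weight, much as most states receive zero weight in synthetic Texas.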

1.2 synth package

First, we will install the synth package.

ssc install synth 

Next, we will install the mat2txt package.

ssc install mat2txt

1.3 Prison Construction

We will recreate Cunningham and Kang's (2019) study of the effect of prison construction on black male incarceration. After installing the packages, we import and inspect the data.

Cunningham (2021) documents the expansion of prison capacity in Texas and shows that the prison construction boom began in 1993.

Construction of prisons in Texas

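Next, let's load the data and summarize our outcome, bmprison (labeled BM Prison), which measures black male incarceration.

* point Stata to the folder that contains texas.dta (adjust the path for your machine)
cd "/Users/Sam/Desktop/Econ 672/Data"
use texas, clear

* detailed summary statistics (percentiles, mean, skewness) for the outcome
sum bmprison, detail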
/Users/Sam/Desktop/Econ 672/Data

                          BM Prison
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%           24              0
10%           47              0       Obs                 816
25%          489              0       Sum of wgt.         816

50%       3055.5                      Mean           7625.753
                        Largest       Std. dev.      10088.11
75%      11425.5          58393
90%        21153          59709       Variance       1.02e+08
95%        27204          60785       Skewness       2.080135
99%        46235          61861       Kurtosis       8.401005

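The distribution is strongly right-skewed (skewness of 2.08): the median (50th percentile) of bmprison is 3,055.5, while the mean is 7,625.75.

Next, we create an idcount variable that counts the observations in each state-year and check for duplicates. This is good practice when working with panel data.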

sort statefip year
by statefip year: gen idcount = _N
tab idcount

* No redundant or repeated state-year values, so drop the helper variable
drop idcount
    idcount |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        816      100.00      100.00
------------+-----------------------------------
      Total |        816      100.00
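The same duplicate check can be sketched in Python, assuming each observation is a dictionary with statefip and year keys (a hypothetical layout, not how Stata stores the data). It counts observations per state-year pair, as by statefip year: gen idcount = _N does, and reports any pair that appears more than once.

```python
from collections import Counter


def duplicate_keys(rows, keys=("statefip", "year")):
    """Return {key tuple: count} for every key combination seen more than once."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical rows: Texas (48) appears twice in 1990.
rows = [
    {"statefip": 48, "year": 1990},
    {"statefip": 48, "year": 1991},
    {"statefip": 48, "year": 1990},
    {"statefip": 6, "year": 1990},
]
print(duplicate_keys(rows))  # {(48, 1990): 2}
```

An empty result corresponds to the Stata table above, where every idcount equals 1.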

Now we can declare the panel, with statefip as the unit of analysis and year as the time dimension.

xtset statefip year
Panel variable: statefip (strongly balanced)
 Time variable: year, 1985 to 2000
         Delta: 1 unit

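We have a strongly balanced panel to begin our synthetic control: 51 units (the 50 states plus the District of Columbia) observed in each of the 16 years from 1985 to 2000, for 816 observations. Do not skip this step: synth requires the data to be declared as a panel (with tsset or xtset) before it will run.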
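A rough Python sketch of what "strongly balanced" means here: every unit is observed exactly once in every year that appears in the data. It uses the same hypothetical list-of-dictionaries layout, and the unit IDs 1 to 51 are placeholders, not real FIPS codes. With 51 units and 16 years, a balanced panel has 51 × 16 = 816 rows.

```python
def is_strongly_balanced(rows, unit="statefip", time="year"):
    """True if every unit appears exactly once in every period present in the data."""
    units = {row[unit] for row in rows}
    periods = {row[time] for row in rows}
    pairs = [(row[unit], row[time]) for row in rows]
    no_duplicates = len(pairs) == len(set(pairs))
    # With no duplicates, having units * periods rows means no unit-year is missing.
    return no_duplicates and len(pairs) == len(units) * len(periods)


# Hypothetical panel: 51 placeholder units observed every year from 1985 to 2000.
panel = [{"statefip": s, "year": y} for s in range(1, 52) for y in range(1985, 2001)]
print(len(panel), is_strongly_balanced(panel))  # 816 True

# Dropping a single row breaks the balance.
print(is_strongly_balanced(panel[1:]))  # False
```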

1.4 Estimate the Weighted Treatment Effect

Next, we calculate the gaps: in each year, the difference between the outcome for Texas and the outcome for its synthetic comparison. After treatment, these gaps estimate the effect of the treatment on the treated unit.

We match on the outcome in selected pre-treatment years as well as on covariates, some of them measured in specific years. We specify Texas (\(FIPS=48\)) as the treated unit with trunit(48) and 1993 as the treatment year with trperiod(1993). With mspeperiod(1985(1)1993) we set the years, from 1985 to 1993 in one-year increments, over which the Mean Squared Prediction Error (\(MSPE\)) is minimized.

Cunningham provides the predictors used to estimate synthetic Texas:

  • The outcome, bmprison, in 1988, 1990, 1991, and 1992
  • Alcohol consumption (alcohol) in 1990
  • Unemployment rate (ur), income, and poverty, averaged over the pre-treatment years
  • AIDS per capita (aidscapita) in 1990 and 1991
  • Black population share (black) in 1990, 1991, and 1992
  • Percentage of 15-19 year olds (perc1519) in 1990
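Unless a year is given, synth averages a predictor over the pre-treatment years; its output in this section lists those years as 1985 to 1992. Here is a minimal Python sketch of that rule for one unit, with a hypothetical function name and made-up values: a predictor written with a year, such as black(1990), takes that year's value, and one written without a year, such as income, takes the pre-treatment average.

```python
PRE_TREATMENT_YEARS = range(1985, 1993)  # 1985-1992, as listed in the synth output


def predictor_value(series, year=None, years=PRE_TREATMENT_YEARS):
    """Value of one predictor for one unit.

    series maps year -> value. With a year, e.g. black(1990), return that year's
    value; without one, e.g. income, return the mean over the pre-treatment years.
    """
    if year is not None:
        return series[year]
    values = [series[t] for t in years]
    return sum(values) / len(values)


# Hypothetical series: 1 in 1985, 2 in 1986, ..., 16 in 2000.
series = {t: float(t - 1984) for t in range(1985, 2001)}
print(predictor_value(series, 1990))  # prints 6.0
print(predictor_value(series))        # prints 4.5, the mean of 1, ..., 8
```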
* /// continues the command on the next line (run this from a do-file)
synth bmprison bmprison(1988) bmprison(1990) bmprison(1991) bmprison(1992) alcohol(1990) ///
    aidscapita(1990) aidscapita(1991) income ur poverty black(1990) black(1991) black(1992) ///
    perc1519(1990), trunit(48) trperiod(1993) unitnames(state) mspeperiod(1985(1)1993) ///
    resultsperiod(1985(1)2000) keep(synth_bmprate.dta) replace
Synthetic Control Method for Comparative Case Studies
----------------------------------------------------------------------------------

First Step: Data Setup
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
Data Setup successful
----------------------------------------------------------------------------------
                Treated Unit: Texas
               Control Units: Alabama, Alaska, Arizona, Arkansas, California,
                              Colorado, Connecticut, Delaware, District of
                              Columbia, Florida, Georgia, Hawaii, Idaho, Illinois,
                              Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine,
                              Maryland, Massachusetts, Michigan, Minnesota,
                              Mississippi, Missouri, Montana, Nebraska, Nevada,
                              New Hampshire, New Jersey, New Mexico, New York,
                              North Carolina, North Dakota, Ohio, Oklahoma,
                              Oregon, Pennsylvania, Rhode Island, South Carolina,
                              South Dakota, Tennessee, Utah, Vermont, Virginia,
                              Washington, West Virginia, Wisconsin, Wyoming
----------------------------------------------------------------------------------
          Dependent Variable: bmprison
  MSPE minimized for periods: 1985 1986 1987 1988 1989 1990 1991 1992 1993
Results obtained for periods: 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994
                              1995 1996 1997 1998 1999 2000
----------------------------------------------------------------------------------
                  Predictors: bmprison(1988) bmprison(1990) bmprison(1991)
                              bmprison(1992) alcohol(1990) aidscapita(1990)
                              aidscapita(1991) income ur poverty black(1990)
                              black(1991) black(1992) perc1519(1990)
----------------------------------------------------------------------------------
Unless period is specified
predictors are averaged over: 1985 1986 1987 1988 1989 1990 1991 1992
----------------------------------------------------------------------------------

Second Step: Run Optimization
----------------------------------------------------------------------------------
----------------------------------------------------------------------------------
Optimization done
----------------------------------------------------------------------------------

Third Step: Obtain Results
----------------------------------------------------------------------------------
Loss: Root Mean Squared Prediction Error

---------------------
   RMSPE |  1295.489 
---------------------
----------------------------------------------------------------------------------
Unit Weights:

----------------------------------
               Co_No | Unit_Weight
---------------------+------------
             Alabama |           0
              Alaska |           0
             Arizona |           0
            Arkansas |           0
          California |        .408
            Colorado |           0
         Connecticut |           0
            Delaware |           0
District of Columbia |           0
             Florida |        .109
             Georgia |           0
              Hawaii |           0
               Idaho |           0
            Illinois |         .36
             Indiana |           0
                Iowa |           0
              Kansas |           0
            Kentucky |           0
           Louisiana |        .122
               Maine |           0
            Maryland |           0
       Massachusetts |           0
            Michigan |           0
           Minnesota |           0
         Mississippi |           0
            Missouri |           0
             Montana |           0
            Nebraska |           0
              Nevada |           0
       New Hampshire |           0
          New Jersey |           0
          New Mexico |           0
            New York |           0
      North Carolina |           0
        North Dakota |           0
                Ohio |           0
            Oklahoma |           0
              Oregon |           0
        Pennsylvania |           0
        Rhode Island |           0
      South Carolina |           0
        South Dakota |           0
           Tennessee |           0
                Utah |           0
             Vermont |           0
            Virginia |           0
          Washington |           0
       West Virginia |           0
           Wisconsin |           0
             Wyoming |           0
----------------------------------
----------------------------------------------------------------------------------
Predictor Balance:

------------------------------------------------------
                               |   Treated  Synthetic 
-------------------------------+----------------------
                bmprison(1988) |     16956   18374.28 
                bmprison(1990) |     22634    23134.8 
                bmprison(1991) |     23249   24272.24 
                bmprison(1992) |     27568   25776.34 
                 alcohol(1990) |      2.47    2.70996 
              aidscapita(1990) |  13.58992   14.99957 
              aidscapita(1991) |  14.83504   17.24535 
                        income |  16108.63   18614.54 
                            ur |  7.426042   7.188876 
                       poverty |      17.2   14.86484 
                   black(1990) |  16.15459   15.98038 
                   black(1991) |  16.29985   16.16835 
                   black(1992) |  16.45955   16.37467 
                perc1519(1990) |  7.703001   7.022702 
------------------------------------------------------
----------------------------------------------------------------------------------

counter | pri_inf  | dual_inf  | pri_obj   | dual_obj  | sigfig | alpha  | nu 
----------------------------------------------------------------------------------
      0 | 9.51e+01 | 1.57e-03 | -1.82e+00 | -5.02e+02 |  0.000 | 0.0000 | 1.00e+02
      1 | 5.46e-01 | 8.99e-06 | -1.77e+00 | -9.08e+02 |  0.000 | 0.9943 | 2.73e-05
      2 | 1.14e-02 | 1.88e-07 | -1.41e+00 | -3.35e+01 |  0.000 | 0.9791 | 3.67e-05
      3 | 1.21e-03 | 1.99e-08 | -1.45e+00 | -4.88e+00 |  0.000 | 0.8943 | 3.09e-05
      4 | 2.39e-04 | 3.94e-09 | -1.77e+00 | -2.65e+00 |  0.497 | 0.8022 | 1.15e-05
      5 | 1.14e-05 | 1.87e-10 | -1.81e+00 | -1.87e+00 |  1.673 | 0.9525 | 1.67e-07
      6 | 4.95e-06 | 8.16e-11 | -1.81e+00 | -1.84e+00 |  2.042 | 0.5645 | 1.01e-06
      7 | 1.77e-06 | 2.91e-11 | -1.82e+00 | -1.83e+00 |  2.490 | 0.6428 | 2.88e-07
      8 | 7.32e-07 | 1.20e-11 | -1.82e+00 | -1.82e+00 |  2.865 | 0.5864 | 1.39e-07
      9 | 3.15e-07 | 5.19e-12 | -1.82e+00 | -1.82e+00 |  3.238 | 0.5696 | 6.39e-08
     10 | 9.21e-09 | 1.52e-13 | -1.82e+00 | -1.82e+00 |  4.622 | 0.9708 | 1.16e-10
     11 | 4.60e-11 | 7.66e-16 | -1.82e+00 | -1.82e+00 |  6.907 | 0.9950 | 1.39e-13
     12 | 2.30e-13 | 3.47e-17 | -1.82e+00 | -1.82e+00 |  9.204 | 0.9950 | 7.23e-16
     13 | 1.31e-15 | 2.05e-17 | -1.82e+00 | -1.82e+00 | 11.502 | 0.9950 | 3.64e-18
     14 | 6.19e-16 | 3.65e-17 | -1.82e+00 | -1.82e+00 | 13.792 | 0.9950 | 1.84e-20
----------------------------------------------------------------------------------
optimization converged

We have a few interesting results.

First, our \(RMSPE\) is 1,295.489. We can and should keep testing combinations of covariates that reduce the \(RMSPE\) in the pre-treatment period. Recall that Abadie, Diamond, and Hainmueller (2010) suggest choosing \(V\), the (typically diagonal) matrix of weights on the predictors, to minimize the mean squared prediction error in the pre-treatment period. Because the optimal donor weights \(W^*(V)\) depend on \(V\), and \(V\) weights the covariates we include, our choice of covariates influences \(W^*\) through \(V\).

\[ RMSPE=\sqrt{\frac{1}{T_0}\sum^{T_0}_{t=1} \left( Y_{1t}-\sum^{J+1}_{j=2}w^{*}_{j}(V)Y_{jt}\right)^2} \]
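A minimal Python sketch of the RMSPE: the square root of the average squared gap between the treated unit and its synthetic control over the pre-treatment periods. The input series are hypothetical, not the Texas numbers; synth reports this statistic itself, so the sketch only shows the arithmetic.

```python
import math


def rmspe(treated, synthetic):
    """Root mean squared prediction error over matched pre-treatment periods."""
    if len(treated) != len(synthetic) or not treated:
        raise ValueError("need two non-empty series of equal length")
    squared_gaps = [(y1 - ys) ** 2 for y1, ys in zip(treated, synthetic)]
    return math.sqrt(sum(squared_gaps) / len(squared_gaps))


# Hypothetical four pre-treatment years: the gap is +2, -2, +2, -2.
treated = [12.0, 8.0, 12.0, 8.0]
synthetic = [10.0, 10.0, 10.0, 10.0]
print(rmspe(treated, synthetic))  # prints 2.0
```

Because the gaps are squared, averaged, and then square-rooted, the RMSPE is measured in the same units as the outcome, so synth's value of 1,295.489 is in units of bmprison.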

Second, our results show that California is the largest donor, with a weight of 40.8%. Illinois is next with a weight of 36%, while Louisiana and Florida are the third and fourth largest donors at 12.2% and 10.9%, respectively. Every other state receives a weight of zero.
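A short Python sketch using the four non-zero weights from the synth output. Synthetic control weights are restricted to be non-negative and to sum to one, so synthetic Texas is a weighted average of its donors. The displayed weights are rounded to three decimals, which is why they sum to 0.999 rather than exactly 1; the tolerance below is an assumption chosen to absorb that rounding.

```python
# Non-zero donor weights reported by synth for synthetic Texas (rounded to 3 decimals).
weights = {"California": 0.408, "Illinois": 0.36, "Louisiana": 0.122, "Florida": 0.109}


def is_convex_combination(w, tol=0.005):
    """Weights must be non-negative and sum to one (up to display rounding)."""
    return all(v >= 0 for v in w.values()) and abs(sum(w.values()) - 1.0) <= tol


# Rank donors from largest to smallest weight and show each as a percentage.
ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
for rank, (state, share) in enumerate(ranked, start=1):
    print(f"{rank}. {state}: {share:.1%}")
print(round(sum(weights.values()), 3), is_convex_combination(weights))
```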

Next we have our predictor balance between Texas and Synthetic Texas.

Balance between Texas and Synthetic Texas

Most predictors look close, although income and poverty differ more noticeably between Texas and synthetic Texas. What we are missing is a formal way to test the null hypothesis that the two do not differ.

1.5 Plot the Graph

We can add the figure option (abbreviated as fig below) to the synth command to plot Texas against synthetic Texas.

synth   bmprison bmprison(1988) bmprison(1990) bmprison(1991) bmprison(1992) alcohol(1990) aidscapita(1990) aidscapita(1991) income ur poverty black(1990) black(1991) black(1992) perc1519(1990), trunit(48) trperiod(1993) unitnames(state) mspeperiod(1985(1)1993) resultsperiod(1985(1)2000) keep(synth_bmprate.dta) replace fig
Synthetic Texas vs Texas

1.6 Plot the Gap or Difference

We are going to pull our results from the synthetic control method with the file synth_bmprate.dta, which we defined in the keep option.

use synth_bmprate.dta, clear

The first two columns are the donor pool and the donor weights. We don't need these now, so we drop them. The third and fourth columns are our outcomes of interest: \(Y\) for Texas and \(Y\) for synthetic Texas. The fifth column is the time period.

* keep Texas's outcome, synthetic Texas's outcome, and the year
keep _Y_treated _Y_synthetic _time
* rows past the 16 result years only held donor weights, so their _time is missing
drop if _time==.
rename _time year
rename _Y_treated treat
rename _Y_synthetic counterfact

Generate the difference/gap between Texas outcome and Synthetic Texas outcome.

gen gap48=treat-counterfact
sort year

Finally, we plot the difference between the two outcomes.

* plot the yearly gap; dashed lines mark zero and the 1993 treatment year
twoway (line gap48 year, lpattern(solid) lwidth(vthin) lcolor(black)), ///
    yline(0, lpattern(shortdash) lcolor(black)) ///
    xline(1993, lpattern(shortdash) lcolor(black)) ///
    xtitle("Year", size(medsmall)) xlabel(#10) ///
    ytitle("Gap in black male prisoners", size(medsmall)) legend(off)

save synth_bmprate_48.dta, replace
Difference between Texas and Synthetic Texas

We can see a difference of about 25,000 black male prisoners in 1995. However, we still need to test the null hypothesis to see whether this is a statistically significant effect.