Chapter 2 Experimental Research Design

We can estimate the $ATE$ using experimental research designs. If an experiment is well designed and well implemented, we can use the ttest command to calculate the simple difference in outcomes, or $SDO$.
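
As a brief sketch in potential-outcomes notation (the symbols $Y$, $D$, $Y^1$, and $Y^0$ are introduced here for illustration and are not defined elsewhere in this chapter), the simple difference in outcomes compares average observed outcomes across the two arms:

$$SDO = E[Y \mid D = 1] - E[Y \mid D = 0]$$

Under random assignment, $D$ is independent of $(Y^1, Y^0)$, so $E[Y \mid D = 1] = E[Y^1]$ and $E[Y \mid D = 0] = E[Y^0]$, which gives

$$SDO = E[Y^1] - E[Y^0] = ATE$$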

2.1 Job Corps (2001) Evaluation

The Department of Labor wanted an evaluation of the Job Corps program, which serves disadvantaged youth and aims to improve their employment outcomes. Because slots were limited, individuals were randomly assigned to Job Corps or out of Job Corps.

2.1.1 Inspect the Data

Let’s inspect our outcome variable and treatment variable.

cd "/Users/Sam/Desktop/Econ 672/Data"
use JC, clear

describe earny4
describe assignment

sum earny4
tab assignment

/Users/Sam/Desktop/Econ 672/Data

(Written by R.              )

              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------------------------------------------
earny4          double  %9.0g                 Earnings 4 years after assignment

              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------------------------------------------
assignment      double  %34.0g     assignment1
                                              Random assignment to Job Corps

              Earnings 4 years after assignment
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs               9,240
25%     43.50582              0       Sum of Wgt.       9,240

50%     181.7103                      Mean           207.6163
                        Largest       Std. Dev.      194.5507
75%     307.3206       1821.865
90%     442.1254       1859.233       Variance       37849.99
95%     544.7895       2357.738       Skewness       1.890325
99%     835.3426       2409.909       Kurtosis       12.32882

    Random assignment to Job Corps |      Freq.     Percent        Cum.
-----------------------------------+-----------------------------------
Randomly assigned out of Job Corps |      3,663       39.64       39.64
    Randomly assigned to Job Corps |      5,577       60.36      100.00
-----------------------------------+-----------------------------------
                             Total |      9,240      100.00

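Our outcome variable shows that median earnings are $181.71, while mean earnings are $207.62. The mean exceeds the median, which is consistent with the right skew (skewness of 1.89) reported in the output.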

Our treatment variable has 5,577 assigned to treatment and 3,663 assigned to the control group.
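
One possible next step, sketched here with Stata's built-in tabstat command, is to summarize the outcome separately for each assignment group before testing the difference. The choice of statistics is illustrative, and the sketch assumes the JC data are still in memory.

```stata
* Summarize earnings by assignment group (illustrative choice of statistics)
tabstat earny4, by(assignment) statistics(n mean sd median)
```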

2.1.2 Estimate the impact

We can estimate the simple difference in outcomes with the ttest command.

ttest earny4, by(assignment)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
Randomly |   3,663    197.9258    3.072182    185.9368    191.9025    203.9492
Randomly |   5,577     213.981    2.675004    199.7675    208.7369     219.225
---------+--------------------------------------------------------------------
combined |   9,240    207.6163    2.023936    194.5507    203.6489    211.5836
---------+--------------------------------------------------------------------
    diff |           -16.05513    4.134466               -24.15959   -7.950661
------------------------------------------------------------------------------
    diff = mean(Randomly) - mean(Randomly)                        t =  -3.8832
Ho: diff = 0                                     degrees of freedom =     9238

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0001         Pr(|T| > |t|) = 0.0001          Pr(T > t) = 0.9999
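
As a check on how this table is built, the same test can be reproduced from the summary statistics alone with Stata's immediate command ttesti, which takes the observation count, mean, and standard deviation of each group. The numbers below are copied from the output above. Note that ttest reports diff as the control mean minus the treatment mean, so a negative diff means higher earnings in the treatment group. The second command is an optional variant that relaxes the equal-variance assumption; it is a sketch, not part of the original analysis.

```stata
* Reproduce the two-sample t test from the group summaries printed above
* Arguments: N, mean, sd for control, then N, mean, sd for treatment
ttesti 3663 197.9258 185.9368 5577 213.981 199.7675

* Same comparison without assuming equal variances across groups
ttesti 3663 197.9258 185.9368 5577 213.981 199.7675, unequal
```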

Next, let’s run an OLS regression with and without robust standard errors.

reg earny4 i.assignment
reg earny4 i.assignment, robust

      Source |       SS           df       MS      Number of obs   =     9,240
-------------+----------------------------------   F(1, 9238)      =     15.08
       Model |  569892.708         1  569892.708   Prob > F        =    0.0001
    Residual |   349126136     9,238  37792.3941   R-squared       =    0.0016
-------------+----------------------------------   Adj R-squared   =    0.0015
       Total |   349696029     9,239  37849.9869   Root MSE        =     194.4

-------------------------------------------------------------------------------------------------
                         earny4 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------------------------+----------------------------------------------------------------
                     assignment |
Randomly assigned to Job Corps  |   16.05513   4.134466     3.88   0.000     7.950661    24.15959
                          _cons |   197.9258   3.212061    61.62   0.000     191.6295    204.2222
-------------------------------------------------------------------------------------------------


Linear regression                               Number of obs     =      9,240
                                                F(1, 9238)        =      15.53
                                                Prob > F          =     0.0001
                                                R-squared         =     0.0016
                                                Root MSE          =      194.4

-------------------------------------------------------------------------------------------------
                                |               Robust
                         earny4 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------------------------+----------------------------------------------------------------
                     assignment |
Randomly assigned to Job Corps  |   16.05513   4.073534     3.94   0.000     8.070101    24.04015
                          _cons |   197.9258   3.072095    64.43   0.000     191.9039    203.9478
-------------------------------------------------------------------------------------------------

The treatment group earns about 16 dollars more than the control group, which is what we see in Schochet, P. Z., Burghardt, J., and Glazerman, S. (2001): “National Job Corps study: The impacts of Job Corps on participants’ employment and related outcomes”, Mathematica Policy Research, Washington, DC. The ttest output reports the same difference with a negative sign because it subtracts the treatment mean from the control mean.
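
To see why the regression and the t test agree, recall that with a single binary regressor the constant is the control-group mean and the coefficient on assignment is the difference in means. A minimal sketch, assuming the JC data are still in memory and that assignment is coded 0 for control and 1 for treatment (the 1.assignment term used later suggests this coding, but it is an assumption):

```stata
* Regress earnings on treatment with heteroskedasticity-robust standard errors
quietly reg earny4 i.assignment, vce(robust)

* Control mean is the constant; treatment mean is constant plus coefficient
display "Control mean:   " _b[_cons]
display "Treatment mean: " _b[_cons] + _b[1.assignment]

* margins gives the same two predicted means with standard errors
margins assignment
```

With the estimates shown above, 197.9258 + 16.05513 is approximately 213.981, the treatment-group mean reported by ttest.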

2.1.3 Covariate Balance Test and Specification Tests

Next, let’s test whether any covariates are associated with treatment after randomization.


ttest age, by(assignment)
ttest educ, by(assignment)
ttest educmum, by(assignment)
ttest educdad, by(assignment)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
Randomly |   3,663    18.33825     .034554    2.091302     18.2705    18.40599
Randomly |   5,577    18.50099    .0291468     2.17666    18.44385    18.55813
---------+--------------------------------------------------------------------
combined |   9,240    18.43647    .0223105    2.144592    18.39274    18.48021
---------+--------------------------------------------------------------------
    diff |           -.1627389    .0455812               -.2520881   -.0733896
------------------------------------------------------------------------------
    diff = mean(Randomly) - mean(Randomly)                        t =  -3.5703
Ho: diff = 0                                     degrees of freedom =     9238

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0002         Pr(|T| > |t|) = 0.0004          Pr(T > t) = 0.9998


Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
Randomly |   3,663    9.924379    .0327304    1.980934    9.860207    9.988551
Randomly |   5,577    9.977228    .0270215    2.017948    9.924255     10.0302
---------+--------------------------------------------------------------------
combined |   9,240    9.956277    .0208418    2.003416    9.915423    9.997132
---------+--------------------------------------------------------------------
    diff |            -.052849    .0426065               -.1363671    .0306691
------------------------------------------------------------------------------
    diff = mean(Randomly) - mean(Randomly)                        t =  -1.2404
Ho: diff = 0                                     degrees of freedom =     9238

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.1074         Pr(|T| > |t|) = 0.2149          Pr(T > t) = 0.8926


Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
Randomly |   3,663    9.280917     .084987    5.143648    9.114291    9.447544
Randomly |   5,577    9.335664    .0675206    5.042389    9.203298    9.468031
---------+--------------------------------------------------------------------
combined |   9,240    9.313961    .0528746    5.082565    9.210315    9.417607
---------+--------------------------------------------------------------------
    diff |           -.0547471     .108098                -.266643    .1571489
------------------------------------------------------------------------------
    diff = mean(Randomly) - mean(Randomly)                        t =  -0.5065
Ho: diff = 0                                     degrees of freedom =     9238

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.3063         Pr(|T| > |t|) = 0.6125          Pr(T > t) = 0.6937


Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
Randomly |   3,663    7.019383    .1004599    6.080105     6.82242    7.216346
Randomly |   5,577     7.04662    .0803195    5.998205    6.889163    7.204078
---------+--------------------------------------------------------------------
combined |   9,240    7.035823     .062736    6.030492    6.912846    7.158799
---------+--------------------------------------------------------------------
    diff |            -.027237    .1282603               -.2786556    .2241816
------------------------------------------------------------------------------
    diff = mean(Randomly) - mean(Randomly)                        t =  -0.2124
Ho: diff = 0                                     degrees of freedom =     9238

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.4159         Pr(|T| > |t|) = 0.8318          Pr(T > t) = 0.5841

We reject the null hypothesis of no difference between treatment and control for age: the treatment group is about 0.16 years older on average (p = 0.0004). We fail to reject the null hypothesis of no difference for education attained, mother’s education, and father’s education.
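
The four t tests can also be run in a loop that prints one line per covariate. This is a sketch using the results that ttest stores in r(); the display formatting is an illustrative choice, and it assumes the first group reported by ttest is the control group, as in the output above.

```stata
* Balance check: compare each continuous covariate across assignment groups
foreach v of varlist age educ educmum educdad {
    quietly ttest `v', by(assignment)
    * r(mu_1) and r(mu_2) are the group means; r(p) is the two-sided p-value
    display "`v'" _col(12) "control = " %7.3f r(mu_1) ///
        "  treatment = " %7.3f r(mu_2) "  p = " %6.4f r(p)
}
```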

tab black assignment, chi2
tab female assignment, chi2
tab hispanic assignment, chi2
tab cohabmarried assignment, chi2
tab haschild assignment, chi2
tab everwkd assignment, chi2

Individual |
  is Black | Random assignment to
or African |       Job Corps
  American | Randomly   Randomly  |     Total
-----------+----------------------+----------
 Non-Black |     1,873      2,810 |     4,683 
     Black |     1,790      2,767 |     4,557 
-----------+----------------------+----------
     Total |     3,663      5,577 |     9,240 

          Pearson chi2(1) =   0.4941   Pr = 0.482

           | Random assignment to
Individual |       Job Corps
 is female | Randomly   Randomly  |     Total
-----------+----------------------+----------
      Male |     2,220      2,960 |     5,180 
    Female |     1,443      2,617 |     4,060 
-----------+----------------------+----------
     Total |     3,663      5,577 |     9,240 

          Pearson chi2(1) =  50.9039   Pr = 0.000

                | Random assignment to
  Individual is |       Job Corps
Latino/Hispanic | Randomly   Randomly  |     Total
----------------+----------------------+----------
     Non-Latino |     3,024      4,641 |     7,665 
Latino/Hispanic |       639        936 |     1,575 
----------------+----------------------+----------
          Total |     3,663      5,577 |     9,240 

          Pearson chi2(1) =   0.6842   Pr = 0.408


Individual is Married | Random assignment to
     or Cohabiting at |       Job Corps
           assignment | Randomly   Randomly  |     Total
----------------------+----------------------+----------
Not Married or cohabi |     3,441      5,233 |     8,674 
Married or Cohabiting |       222        344 |       566 
----------------------+----------------------+----------
                Total |     3,663      5,577 |     9,240 

          Pearson chi2(1) =   0.0445   Pr = 0.833


 Individual has child | Random assignment to
       or children at |       Job Corps
           assignment | Randomly   Randomly  |     Total
----------------------+----------------------+----------
           No chilren |     2,981      4,417 |     7,398 
Has at least one chil |       682      1,160 |     1,842 
----------------------+----------------------+----------
                Total |     3,663      5,577 |     9,240 

          Pearson chi2(1) =   6.5895   Pr = 0.010


  Individual |
    has ever | Random assignment to
   worked at |       Job Corps
  assignment | Randomly   Randomly  |     Total
-------------+----------------------+----------
Never worked |     3,155      4,767 |     7,922 
  Has worked |       508        810 |     1,318 
-------------+----------------------+----------
       Total |     3,663      5,577 |     9,240 

          Pearson chi2(1) =   0.7768   Pr = 0.378

We fail to reject the null hypothesis of no difference between treatment and control for Black, Latino/Hispanic, Married/Cohabiting, and Ever Worked. However, we reject the null hypothesis of no difference for being female (p < 0.001) and for having any children (p = 0.010).
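
Counts alone make it hard to judge the size of an imbalance. A sketch that adds column percentages to the two tables where balance was rejected, using the column and nofreq options of tabulate:

```stata
* Show the share of each category within each assignment group
tab female assignment, column nofreq chi2
tab haschild assignment, column nofreq chi2
```

From the counts above, women make up 1,443 of 3,663 controls (about 39.4%) and 2,617 of 5,577 treated individuals (about 46.9%).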

Should we be concerned? Yes!

Let’s run a sensitivity test. The first model is the simple difference, the second adds the covariates for which the $\chi^2$ tests rejected balance (female and haschild), and the third includes all covariates.

While there is some concern about a lack of covariate balance, the full data set uses weights that we do not have in the public-use data.

*What happens to the standard errors

est clear
eststo m1: quietly reg earny4 i.assignment
eststo m2: quietly reg earny4 i.assignment i.female i.haschild
eststo m3: quietly reg earny4 i.assignment i.female i.black i.hispanic i.cohabmarried i.haschild i.everwkd age educ educmum educdad

esttab m1 m2 m3, mtitle("Model 1" "Model 2" "Model 3") keep(1.assignment)

                      (1)             (2)             (3)   
                  Model 1         Model 2         Model 3   
------------------------------------------------------------
1.assignment        16.06***        21.07***        19.59***
                   (3.88)          (5.16)          (4.96)   
------------------------------------------------------------
N                    9240            9240            9240   
------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001

Our $ATE$ estimates range from about 16 dollars to about 21 dollars.
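
The esttab table above prints t statistics in parentheses. To look at the standard errors directly, one option is esttab's se option. This sketch assumes the user-written estout package is installed and that the three stored models are still in memory.

```stata
* Install estout once if esttab is not available
* ssc install estout

* Report standard errors instead of t statistics
esttab m1 m2 m3, se mtitle("Model 1" "Model 2" "Model 3") keep(1.assignment)
```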

2.2 Leaflet Information Evaluation

Next, we will look at an impact evaluation of an information leaflet about coffee production on students’ awareness. The target population is Bulgarian high school and university students, and the outcome is awareness of environmental issues due to coffee production.

2.2.1 Inspect the Data

Let’s inspect the data by running some descriptive statistics. Our dependent variable is awarewaste, and the treatment is an information leaflet that is randomly assigned to students.

cd "/Users/Sam/Desktop/Econ 672/Data"
use CoffeeLeaflet, clear

tab awarewaste
tab treatment

/Users/Sam/Desktop/Econ 672/Data

(Written by R.              )

Awareness of waste |
 production due to |
 coffee production |      Freq.     Percent        Cum.
-------------------+-----------------------------------
         Not Aware |        171       33.93       33.93
Slightly Not Aware |        157       31.15       65.08
           Neither |        121       24.01       89.09
    Slightly Aware |         41        8.13       97.22
       Fully Aware |         14        2.78      100.00
-------------------+-----------------------------------
             Total |        504      100.00

       Treatment Status |      Freq.     Percent        Cum.
------------------------+-----------------------------------
          Control Group |        261       50.00       50.00
Leaflet Treatment Group |        261       50.00      100.00
------------------------+-----------------------------------
                  Total |        522      100.00

2.2.2 Estimate the impact

Let’s run the simple difference in outcomes with the ttest command.

ttest awarewaste, by(treatment)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
 Control |     254    2.007874    .0628675    1.001943    1.884064    2.131684
 Leaflet |     250       2.288    .0702564    1.110852    2.149627    2.426373
---------+--------------------------------------------------------------------
combined |     504    2.146825    .0474646    1.065578    2.053572    2.240079
---------+--------------------------------------------------------------------
    diff |            -.280126    .0942007               -.4652021   -.0950498
------------------------------------------------------------------------------
    diff = mean(Control) - mean(Leaflet)                          t =  -2.9737
Ho: diff = 0                                     degrees of freedom =      502

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0015         Pr(|T| > |t|) = 0.0031          Pr(T > t) = 0.9985

We reject the null hypothesis that the leaflet treatment had no impact: it increases the mean awareness score by 0.28 points on the five-point scale.

Let’s run an OLS regression.

reg awarewaste i.treatment

      Source |       SS           df       MS      Number of obs   =       504
-------------+----------------------------------   F(1, 502)       =      8.84
       Model |  9.88666867         1  9.88666867   Prob > F        =    0.0031
    Residual |  561.248252       502  1.11802441   R-squared       =    0.0173
-------------+----------------------------------   Adj R-squared   =    0.0154
       Total |  571.134921       503   1.1354571   Root MSE        =    1.0574

------------------------------------------------------------------------------------------
              awarewaste |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------------------+----------------------------------------------------------------
               treatment |
Leaflet Treatment Group  |    .280126   .0942007     2.97   0.003     .0950498    .4652021
                   _cons |   2.007874   .0663451    30.26   0.000     1.877526    2.138222
------------------------------------------------------------------------------------------

Wait a second! What does 0.28 mean? What is a potential problem here?

Look at the outcome variable again. It is an ordinal variable! Let’s use a $\chi^2$ test.

tab awarewaste treatment, chi2

Awareness of waste |
 production due to |   Treatment Status
 coffee production | Control G  Leaflet T |     Total
-------------------+----------------------+----------
         Not Aware |        97         74 |       171 
Slightly Not Aware |        82         75 |       157 
           Neither |        56         65 |       121 
    Slightly Aware |        14         27 |        41 
       Fully Aware |         5          9 |        14 
-------------------+----------------------+----------
             Total |       254        250 |       504 

          Pearson chi2(4) =   9.3087   Pr = 0.054

Let’s run an ordinal logit regression.

ologit awarewaste i.treatment, or

Iteration 0:   log likelihood = -693.62947  
Iteration 1:   log likelihood = -689.57824  
Iteration 2:   log likelihood = -689.57505  
Iteration 3:   log likelihood = -689.57505  

Ordered logistic regression                     Number of obs     =        504
                                                LR chi2(1)        =       8.11
                                                Prob > chi2       =     0.0044
Log likelihood = -689.57505                     Pseudo R2         =     0.0058

------------------------------------------------------------------------------------------
              awarewaste |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------------------+----------------------------------------------------------------
               treatment |
Leaflet Treatment Group  |   .4609564   .1623425     2.84   0.005     .1427709    .7791419
-------------------------+----------------------------------------------------------------
                   /cut1 |  -.4493223   .1211244                     -.6867218   -.2119229
                   /cut2 |   .8543491   .1252889                      .6087873    1.099911
                   /cut3 |   2.346133    .168949                      2.014999    2.677267
                   /cut4 |    3.80845   .2865533                      3.246816    4.370084
------------------------------------------------------------------------------------------

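The table above reports the treatment coefficient in log-odds (0.461); exponentiating gives an odds ratio of $e^{0.461} \approx 1.59$. Individuals who received the leaflet had about 1.59 times the odds of reporting a higher level of awareness of the environmental effects of coffee production than those in the control group.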
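
The ologit table above reports a coefficient in log-odds, and exponentiating it gives the odds ratio, which works out to about 1.59. A minimal sketch of that conversion, using the coefficient and confidence limits copied from the table:

```stata
* Exponentiate the ologit coefficient and its 95% confidence limits
display "Odds ratio: " exp(.4609564)
display "95% CI:     " exp(.1427709) " to " exp(.7791419)
```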

2.2.3 Covariate Balance Test

Let’s test covariate balance.

foreach v of varlist sex mumedu dadedu bulgnationality drinkcoffee citysofia {
  tab awarewaste `v', chi2
}

Awareness of waste |
 production due to |          sex
 coffee production |         0          1 |     Total
-------------------+----------------------+----------
         Not Aware |        94         77 |       171 
Slightly Not Aware |       102         53 |       155 
           Neither |        65         55 |       120 
    Slightly Aware |        22         19 |        41 
       Fully Aware |         3         11 |        14 
-------------------+----------------------+----------
             Total |       286        215 |       501 

          Pearson chi2(4) =  13.0039   Pr = 0.011

Awareness of waste |
 production due to |              mumedu
 coffee production |         1          2          3 |     Total
-------------------+---------------------------------+----------
         Not Aware |         3         69         88 |       160 
Slightly Not Aware |         5         48         98 |       151 
           Neither |         1         31         76 |       108 
    Slightly Aware |         1          9         27 |        37 
       Fully Aware |         0          3          9 |        12 
-------------------+---------------------------------+----------
             Total |        10        160        298 |       468 

          Pearson chi2(8) =  11.8017   Pr = 0.160

Awareness of waste |
 production due to |              dadedu
 coffee production |         1          2          3 |     Total
-------------------+---------------------------------+----------
         Not Aware |         4         67         83 |       154 
Slightly Not Aware |         2         53         87 |       142 
           Neither |         0         42         62 |       104 
    Slightly Aware |         0          9         30 |        39 
       Fully Aware |         0          3          8 |        11 
-------------------+---------------------------------+----------
             Total |         6        174        270 |       450 

          Pearson chi2(8) =  10.9594   Pr = 0.204

Awareness of waste |
 production due to |    bulgnationality
 coffee production |         0          1 |     Total
-------------------+----------------------+----------
         Not Aware |         4        167 |       171 
Slightly Not Aware |         5        152 |       157 
           Neither |         2        119 |       121 
    Slightly Aware |         0         41 |        41 
       Fully Aware |         0         14 |        14 
-------------------+----------------------+----------
             Total |        11        493 |       504 

          Pearson chi2(4) =   2.1444   Pr = 0.709

Awareness of waste |
 production due to |                      drinkcoffee
 coffee production |         1          2          3          4          5 |     Total
-------------------+-------------------------------------------------------+----------
         Not Aware |        68         18         21         32         31 |       170 
Slightly Not Aware |        39         25         32         27         34 |       157 
           Neither |        46         19         19         21         14 |       119 
    Slightly Aware |        12          5          4         12          8 |        41 
       Fully Aware |         4          0          4          3          2 |        13 
-------------------+-------------------------------------------------------+----------
             Total |       169         67         80         95         89 |       500 

         Pearson chi2(16) =  23.6155   Pr = 0.098

Awareness of waste |
 production due to |       citysofia
 coffee production |         0          1 |     Total
-------------------+----------------------+----------
         Not Aware |        46        125 |       171 
Slightly Not Aware |        40        117 |       157 
           Neither |        27         94 |       121 
    Slightly Aware |         7         34 |        41 
       Fully Aware |         2         12 |        14 
-------------------+----------------------+----------
             Total |       122        382 |       504 

          Pearson chi2(4) =   2.9391   Pr = 0.568

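Note that these tables cross-tabulate each covariate with the outcome, awarewaste, so they show which characteristics are associated with awareness rather than with treatment status. Sex is associated with awareness (p = 0.011), which gives us a slight concern; no other characteristic is associated with it at the 5% level. A balance test in the strict sense would cross-tabulate each covariate with treatment instead.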
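
The loop above cross-tabulates each covariate with the outcome. A balance check in the usual sense compares covariates across treatment arms. A minimal sketch that reuses the same variable list and swaps awarewaste for treatment (output is not shown, since this was not part of the original run):

```stata
* Balance check: cross-tabulate each covariate against treatment status
foreach v of varlist sex mumedu dadedu bulgnationality drinkcoffee citysofia {
    tab `v' treatment, chi2
}
```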

est clear
quietly {
eststo m1: ologit awarewaste i.treatment, or
eststo m2: ologit awarewaste i.treatment i.mumedu i.sex, or
eststo m3: ologit awarewaste i.treatment i.drinkcoffee i.sex i.mumedu i.dadedu i.bulgnationality i.citysofia, or
}

esttab m1 m2 m3, mtitle("Model 1" "Model 2" "Model 3") keep(1.treatment)

                      (1)             (2)             (3)   
                  Model 1         Model 2         Model 3   
------------------------------------------------------------
awarewaste                                                  
1.treatment         0.461**         0.566***        0.583** 
                   (2.84)          (3.32)          (3.27)   
------------------------------------------------------------
N                     504             466             434   
------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001

The esttab table reports log-odds coefficients, which range from 0.461 to 0.583; exponentiating them gives odds ratios of about 1.59 to 1.79. Notice that the sample size drops as we add covariates, which means that not every respondent provided all of the characteristic information.
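
Two follow-ups are sketched below, neither of which is part of the original run. First, esttab's eform option prints exponentiated coefficients, so the table shows odds ratios directly. Second, because the sample shrinks across models, one way to separate the effect of adding covariates from the effect of losing observations is to re-estimate every model on the observations used in the largest specification. The variable name insample is hypothetical.

```stata
* Flag the observations used in the full specification
quietly ologit awarewaste i.treatment i.drinkcoffee i.sex i.mumedu i.dadedu i.bulgnationality i.citysofia
generate byte insample = e(sample)

* Re-estimate all three models on that common sample
est clear
quietly {
eststo m1: ologit awarewaste i.treatment if insample
eststo m2: ologit awarewaste i.treatment i.mumedu i.sex if insample
eststo m3: ologit awarewaste i.treatment i.drinkcoffee i.sex i.mumedu i.dadedu i.bulgnationality i.citysofia if insample
}

* eform reports odds ratios rather than log-odds coefficients
esttab m1 m2 m3, eform mtitle("Model 1" "Model 2" "Model 3") keep(1.treatment)
```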