Chapter 2 Experimental Research Design
We can estimate the \(ATE\) using experimental research designs. If an experiment is well designed and well implemented, we can use the ttest command to calculate the simple difference in outcomes, or \(SDO\).
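As a short reminder of why this works, the identity below writes the \(SDO\) in potential-outcomes notation. The symbols \(Y^1\), \(Y^0\) (potential outcomes with and without treatment) and \(D\) (treatment status) are notation assumed here for illustration; they are not defined elsewhere in this chapter.

\[
SDO = E[Y \mid D = 1] - E[Y \mid D = 0] = E[Y^1 \mid D = 1] - E[Y^0 \mid D = 0]
\]

Under random assignment, \(D\) is independent of \((Y^1, Y^0)\), so \(E[Y^1 \mid D = 1] = E[Y^1]\) and \(E[Y^0 \mid D = 0] = E[Y^0]\), which gives

\[
SDO = E[Y^1] - E[Y^0] = ATE
\]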
2.1 Job Corps (2001) Evaluation
The Department of Labor wanted an evaluation of the Job Corps program, which serves disadvantaged youth with the goal of improving their employment outcomes. Because slots were limited, individuals were randomly assigned either to Job Corps or out of Job Corps.
2.1.1 Inspect the Data
Let’s inspect our outcome variable and treatment variable.
cd "/Users/Sam/Desktop/Econ 672/Data"
use JC, clear
describe earny4
describe assignment
sum earny4, detail
tab assignment
/Users/Sam/Desktop/Econ 672/Data
(Written by R. )
storage display value
variable name type format label variable label
----------------------------------------------------------------------------------------------------------------------
earny4 double %9.0g Earnings 4 years after assignment
storage display value
variable name type format label variable label
----------------------------------------------------------------------------------------------------------------------
assignment double %34.0g assignment1
Random assignment to Job Corps
Earnings 4 years after assignment
-------------------------------------------------------------
Percentiles Smallest
1% 0 0
5% 0 0
10% 0 0 Obs 9,240
25% 43.50582 0 Sum of Wgt. 9,240
50% 181.7103 Mean 207.6163
Largest Std. Dev. 194.5507
75% 307.3206 1821.865
90% 442.1254 1859.233 Variance 37849.99
95% 544.7895 2357.738 Skewness 1.890325
99% 835.3426 2409.909 Kurtosis 12.32882
Random assignment to Job Corps | Freq. Percent Cum.
-----------------------------------+-----------------------------------
Randomly assigned out of Job Corps | 3,663 39.64 39.64
Randomly assigned to Job Corps | 5,577 60.36 100.00
-----------------------------------+-----------------------------------
Total | 9,240 100.00
Our outcome variable has median earnings of $181.71 and mean earnings of $207.62, so the distribution is right-skewed.
Our treatment variable has 5,577 individuals assigned to treatment and 3,663 assigned to the control group.
2.1.2 Estimate the impact
We can estimate the simple difference in outcomes with the ttest command.
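The ttest call itself does not appear alongside its output here. A minimal sketch, assuming the JC data are still in memory and using the variable names from the describe output above, would be:

```stata
* Assumes JC.dta is already loaded (use JC, clear).
* Two-sample t test of 4-year earnings by random-assignment status.
ttest earny4, by(assignment)

* ttest reports mean(group 1) - mean(group 2), here control minus treatment,
* so flip the sign to express the SDO as treatment minus control.
display "SDO = " r(mu_2) - r(mu_1)
```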
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Randomly | 3,663 197.9258 3.072182 185.9368 191.9025 203.9492
Randomly | 5,577 213.981 2.675004 199.7675 208.7369 219.225
---------+--------------------------------------------------------------------
combined | 9,240 207.6163 2.023936 194.5507 203.6489 211.5836
---------+--------------------------------------------------------------------
diff | -16.05513 4.134466 -24.15959 -7.950661
------------------------------------------------------------------------------
diff = mean(Randomly) - mean(Randomly) t = -3.8832
Ho: diff = 0 degrees of freedom = 9238
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0001 Pr(|T| > |t|) = 0.0001 Pr(T > t) = 0.9999
Next, let’s run an OLS regression with and without robust standard errors.
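The regression commands are not shown with their output. Assuming the same data and variable names as before, the two tables that follow are consistent with commands along these lines:

```stata
* Assumes JC.dta is already loaded (use JC, clear).
* OLS with conventional (homoskedastic) standard errors.
regress earny4 i.assignment

* Same model with heteroskedasticity-robust standard errors.
regress earny4 i.assignment, vce(robust)
```

The option vce(robust) is equivalent to the older robust option; only the standard errors change between the two fits, not the coefficients.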
Source | SS df MS Number of obs = 9,240
-------------+---------------------------------- F(1, 9238) = 15.08
Model | 569892.708 1 569892.708 Prob > F = 0.0001
Residual | 349126136 9,238 37792.3941 R-squared = 0.0016
-------------+---------------------------------- Adj R-squared = 0.0015
Total | 349696029 9,239 37849.9869 Root MSE = 194.4
-------------------------------------------------------------------------------------------------
earny4 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------------------------------+----------------------------------------------------------------
assignment |
Randomly assigned to Job Corps | 16.05513 4.134466 3.88 0.000 7.950661 24.15959
_cons | 197.9258 3.212061 61.62 0.000 191.6295 204.2222
-------------------------------------------------------------------------------------------------
Linear regression Number of obs = 9,240
F(1, 9238) = 15.53
Prob > F = 0.0001
R-squared = 0.0016
Root MSE = 194.4
-------------------------------------------------------------------------------------------------
| Robust
earny4 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------------------------------+----------------------------------------------------------------
assignment |
Randomly assigned to Job Corps | 16.05513 4.073534 3.94 0.000 8.070101 24.04015
_cons | 197.9258 3.072095 64.43 0.000 191.9039 203.9478
-------------------------------------------------------------------------------------------------
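The simple difference in outcomes is about 16 dollars: the treatment group earns about 16 dollars more than the control group. (The ttest output shows -16.06 because it subtracts the treatment mean from the control mean.) This matches what is reported in Schochet, P. Z., Burghardt, J., Glazerman, S. (2001): “National Job Corps study: The impacts of Job Corps on participants’ employment and related outcomes”, Mathematica Policy Research, Washington, DC.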
2.1.3 Covariate Balance Test and Specification Tests
Next, let’s test whether any covariates are associated with treatment after randomization.
ttest age, by(assignment)
ttest educ, by(assignment)
ttest educmum, by(assignment)
ttest educdad, by(assignment)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Randomly | 3,663 18.33825 .034554 2.091302 18.2705 18.40599
Randomly | 5,577 18.50099 .0291468 2.17666 18.44385 18.55813
---------+--------------------------------------------------------------------
combined | 9,240 18.43647 .0223105 2.144592 18.39274 18.48021
---------+--------------------------------------------------------------------
diff | -.1627389 .0455812 -.2520881 -.0733896
------------------------------------------------------------------------------
diff = mean(Randomly) - mean(Randomly) t = -3.5703
Ho: diff = 0 degrees of freedom = 9238
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0002 Pr(|T| > |t|) = 0.0004 Pr(T > t) = 0.9998
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Randomly | 3,663 9.924379 .0327304 1.980934 9.860207 9.988551
Randomly | 5,577 9.977228 .0270215 2.017948 9.924255 10.0302
---------+--------------------------------------------------------------------
combined | 9,240 9.956277 .0208418 2.003416 9.915423 9.997132
---------+--------------------------------------------------------------------
diff | -.052849 .0426065 -.1363671 .0306691
------------------------------------------------------------------------------
diff = mean(Randomly) - mean(Randomly) t = -1.2404
Ho: diff = 0 degrees of freedom = 9238
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.1074 Pr(|T| > |t|) = 0.2149 Pr(T > t) = 0.8926
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Randomly | 3,663 9.280917 .084987 5.143648 9.114291 9.447544
Randomly | 5,577 9.335664 .0675206 5.042389 9.203298 9.468031
---------+--------------------------------------------------------------------
combined | 9,240 9.313961 .0528746 5.082565 9.210315 9.417607
---------+--------------------------------------------------------------------
diff | -.0547471 .108098 -.266643 .1571489
------------------------------------------------------------------------------
diff = mean(Randomly) - mean(Randomly) t = -0.5065
Ho: diff = 0 degrees of freedom = 9238
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.3063 Pr(|T| > |t|) = 0.6125 Pr(T > t) = 0.6937
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Randomly | 3,663 7.019383 .1004599 6.080105 6.82242 7.216346
Randomly | 5,577 7.04662 .0803195 5.998205 6.889163 7.204078
---------+--------------------------------------------------------------------
combined | 9,240 7.035823 .062736 6.030492 6.912846 7.158799
---------+--------------------------------------------------------------------
diff | -.027237 .1282603 -.2786556 .2241816
------------------------------------------------------------------------------
diff = mean(Randomly) - mean(Randomly) t = -0.2124
Ho: diff = 0 degrees of freedom = 9238
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.4159 Pr(|T| > |t|) = 0.8318 Pr(T > t) = 0.5841
We fail to reject the null hypothesis of no difference between treatment and control for education attained, mother’s education, and father’s education. For age, however, we reject the null (p = 0.0004): the treatment group is about 0.16 years older on average.
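When there are several covariates, it can be easier to read one summary line per test. The loop below is a sketch of that idea, assuming the JC data are in memory; it reruns the same four t tests and prints only the difference in means and the two-sided p-value.

```stata
* Assumes JC.dta is already loaded (use JC, clear).
* Run each balance t test quietly and print one summary line per covariate.
foreach v of varlist age educ educmum educdad {
    quietly ttest `v', by(assignment)
    display %-10s "`v'" "  diff = " %8.4f (r(mu_1) - r(mu_2)) "  p = " %6.4f r(p)
}
```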
tab black assignment, chi2
tab female assignment, chi2
tab hispanic assignment, chi2
tab cohabmarried assignment, chi2
tab haschild assignment, chi2
tab everwkd assignment, chi2
Individual |
is Black | Random assignment to
or African | Job Corps
American | Randomly Randomly | Total
-----------+----------------------+----------
Non-Black | 1,873 2,810 | 4,683
Black | 1,790 2,767 | 4,557
-----------+----------------------+----------
Total | 3,663 5,577 | 9,240
Pearson chi2(1) = 0.4941 Pr = 0.482
| Random assignment to
Individual | Job Corps
is female | Randomly Randomly | Total
-----------+----------------------+----------
Male | 2,220 2,960 | 5,180
Female | 1,443 2,617 | 4,060
-----------+----------------------+----------
Total | 3,663 5,577 | 9,240
Pearson chi2(1) = 50.9039 Pr = 0.000
| Random assignment to
Individual is | Job Corps
Latino/Hispanic | Randomly Randomly | Total
----------------+----------------------+----------
Non-Latino | 3,024 4,641 | 7,665
Latino/Hispanic | 639 936 | 1,575
----------------+----------------------+----------
Total | 3,663 5,577 | 9,240
Pearson chi2(1) = 0.6842 Pr = 0.408
Individual is Married | Random assignment to
or Cohabiting at | Job Corps
assignment | Randomly Randomly | Total
----------------------+----------------------+----------
Not Married or cohabi | 3,441 5,233 | 8,674
Married or Cohabiting | 222 344 | 566
----------------------+----------------------+----------
Total | 3,663 5,577 | 9,240
Pearson chi2(1) = 0.0445 Pr = 0.833
Individual has child | Random assignment to
or children at | Job Corps
assignment | Randomly Randomly | Total
----------------------+----------------------+----------
No chilren | 2,981 4,417 | 7,398
Has at least one chil | 682 1,160 | 1,842
----------------------+----------------------+----------
Total | 3,663 5,577 | 9,240
Pearson chi2(1) = 6.5895 Pr = 0.010
Individual |
has ever | Random assignment to
worked at | Job Corps
assignment | Randomly Randomly | Total
-------------+----------------------+----------
Never worked | 3,155 4,767 | 7,922
Has worked | 508 810 | 1,318
-------------+----------------------+----------
Total | 3,663 5,577 | 9,240
Pearson chi2(1) = 0.7768 Pr = 0.378
We fail to reject the null hypothesis of no difference between treatment and control for Black, Latino/Hispanic, Married/Cohabiting, and Ever Worked. However, we reject the null hypothesis of no difference for being female (p < 0.001) and for having any children (p = 0.010).
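To see the size of the imbalance rather than only the test statistic, the female table can be re-entered with Stata's immediate tabulate command. This sketch uses only the cell counts printed above, so it runs without the data set.

```stata
* Immediate form of tabulate: rows are Male/Female, columns are control/treatment.
* Cell counts are copied from the female-by-assignment table above.
* chi2 reproduces the Pearson test; column adds within-column percentages.
tabi 2220 2960 \ 1443 2617, chi2 column
```

The column percentages show that about 39 percent of the control group is female, compared with about 47 percent of the treatment group.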
Should we be concerned? Yes!
Let’s run a sensitivity test. The first model is the simple difference, the second adds female and haschild (the binary covariates for which we rejected the null hypothesis of balance), and the third includes all covariates.
While the lack of covariate balance is a concern, note that the full data set uses weights, which are not available in the public-use data.
*What happens to the standard errors
est clear
eststo m1: quietly reg earny4 i.assignment
eststo m2: quietly reg earny4 i.assignment i.female i.haschild
eststo m3: quietly reg earny4 i.assignment i.female i.black i.hispanic i.cohabmarried i.haschild i.everwkd age educ educmum educdad
esttab m1 m2 m3, mtitle("Model 1" "Model 2" "Model 3") keep(1.assignment)
(1) (2) (3)
Model 1 Model 2 Model 3
------------------------------------------------------------
1.assignment 16.06*** 21.07*** 19.59***
(3.88) (5.16) (4.96)
------------------------------------------------------------
N 9240 9240 9240
------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
Our ATE estimates range from about 16 dollars to about 21 dollars across the three specifications.
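By default esttab prints t statistics in parentheses. To look at the standard errors directly, one option is to add se, as in the sketch below. It assumes the three models are still stored under the names m1, m2, and m3 and that the estout package is installed.

```stata
* Assumes m1-m3 are still stored by eststo and that estout is installed
* (ssc install estout).
* se replaces the default t statistics with standard errors in parentheses.
esttab m1 m2 m3, se mtitle("Model 1" "Model 2" "Model 3") keep(1.assignment)
```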
2.2 Leaflet Information Evaluation
Next we will look at an impact evaluation of an information leaflet about coffee production on students’ awareness. The target population is Bulgarian high school students, and the outcome is awareness of environmental issues due to coffee production.
2.2.1 Inspect the Data
Let’s inspect the data by running some descriptive statistics. Our dependent variable is awarewaste, and the treatment is a leaflet that is randomly assigned to students.
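The commands behind the frequency tables are not shown. Assuming the leaflet data set has already been loaded (its file name is not given in this section), the two tables are consistent with one-way tabulations such as:

```stata
* Assumes the leaflet data set is already in memory; its file name is not
* given in this section, so no use command is shown.
tabulate awarewaste
tabulate treatment
```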
/Users/Sam/Desktop/Econ 672/Data
(Written by R. )
Awareness of waste |
production due to |
coffee production | Freq. Percent Cum.
-------------------+-----------------------------------
Not Aware | 171 33.93 33.93
Slightly Not Aware | 157 31.15 65.08
Neither | 121 24.01 89.09
Slightly Aware | 41 8.13 97.22
Fully Aware | 14 2.78 100.00
-------------------+-----------------------------------
Total | 504 100.00
Treatment Status | Freq. Percent Cum.
------------------------+-----------------------------------
Control Group | 261 50.00 50.00
Leaflet Treatment Group | 261 50.00 100.00
------------------------+-----------------------------------
Total | 522 100.00
2.2.2 Estimate the impact
Let’s run the simple difference in outcomes with the ttest command.
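The command is not shown with its output. A sketch that would produce a table of this form, assuming the leaflet data are in memory and the variables are named awarewaste and treatment as in the regression output later in this section:

```stata
* Assumes the leaflet data set is already in memory.
* Compare mean awareness scores between the control and leaflet groups.
ttest awarewaste, by(treatment)
```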
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Control | 254 2.007874 .0628675 1.001943 1.884064 2.131684
Leaflet | 250 2.288 .0702564 1.110852 2.149627 2.426373
---------+--------------------------------------------------------------------
combined | 504 2.146825 .0474646 1.065578 2.053572 2.240079
---------+--------------------------------------------------------------------
diff | -.280126 .0942007 -.4652021 -.0950498
------------------------------------------------------------------------------
diff = mean(Control) - mean(Leaflet) t = -2.9737
Ho: diff = 0 degrees of freedom = 502
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0015 Pr(|T| > |t|) = 0.0031 Pr(T > t) = 0.9985
We reject the null hypothesis that the leaflet treatment had no impact: mean awareness is 0.28 points higher in the treatment group.
Let’s run an OLS regression.
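The regression command is not shown. Given the variable names in the output table that follows, it was presumably of this form:

```stata
* Assumes the leaflet data set is already in memory.
* OLS of the awareness score on the treatment indicator.
regress awarewaste i.treatment
```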
Source | SS df MS Number of obs = 504
-------------+---------------------------------- F(1, 502) = 8.84
Model | 9.88666867 1 9.88666867 Prob > F = 0.0031
Residual | 561.248252 502 1.11802441 R-squared = 0.0173
-------------+---------------------------------- Adj R-squared = 0.0154
Total | 571.134921 503 1.1354571 Root MSE = 1.0574
------------------------------------------------------------------------------------------
awarewaste | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------------------+----------------------------------------------------------------
treatment |
Leaflet Treatment Group | .280126 .0942007 2.97 0.003 .0950498 .4652021
_cons | 2.007874 .0663451 30.26 0.000 1.877526 2.138222
------------------------------------------------------------------------------------------
Wait a second! What does 0.28 mean? What is a potential problem here?
Look at the outcome variable again. It is an ordinal variable, so a difference in means of 0.28 has no clear interpretation. Let’s use a \(\chi^2\) test.
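The tabulate command is not shown. A sketch consistent with the two-way table that follows, assuming the leaflet data are in memory:

```stata
* Assumes the leaflet data set is already in memory.
* Two-way table of awareness category by treatment status with a Pearson test.
tabulate awarewaste treatment, chi2
```

Note that the Pearson test treats the five categories as unordered, which is one reason to also fit an ordered model.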
Awareness of waste |
production due to | Treatment Status
coffee production | Control G Leaflet T | Total
-------------------+----------------------+----------
Not Aware | 97 74 | 171
Slightly Not Aware | 82 75 | 157
Neither | 56 65 | 121
Slightly Aware | 14 27 | 41
Fully Aware | 5 9 | 14
-------------------+----------------------+----------
Total | 254 250 | 504
Pearson chi2(4) = 9.3087 Pr = 0.054
Let’s run an ordinal logit regression.
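The ologit command is not shown. Based on the output header and variable names, a minimal version would be:

```stata
* Assumes the leaflet data set is already in memory.
* Ordered logit of the 5-category awareness outcome on treatment.
* Coefficients are reported in log-odds units.
ologit awarewaste i.treatment
```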
Iteration 0: log likelihood = -693.62947
Iteration 1: log likelihood = -689.57824
Iteration 2: log likelihood = -689.57505
Iteration 3: log likelihood = -689.57505
Ordered logistic regression Number of obs = 504
LR chi2(1) = 8.11
Prob > chi2 = 0.0044
Log likelihood = -689.57505 Pseudo R2 = 0.0058
------------------------------------------------------------------------------------------
awarewaste | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------------------+----------------------------------------------------------------
treatment |
Leaflet Treatment Group | .4609564 .1623425 2.84 0.005 .1427709 .7791419
-------------------------+----------------------------------------------------------------
/cut1 | -.4493223 .1211244 -.6867218 -.2119229
/cut2 | .8543491 .1252889 .6087873 1.099911
/cut3 | 2.346133 .168949 2.014999 2.677267
/cut4 | 3.80845 .2865533 3.246816 4.370084
------------------------------------------------------------------------------------------
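The coefficient of 0.461 is in log-odds units; exponentiating it gives an odds ratio of about 1.59. The odds of reporting a higher level of awareness of the effect of coffee production on environmental issues are about 1.59 times greater for individuals who received treatment than for those in the control group.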
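An odds ratio is the exponential of the ordered-logit coefficient. A quick check using the coefficient printed in the table above:

```stata
* Convert the ordered-logit coefficient (log-odds) to an odds ratio.
display "odds ratio = " %6.3f exp(.4609564)
```

With the data in memory, adding the or option to ologit reports odds ratios directly instead of coefficients.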
2.2.3 Covariate Balance Test
Let’s test covariate balance.
foreach v of varlist sex mumedu dadedu bulgnationality drinkcoffee citysofia {
tab awarewaste `v', chi2
}
Awareness of waste |
production due to | sex
coffee production | 0 1 | Total
-------------------+----------------------+----------
Not Aware | 94 77 | 171
Slightly Not Aware | 102 53 | 155
Neither | 65 55 | 120
Slightly Aware | 22 19 | 41
Fully Aware | 3 11 | 14
-------------------+----------------------+----------
Total | 286 215 | 501
Pearson chi2(4) = 13.0039 Pr = 0.011
Awareness of waste |
production due to | mumedu
coffee production | 1 2 3 | Total
-------------------+---------------------------------+----------
Not Aware | 3 69 88 | 160
Slightly Not Aware | 5 48 98 | 151
Neither | 1 31 76 | 108
Slightly Aware | 1 9 27 | 37
Fully Aware | 0 3 9 | 12
-------------------+---------------------------------+----------
Total | 10 160 298 | 468
Pearson chi2(8) = 11.8017 Pr = 0.160
Awareness of waste |
production due to | dadedu
coffee production | 1 2 3 | Total
-------------------+---------------------------------+----------
Not Aware | 4 67 83 | 154
Slightly Not Aware | 2 53 87 | 142
Neither | 0 42 62 | 104
Slightly Aware | 0 9 30 | 39
Fully Aware | 0 3 8 | 11
-------------------+---------------------------------+----------
Total | 6 174 270 | 450
Pearson chi2(8) = 10.9594 Pr = 0.204
Awareness of waste |
production due to | bulgnationality
coffee production | 0 1 | Total
-------------------+----------------------+----------
Not Aware | 4 167 | 171
Slightly Not Aware | 5 152 | 157
Neither | 2 119 | 121
Slightly Aware | 0 41 | 41
Fully Aware | 0 14 | 14
-------------------+----------------------+----------
Total | 11 493 | 504
Pearson chi2(4) = 2.1444 Pr = 0.709
Awareness of waste |
production due to | drinkcoffee
coffee production | 1 2 3 4 5 | Total
-------------------+-------------------------------------------------------+----------
Not Aware | 68 18 21 32 31 | 170
Slightly Not Aware | 39 25 32 27 34 | 157
Neither | 46 19 19 21 14 | 119
Slightly Aware | 12 5 4 12 8 | 41
Fully Aware | 4 0 4 3 2 | 13
-------------------+-------------------------------------------------------+----------
Total | 169 67 80 95 89 | 500
Pearson chi2(16) = 23.6155 Pr = 0.098
Awareness of waste |
production due to | citysofia
coffee production | 0 1 | Total
-------------------+----------------------+----------
Not Aware | 46 125 | 171
Slightly Not Aware | 40 117 | 157
Neither | 27 94 | 121
Slightly Aware | 7 34 | 41
Fully Aware | 2 12 | 14
-------------------+----------------------+----------
Total | 122 382 | 504
Pearson chi2(4) = 2.9391 Pr = 0.568
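Note that these tables cross-tabulate each covariate with the outcome, awarewaste, rather than with treatment status, so they show which characteristics are associated with awareness. Sex is associated with awareness at the 5% level (p = 0.011), which is a slight concern. None of the other characteristics is (drinkcoffee comes closest, with p = 0.098).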
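To compare covariates across treatment arms directly, the same loop can tabulate each covariate against treatment instead of awarewaste. A minimal sketch, assuming the leaflet data are in memory:

```stata
* Assumes the leaflet data set is already in memory.
* Cross-tabulate each covariate with treatment status (not with the outcome).
foreach v of varlist sex mumedu dadedu bulgnationality drinkcoffee citysofia {
    tabulate `v' treatment, chi2
}
```

A small p-value in any of these tables would indicate that the covariate is distributed differently in the leaflet and control groups.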
est clear
quietly {
eststo m1: ologit awarewaste i.treatment, or
eststo m2: ologit awarewaste i.treatment i.mumedu i.sex, or
eststo m3: ologit awarewaste i.treatment i.drinkcoffee i.sex i.mumedu i.dadedu i.bulgnationality i.citysofia, or
}
esttab m1 m2 m3, mtitle("Model 1" "Model 2" "Model 3") keep(1.treatment)
(1) (2) (3)
Model 1 Model 2 Model 3
------------------------------------------------------------
awarewaste
1.treatment 0.461** 0.566*** 0.583**
(2.84) (3.32) (3.27)
------------------------------------------------------------
N 504 466 434
------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
The table reports log-odds coefficients ranging from 0.461 to 0.583, which correspond to odds ratios of about 1.59 to 1.79. Notice that the sample size drops as we add covariates, which means that not all respondents provided information on every characteristic.
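The or option on ologit changes how ologit displays its own results, but esttab still prints the stored coefficients. To have the table itself show odds ratios, one option is esttab's eform option. The sketch below assumes the three models are still stored as m1, m2, and m3 and that the estout package is installed.

```stata
* Assumes m1-m3 are still stored by eststo and that estout is installed
* (ssc install estout).
* eform exponentiates the coefficients so the table shows odds ratios.
esttab m1 m2 m3, eform mtitle("Model 1" "Model 2" "Model 3") keep(1.treatment)
```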