Chapter 1 Instrumental Variables
Go to ELMS and download the mroz.dta file.
1.1 Estimating Returns to Education for Married Women
We’ll use the data from T. A. Mroz (1987), “The Sensitivity of an Empirical Model of Married Women’s Hours of Work to Economic and Statistical Assumptions,” Econometrica 55, 765-799.
We’ll use the data on married working women to estimate the return to education using a simple OLS model.
Our likely biased OLS model gives the following results:
/Users/Sam/Desktop/Econ 645/Data/Wooldridge
Source | SS df MS Number of obs = 428
-------------+---------------------------------- F(1, 426) = 56.93
Model | 26.3264193 1 26.3264193 Prob > F = 0.0000
Residual | 197.001022 426 .462443713 R-squared = 0.1179
-------------+---------------------------------- Adj R-squared = 0.1158
Total | 223.327441 427 .523015084 Root MSE = .68003
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .1086487 .0143998 7.55 0.000 .0803451 .1369523
_cons | -.1851968 .1852259 -1.00 0.318 -.5492673 .1788736
------------------------------------------------------------------------------
Our estimate implies that the return to education is \((e^{.109}-1)*100\%\), or about 11.5%. Notice that we are using the stored value of \(\hat{\beta}_{educ}\) with _b[educ].
11.477061
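The output above can be reproduced along the following lines. This is a sketch that assumes mroz.dta sits in the current working directory; the variable names lwage and educ are taken from the output.

```stata
* Load the Mroz data (assumes mroz.dta is in the working directory)
use "mroz.dta", clear

* OLS of log wage on education; the output above reports 428 observations,
* the women with an observed wage
regress lwage educ

* Percent return to one more year of education, from the stored coefficient
display (exp(_b[educ]) - 1) * 100
```

Using _b[educ] rather than typing the rounded coefficient avoids rounding error in the percentage.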
We will use father’s education as an instrument for a woman’s level of education, restricting the sample to women in the labor force. We will use the predict command to get estimates of \(\hat{x}\).
Source | SS df MS Number of obs = 428
-------------+---------------------------------- F(1, 426) = 88.84
Model | 384.841983 1 384.841983 Prob > F = 0.0000
Residual | 1845.35428 426 4.33181756 R-squared = 0.1726
-------------+---------------------------------- Adj R-squared = 0.1706
Total | 2230.19626 427 5.22294206 Root MSE = 2.0813
------------------------------------------------------------------------------
educ | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
fatheduc | .2694416 .0285863 9.43 0.000 .2132538 .3256295
_cons | 10.23705 .2759363 37.10 0.000 9.694685 10.77942
------------------------------------------------------------------------------
(option xb assumed; fitted values)
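A first stage like the one above might be run as follows. This is a sketch: the sample restriction via !missing(lwage) is our assumption for matching the 428 working women, and edu_hat is the name the later output uses for the fitted values.

```stata
* Load the Mroz data (assumes mroz.dta is in the working directory)
use "mroz.dta", clear

* First stage: education on father's education, working women only
* (restriction via non-missing lwage is an assumption, see lead-in)
regress educ fatheduc if !missing(lwage)

* Fitted values of education; xb is the default, so it can be omitted
predict edu_hat, xb
```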
For instrument relevance, let’s obtain the F-statistic after regressing education on father’s education.
( 1) fatheduc = 0
F( 1, 426) = 88.84
Prob > F = 0.0000
Our F-test gives \(F = 88.84\), which is well above our threshold of \(F > 15\), so father’s education seems like a relevant candidate for an instrument. This does not mean it is a good instrument, though, since relevance says nothing about exogeneity.
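The relevance test can be sketched with the test command after the first stage. The sample restriction is again our assumption. With a single instrument, the F-statistic is just the squared t-statistic on the instrument, which the last line checks.

```stata
use "mroz.dta", clear

* Re-run the first stage quietly, then test the excluded instrument
quietly regress educ fatheduc if !missing(lwage)
test fatheduc

* test stores the statistic in r(F)
display r(F)

* One instrument: F equals the squared t-statistic
display (_b[fatheduc] / _se[fatheduc])^2
```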
Father’s Education as an instrument
Source | SS df MS Number of obs = 428
-------------+---------------------------------- F(1, 426) = 2.59
Model | 1.34752449 1 1.34752449 Prob > F = 0.1086
Residual | 221.979916 426 .521079616 R-squared = 0.0060
-------------+---------------------------------- Adj R-squared = 0.0037
Total | 223.327441 427 .523015084 Root MSE = .72186
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
edu_hat | .0591735 .0367969 1.61 0.109 -.0131525 .1314995
_cons | .4411034 .4671121 0.94 0.346 -.4770279 1.359235
------------------------------------------------------------------------------
One additional year of education increases wages by about \[ (e^{0.059}-1)*100\%=6.1\% \]
6.095928
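The manual second stage can be sketched as below, under the same assumptions as before (mroz.dta in the working directory, working women selected via non-missing lwage). Keep in mind that the standard errors from this hand-run second stage are not valid 2SLS standard errors, because they are computed from residuals that use fitted rather than actual education.

```stata
use "mroz.dta", clear

* First stage and fitted values
quietly regress educ fatheduc if !missing(lwage)
predict edu_hat, xb

* Second stage: replace educ with its fitted values
regress lwage edu_hat

* Percent return to one more year of education
display (exp(_b[edu_hat]) - 1) * 100
```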
Question: the instrument passed our relevance check (\(F > 15\)), but what about instrument exogeneity?
1.2 Exercise: Estimating Returns to Education for Men
Let’s estimate the returns to education for men. We’ll use data from M. Blackburn and D. Neumark (1992), “Unobserved Ability, Efficiency Wages, and Interindustry Wage Differentials,” Quarterly Journal of Economics 107, 1421-1436.
Use number of siblings as an instrument to predict an observation’s level of education. We’ll keep it simple and have no other covariates.
- Estimate the potentially biased OLS
- Estimate your first stage \(\hat{x}\)
- Test your first stage
- Run your second regression using \(\hat{x}\) from your first regression
- Estimate the potentially biased OLS
/Users/Sam/Desktop/Econ 645/Data/Wooldridge
Source | SS df MS Number of obs = 935
-------------+---------------------------------- F(1, 933) = 100.70
Model | 16.1377042 1 16.1377042 Prob > F = 0.0000
Residual | 149.518579 933 .160255712 R-squared = 0.0974
-------------+---------------------------------- Adj R-squared = 0.0964
Total | 165.656283 934 .177362188 Root MSE = .40032
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .0598392 .0059631 10.03 0.000 .0481366 .0715418
_cons | 5.973063 .0813737 73.40 0.000 5.813366 6.132759
------------------------------------------------------------------------------
We have an estimated return to education of \[ (e^{.0598392}-1)*100\% = 6.2\% \]
6.1665821
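One way to produce this step is sketched below. The file name wage2.dta is an assumption (the notes do not name the file for the Blackburn and Neumark data); lwage and educ come from the output.

```stata
* wage2.dta is an assumed file name; adjust to the file you downloaded
use "wage2.dta", clear

* OLS of log wage on education, no other covariates
regress lwage educ

* Percent return to one more year of education
display (exp(_b[educ]) - 1) * 100
```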
- Use number of siblings as an instrument in the first stage and predict \(\hat{x}\)
Source | SS df MS Number of obs = 935
-------------+---------------------------------- F(1, 933) = 56.67
Model | 258.055048 1 258.055048 Prob > F = 0.0000
Residual | 4248.7642 933 4.55387374 R-squared = 0.0573
-------------+---------------------------------- Adj R-squared = 0.0562
Total | 4506.81925 934 4.82528828 Root MSE = 2.134
------------------------------------------------------------------------------
educ | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
sibs | -.2279164 .0302768 -7.53 0.000 -.287335 -.1684979
_cons | 14.13879 .1131382 124.97 0.000 13.91676 14.36083
------------------------------------------------------------------------------
(option xb assumed; fitted values)
- Get F-statistic
( 1) sibs = 0
F( 1, 933) = 56.67
Prob > F = 0.0000
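The first stage and its F-test might look like this. It is a sketch with the same assumed file name wage2.dta; sibs and edu_hat are the names shown in the output.

```stata
* wage2.dta is an assumed file name
use "wage2.dta", clear

* First stage: education on number of siblings
regress educ sibs

* Fitted values of education for the second stage
predict edu_hat, xb

* Relevance: F-test on the excluded instrument
test sibs
```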
- Number of siblings as an instrument
Source | SS df MS Number of obs = 935
-------------+---------------------------------- F(1, 933) = 22.31
Model | 3.86818074 1 3.86818074 Prob > F = 0.0000
Residual | 161.788103 933 .173406326 R-squared = 0.0234
-------------+---------------------------------- Adj R-squared = 0.0223
Total | 165.656283 934 .177362188 Root MSE = .41642
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
edu_hat | .1224326 .0259225 4.72 0.000 .0715595 .1733057
_cons | 5.130026 .3494009 14.68 0.000 4.444323 5.815729
------------------------------------------------------------------------------
One additional year of education increases wages by \[ (e^{0.1224}-1)*100\% =13.0\% \]
13.024298
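The second stage for the exercise can be sketched as follows, again assuming the file is called wage2.dta. As with any hand-run second stage, the coefficient matches 2SLS but the reported standard errors do not.

```stata
* wage2.dta is an assumed file name
use "wage2.dta", clear

* First stage and fitted values
quietly regress educ sibs
predict edu_hat, xb

* Second stage using fitted education
regress lwage edu_hat
display (exp(_b[edu_hat]) - 1) * 100
```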
Interestingly, the IV estimate (0.122) is about twice the OLS estimate (0.060), which suggests that the OLS estimate is biased downward. This is not what we would expect.
Possible reasons:
1. Siblings could be correlated with ability: more siblings means less parental attention, which could result in lower ability.
2. The OLS estimator is biased downward due to measurement error in educ, but this is less likely to satisfy the classical errors-in-variables (CEV) assumption.
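The second reason can be made concrete. As a sketch, suppose reported education equals true education \(x^*\) plus an error \(e\) that is uncorrelated with \(x^*\) and with the wage equation error. This is the classical errors-in-variables case, and the notation is introduced here for illustration only. In a simple regression, OLS is then attenuated toward zero:

\[ \operatorname{plim}\,\hat{\beta}_{OLS} = \beta \cdot \frac{\sigma^2_{x^*}}{\sigma^2_{x^*}+\sigma^2_{e}} \]

Since the fraction is less than one whenever \(\sigma^2_{e} > 0\), a positive \(\beta\) would be underestimated by OLS.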
1.3 Smoking on Birthweight
It is important to see an example of a poor instrument. We’ll use data from J. Mullahy (1997), “Instrumental-Variable Estimation of Count Data Models: Applications to Models of Cigarette Smoking Behavior,” Review of Economics and Statistics 79, 586-593.
The biased OLS regression relates the natural log of birth weight to the number of cigarette packs smoked per day by the mother. We would expect smoking to be correlated with unobserved health and parental decisions, so the estimate is likely biased due to unobserved confounders.
/Users/Sam/Desktop/Econ 645/Data/Wooldridge
Source | SS df MS Number of obs = 1,388
-------------+---------------------------------- F(1, 1386) = 27.98
Model | .997781141 1 .997781141 Prob > F = 0.0000
Residual | 49.4225525 1,386 .035658407 R-squared = 0.0198
-------------+---------------------------------- Adj R-squared = 0.0191
Total | 50.4203336 1,387 .036352079 Root MSE = .18883
------------------------------------------------------------------------------
lbwght | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
packs | -.0898131 .0169786 -5.29 0.000 -.1231197 -.0565065
_cons | 4.769404 .0053694 888.26 0.000 4.758871 4.779937
------------------------------------------------------------------------------
An additional pack smoked per day is associated with a change in birth weight of \[ (e^{-.08981}-1)*100\% \approx -8.6\% \]
-8.5895151
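A sketch of this OLS step follows. The file name bwght.dta is an assumption, since the notes do not name the file; lbwght and packs are taken from the output.

```stata
* bwght.dta is an assumed file name for the Mullahy data
use "bwght.dta", clear

* OLS of log birth weight on packs smoked per day
regress lbwght packs

* Percent change in birth weight per additional pack per day
display (exp(_b[packs]) - 1) * 100
```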
We’ll use cigarette prices as an instrument for cigarette packs smoked per day. We assume that cigarette prices and the error term u are uncorrelated (instrument exogeneity). Note, however, that some states fund health care with cigarette tax revenue, which could violate this assumption. For instrument relevance, we expect cigarette price and the quantity of packs smoked to be negatively correlated.
Source | SS df MS Number of obs = 1,388
-------------+---------------------------------- F(1, 1386) = 0.13
Model | .011648626 1 .011648626 Prob > F = 0.7179
Residual | 123.684481 1,386 .089238442 R-squared = 0.0001
-------------+---------------------------------- Adj R-squared = -0.0006
Total | 123.696129 1,387 .089182501 Root MSE = .29873
------------------------------------------------------------------------------
packs | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cigprice | .0002829 .000783 0.36 0.718 -.0012531 .0018188
_cons | .0674257 .1025384 0.66 0.511 -.1337215 .2685728
------------------------------------------------------------------------------
The result shows that we fail to reject the null hypothesis that \(\beta_{cigprice}\) is equal to 0. From theory, we know that something is wrong, since our instrument, the price of cigarettes, is not associated with packs of cigarettes consumed.
We will still check instrument relevance:
( 1) cigprice = 0
F( 1, 1386) = 0.13
Prob > F = 0.7179
Our instrument fails the F-test (\(F = 0.13\)), so we have a weak instrument.
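The first stage and relevance test for this example can be sketched as below, again assuming the data file is called bwght.dta.

```stata
* bwght.dta is an assumed file name
use "bwght.dta", clear

* First stage: packs smoked on cigarette price
regress packs cigprice

* Relevance: F-test on the excluded instrument
test cigprice
display r(F)
```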
Cigarette price is a poor instrument for packs smoked, and the coefficient on predicted packs smoked has the wrong sign.
(option xb assumed; fitted values)
Source | SS df MS Number of obs = 1,388
-------------+---------------------------------- F(1, 1386) = 2.87
Model | .104047659 1 .104047659 Prob > F = 0.0907
Residual | 50.316286 1,386 .036303237 R-squared = 0.0021
-------------+---------------------------------- Adj R-squared = 0.0013
Total | 50.4203336 1,387 .036352079 Root MSE = .19053
------------------------------------------------------------------------------
lbwght | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
packs_hat | 2.988676 1.765368 1.69 0.091 -.4744067 6.451758
_cons | 4.448136 .1843027 24.13 0.000 4.086594 4.809679
------------------------------------------------------------------------------
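For completeness, the second stage above might be produced as follows. It is a sketch with the assumed file name bwght.dta; packs_hat is the name shown in the output.

```stata
* bwght.dta is an assumed file name
use "bwght.dta", clear

* First stage and fitted values of packs
quietly regress packs cigprice
predict packs_hat, xb

* Second stage using fitted packs
regress lbwght packs_hat
```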
Our results show that something is terribly wrong. An additional pack of cigarettes per day is associated with a massive increase in birth weight, which is not supported empirically. Our biased OLS was a better model than our poor-instrument model.
We can and should always test Instrument Relevance. If we have a poor instrument, we should go back to the drawing board.
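A standard result helps explain why a weak instrument is so damaging. For a single instrument \(z\) and endogenous regressor \(x\), the probability limit of the IV estimator is

\[ \operatorname{plim}\,\hat{\beta}_{IV} = \beta + \frac{Corr(z,u)}{Corr(z,x)} \cdot \frac{\sigma_u}{\sigma_x} \]

In the smoking example the first-stage R-squared is 0.0001, so \(Corr(z,x)\) is roughly 0.01 in absolute value. Dividing by a number that small means even a slight correlation between cigarette price and \(u\) can produce a very large inconsistency.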
Another issue here is that price is a poor instrument because price and quantity are simultaneously determined. We would need a second set of instruments for price to estimate our first stage.
1.4 Estimating Returns to Education for Married Women Part 2
We’ll use the Mroz data on working women again. We’ll use both parents’ education as instruments to identify the effect of education on wages for working women. We overidentify the endogenous variable with two instruments: father’s education and mother’s education.
Potentially Biased OLS
cd "/Users/Sam/Desktop/Econ 645/Data/Wooldridge"
use "mroz.dta", clear
* OLS of log wage on education and a quadratic in experience
reg lwage educ c.exper##c.exper
Source | SS df MS Number of obs = 428
-------------+---------------------------------- F(3, 424) = 26.29
Model | 35.0222967 3 11.6740989 Prob > F = 0.0000
Residual | 188.305144 424 .444115906 R-squared = 0.1568
-------------+---------------------------------- Adj R-squared = 0.1509
Total | 223.327441 427 .523015084 Root MSE = .66642
---------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------------+----------------------------------------------------------------
educ | .1074896 .0141465 7.60 0.000 .0796837 .1352956
exper | .0415665 .0131752 3.15 0.002 .0156697 .0674633
|
c.exper#c.exper | -.0008112 .0003932 -2.06 0.040 -.0015841 -.0000382
|
_cons | -.5220406 .1986321 -2.63 0.009 -.9124667 -.1316144
---------------------------------------------------------------------------------
Our result is \[ (e^{.1074896}-1)*100\% = 11.3\% \]
11.347933
We’ll use two instruments, father’s education and mother’s education, for the one endogenous variable, education, among women in the labor force.
Source | SS df MS Number of obs = 428
-------------+---------------------------------- F(4, 423) = 28.36
Model | 471.620998 4 117.90525 Prob > F = 0.0000
Residual | 1758.57526 423 4.15738833 R-squared = 0.2115
-------------+---------------------------------- Adj R-squared = 0.2040
Total | 2230.19626 427 5.22294206 Root MSE = 2.039
---------------------------------------------------------------------------------
educ | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------------+----------------------------------------------------------------
exper | .0452254 .0402507 1.12 0.262 -.0338909 .1243417
|
c.exper#c.exper | -.0010091 .0012033 -0.84 0.402 -.0033744 .0013562
|
fatheduc | .1895484 .0337565 5.62 0.000 .1231971 .2558997
motheduc | .157597 .0358941 4.39 0.000 .087044 .2281501
_cons | 9.10264 .4265614 21.34 0.000 8.264196 9.941084
---------------------------------------------------------------------------------
Get the F-Statistic.
( 1) fatheduc = 0
( 2) motheduc = 0
F( 2, 423) = 55.40
Prob > F = 0.0000
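The first stage and joint F-test might be run as sketched here. The exogenous controls enter the first stage alongside the instruments, and the test covers only the two excluded instruments. The restriction to non-missing lwage is our assumption for matching the 428 working women.

```stata
use "mroz.dta", clear

* First stage: education on the exogenous controls and both instruments
regress educ c.exper##c.exper fatheduc motheduc if !missing(lwage)

* Joint test that both excluded instruments have zero coefficients
test fatheduc motheduc
```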
The result shows that the instruments are potential candidates for good instruments, since \(F>15\).
Using father’s education and mother’s education as instruments.
(option xb assumed; fitted values)
Source | SS df MS Number of obs = 428
-------------+---------------------------------- F(3, 424) = 7.40
Model | 11.117828 3 3.70594265 Prob > F = 0.0001
Residual | 212.209613 424 .50049437 R-squared = 0.0498
-------------+---------------------------------- Adj R-squared = 0.0431
Total | 223.327441 427 .523015084 Root MSE = .70746
---------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------------+----------------------------------------------------------------
edu_hat | .0613966 .0329624 1.86 0.063 -.0033933 .1261866
exper | .0441704 .0140844 3.14 0.002 .0164865 .0718543
|
c.exper#c.exper | -.000899 .0004212 -2.13 0.033 -.0017268 -.0000711
|
_cons | .0481003 .4197565 0.11 0.909 -.7769624 .873163
---------------------------------------------------------------------------------
One additional year of education increases wages by \[ (e^{0.061}-1)*100\%=6.3\% \]
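The manual two-step version can be sketched as follows, with the same assumptions as earlier (mroz.dta in the working directory, working women selected via non-missing lwage).

```stata
use "mroz.dta", clear

* First stage with controls and both instruments, then fitted values
quietly regress educ c.exper##c.exper fatheduc motheduc if !missing(lwage)
predict edu_hat, xb

* Second stage: fitted education plus the same exogenous controls
regress lwage edu_hat c.exper##c.exper
display (exp(_b[edu_hat]) - 1) * 100
```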
This can be more easily done with our ivregress 2sls command:
ivregress 2sls
Instrumental variables (2SLS) regression Number of obs = 428
Wald chi2(3) = 24.65
Prob > chi2 = 0.0000
R-squared = 0.1357
Root MSE = .67155
---------------------------------------------------------------------------------
lwage | Coef. Std. Err. z P>|z| [95% Conf. Interval]
----------------+----------------------------------------------------------------
educ | .0613966 .0312895 1.96 0.050 .0000704 .1227228
exper | .0441704 .0133696 3.30 0.001 .0179665 .0703742
|
c.exper#c.exper | -.000899 .0003998 -2.25 0.025 -.0016826 -.0001154
|
_cons | .0481003 .398453 0.12 0.904 -.7328532 .8290538
---------------------------------------------------------------------------------
Instrumented: educ
Instruments: exper c.exper#c.exper fatheduc motheduc
6.3320574
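A full ivregress call consistent with the output above might look like the sketch below. The argument list is inferred from the "Instrumented" and "Instruments" lines, so treat it as an assumption rather than the exact command that was run.

```stata
use "mroz.dta", clear

* 2SLS in one step: endogenous regressor and instruments go in parentheses
ivregress 2sls lwage c.exper##c.exper (educ = fatheduc motheduc)

* Percent return to one more year of education
display (exp(_b[educ]) - 1) * 100

* First-stage diagnostics, including the F-statistic on the instruments
estat firststage
```

The coefficient on educ matches the manual second stage (.0613966), but the standard error differs (.0312895 here versus .0329624 by hand). The ivregress standard errors are the ones to report.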