Chapter 1 Instrumental Variables

Go to ELMS and download the mroz.dta file.

1.1 Estimating Returns to Education for Married Women

We’ll use the data from T. A. Mroz (1987), “The Sensitivity of an Empirical Model of Married Women’s Hours of Work to Economic and Statistical Assumptions,” Econometrica 55, 765-799.

We’ll use the data on married working women to estimate the return to education using a simple OLS model.

Our likely biased OLS model gives the following results:

cd "/Users/Sam/Desktop/Econ 645/Data/Wooldridge"
use "mroz.dta", clear
reg lwage educ
/Users/Sam/Desktop/Econ 645/Data/Wooldridge

      Source |       SS           df       MS      Number of obs   =       428
-------------+----------------------------------   F(1, 426)       =     56.93
       Model |  26.3264193         1  26.3264193   Prob > F        =    0.0000
    Residual |  197.001022       426  .462443713   R-squared       =    0.1179
-------------+----------------------------------   Adj R-squared   =    0.1158
       Total |  223.327441       427  .523015084   Root MSE        =    .68003

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1086487   .0143998     7.55   0.000     .0803451    .1369523
       _cons |  -.1851968   .1852259    -1.00   0.318    -.5492673    .1788736
------------------------------------------------------------------------------

Our estimate implies that the return to education is (exp(.109)-1)*100, or about 11.5%. Notice that we are using the stored value of \(\hat{\beta}_{educ}\) with _b[educ].

display (exp(_b[educ])-1)*100
11.477061
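The conversion from a log-point coefficient to a percentage change can be written out as a short worked equation. The symbols \(\beta_0\), \(\beta_1\), and \(u\) are notation assumed here for illustration; they do not appear in the original notes.

```latex
\[
\log(wage) = \beta_0 + \beta_1\, educ + u
\quad\Longrightarrow\quad
\%\Delta wage = 100\left(e^{\beta_1 \Delta educ} - 1\right)
\]
\[
\hat{\beta}_1 = 0.1086:\qquad 100\left(e^{0.1086} - 1\right) \approx 11.48,
\qquad 100\,\hat{\beta}_1 = 10.86
\]
```

The exact figure (11.48%) and the common approximation \(100\,\hat{\beta}_1\) (10.86%) are close here, but the gap widens as the coefficient grows.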

We will use father’s education as an instrument for a woman’s level of education, restricting the sample to women in the labor force. We will use the predict command to get the fitted values \(\hat{x}\).

reg educ fathedu if inlf==1
predict edu_hat
      Source |       SS           df       MS      Number of obs   =       428
-------------+----------------------------------   F(1, 426)       =     88.84
       Model |  384.841983         1  384.841983   Prob > F        =    0.0000
    Residual |  1845.35428       426  4.33181756   R-squared       =    0.1726
-------------+----------------------------------   Adj R-squared   =    0.1706
       Total |  2230.19626       427  5.22294206   Root MSE        =    2.0813

------------------------------------------------------------------------------
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    fatheduc |   .2694416   .0285863     9.43   0.000     .2132538    .3256295
       _cons |   10.23705   .2759363    37.10   0.000     9.694685    10.77942
------------------------------------------------------------------------------

(option xb assumed; fitted values)

For instrument relevance, let’s obtain the F-Statistic after regressing education onto father’s education.

test fathedu
 ( 1)  fatheduc = 0

       F(  1,   426) =   88.84
            Prob > F =    0.0000

Our F-statistic of 88.84 exceeds the rule-of-thumb threshold \(F > 15\), so father’s education seems to be a relevant instrument. Relevance alone does not make it a good instrument, though.
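With a single excluded instrument, the first-stage F-statistic is simply the square of the instrument's t-statistic. The sketch below checks this, assuming mroz.dta is still in memory and that the first stage is the regression shown above; it is an illustration rather than a required step.

```stata
* Sketch: with one instrument, F = t^2 in the first stage.
* Assumes mroz.dta is in memory.
quietly regress educ fatheduc if inlf==1
display "t-statistic squared: " (_b[fatheduc]/_se[fatheduc])^2
quietly test fatheduc
display "F from test:         " r(F)
```

Both lines should display the same number, which is why the reported t of 9.43 and F of 88.84 agree up to rounding.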

Father’s Education as an instrument

reg lwage edu_hat
      Source |       SS           df       MS      Number of obs   =       428
-------------+----------------------------------   F(1, 426)       =      2.59
       Model |  1.34752449         1  1.34752449   Prob > F        =    0.1086
    Residual |  221.979916       426  .521079616   R-squared       =    0.0060
-------------+----------------------------------   Adj R-squared   =    0.0037
       Total |  223.327441       427  .523015084   Root MSE        =    .72186

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     edu_hat |   .0591735   .0367969     1.61   0.109    -.0131525    .1314995
       _cons |   .4411034   .4671121     0.94   0.346    -.4770279    1.359235
------------------------------------------------------------------------------

One additional year of education increases wages by \[ (e^{(0.059)}-1)*100\%=6.1\% \]

display (exp(_b[edu_hat])-1)*100
6.095928
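In the just-identified case (one instrument, one endogenous variable), the same coefficient can be recovered as the ratio of the reduced-form slope to the first-stage slope. The sketch below is one way to check this, assuming mroz.dta is in memory; the scalar names rf and fs are hypothetical.

```stata
* Sketch: indirect least squares (Wald ratio) for the just-identified case.
* Assumes mroz.dta is in memory; rf and fs are hypothetical scalar names.
quietly regress lwage fatheduc if inlf==1    // reduced form
scalar rf = _b[fatheduc]
quietly regress educ fatheduc if inlf==1     // first stage
scalar fs = _b[fatheduc]
display "Reduced form / first stage = " rf/fs
```

The ratio should reproduce the coefficient on edu_hat from the manual second stage, since both equal the sample covariance of the instrument with lwage divided by its covariance with educ.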

Question: The instrument passed the relevance test (\(F > 15\)), but what about instrument exogeneity?

1.2 Exercise: Estimating Returns to Education for Men

Let’s estimate the returns to education for men. We’ll use data from M. Blackburn and D. Neumark (1992), “Unobserved Ability, Efficiency Wages, and Interindustry Wage Differentials,” Quarterly Journal of Economics 107, 1421-1436.

Use number of siblings as an instrument to predict an observation’s level of education. We’ll keep it simple and have no other covariates.

  1. Estimate the potentially biased OLS
  2. Estimate your first stage \(\hat{x}\)
  3. Test your first stage
  4. Run your second regression using \(\hat{x}\) from your first regression
  1. Estimate the potentially biased OLS
cd "/Users/Sam/Desktop/Econ 645/Data/Wooldridge"
use "wage2.dta", clear
reg lwage educ
/Users/Sam/Desktop/Econ 645/Data/Wooldridge

      Source |       SS           df       MS      Number of obs   =       935
-------------+----------------------------------   F(1, 933)       =    100.70
       Model |  16.1377042         1  16.1377042   Prob > F        =    0.0000
    Residual |  149.518579       933  .160255712   R-squared       =    0.0974
-------------+----------------------------------   Adj R-squared   =    0.0964
       Total |  165.656283       934  .177362188   Root MSE        =    .40032

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0598392   .0059631    10.03   0.000     .0481366    .0715418
       _cons |   5.973063   .0813737    73.40   0.000     5.813366    6.132759
------------------------------------------------------------------------------

We have an estimated return to education of \[ (e^{.0598392}-1)*100\% = 6.2\% \]

display (exp(_b[educ])-1)*100
6.1665821
  2. Use number of siblings as an instrument for the first stage and predict \(\hat{x}\)
reg educ sibs
predict edu_hat
      Source |       SS           df       MS      Number of obs   =       935
-------------+----------------------------------   F(1, 933)       =     56.67
       Model |  258.055048         1  258.055048   Prob > F        =    0.0000
    Residual |   4248.7642       933  4.55387374   R-squared       =    0.0573
-------------+----------------------------------   Adj R-squared   =    0.0562
       Total |  4506.81925       934  4.82528828   Root MSE        =     2.134

------------------------------------------------------------------------------
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        sibs |  -.2279164   .0302768    -7.53   0.000     -.287335   -.1684979
       _cons |   14.13879   .1131382   124.97   0.000     13.91676    14.36083
------------------------------------------------------------------------------

(option xb assumed; fitted values)
  3. Get the F-statistic
test sibs
 ( 1)  sibs = 0

       F(  1,   933) =   56.67
            Prob > F =    0.0000
  4. Number of siblings as an instrument
reg lwage edu_hat
      Source |       SS           df       MS      Number of obs   =       935
-------------+----------------------------------   F(1, 933)       =     22.31
       Model |  3.86818074         1  3.86818074   Prob > F        =    0.0000
    Residual |  161.788103       933  .173406326   R-squared       =    0.0234
-------------+----------------------------------   Adj R-squared   =    0.0223
       Total |  165.656283       934  .177362188   Root MSE        =    .41642

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     edu_hat |   .1224326   .0259225     4.72   0.000     .0715595    .1733057
       _cons |   5.130026   .3494009    14.68   0.000     4.444323    5.815729
------------------------------------------------------------------------------

One additional year of education increases wages by \[ (e^{0.1224}-1)*100\% =13.0\% \]

display (exp(_b[edu_hat])-1)*100 
13.024298
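The two manual steps can also be run in one command with Stata's built-in ivregress 2sls. The sketch below assumes wage2.dta is in memory. The point estimate should match the coefficient on edu_hat, while the standard errors will differ because the manual second stage treats edu_hat as if it were observed data.

```stata
* Sketch: one-step 2SLS for the exercise. Assumes wage2.dta is in memory.
ivregress 2sls lwage (educ = sibs)
* Percentage return implied by the 2SLS coefficient on educ.
display (exp(_b[educ])-1)*100
* The small option reports t and F statistics instead of z and chi2.
ivregress 2sls lwage (educ = sibs), small
```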

Another interesting result here is that the IV estimate is larger than the OLS estimate, which suggests that OLS is biased downward. This is not what we would expect.

Possible reasons:

  1. Siblings could be correlated with ability: more siblings means less parental attention, which could result in lower ability.
  2. The OLS estimator is biased downward due to measurement error in educ, but educ is unlikely to satisfy the classical errors-in-variables (CEV) assumption.
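The measurement-error explanation can be made concrete with the standard attenuation result. The notation is assumed here for illustration: \(educ^{*}\) is true education, \(e\) is the measurement error, and \(u\) is the wage-equation error.

```latex
\[
educ = educ^{*} + e, \qquad
Cov(educ^{*}, e) = 0, \qquad Cov(e, u) = 0, \qquad Cov(educ^{*}, u) = 0
\]
\[
\text{plim}\; \hat{\beta}_{1}^{OLS}
= \beta_1 \cdot \frac{Var(educ^{*})}{Var(educ^{*}) + Var(e)}
\]
```

Because the fraction is below one whenever \(Var(e) > 0\), OLS is pulled toward zero under these assumptions. The result depends on the error being uncorrelated with true education, which is exactly the CEV condition in question.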

1.3 Smoking on Birthweight

It is important to see an example of a poor instrument. We’ll use data from J. Mullahy (1997), “Instrumental-Variable Estimation of Count Data Models: Applications to Models of Cigarette Smoking Behavior,” Review of Economics and Statistics 79, 586-593.

The biased OLS regression relates the natural log of birth weight to cigarette packs smoked per day by the mother. We would expect smoking to be correlated with unobserved health and parental decisions, so the estimate is likely biased due to unobserved confounders.

cd "/Users/Sam/Desktop/Econ 645/Data/Wooldridge"
use "bwght.dta", clear
reg lbwght packs
/Users/Sam/Desktop/Econ 645/Data/Wooldridge

      Source |       SS           df       MS      Number of obs   =     1,388
-------------+----------------------------------   F(1, 1386)      =     27.98
       Model |  .997781141         1  .997781141   Prob > F        =    0.0000
    Residual |  49.4225525     1,386  .035658407   R-squared       =    0.0198
-------------+----------------------------------   Adj R-squared   =    0.0191
       Total |  50.4203336     1,387  .036352079   Root MSE        =    .18883

------------------------------------------------------------------------------
      lbwght |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       packs |  -.0898131   .0169786    -5.29   0.000    -.1231197   -.0565065
       _cons |   4.769404   .0053694   888.26   0.000     4.758871    4.779937
------------------------------------------------------------------------------

An additional pack smoked per day decreases birth weight by \[ (e^{-.08981}-1)*100\% \approx -8.6\% \]

display (exp(-.08981)-1)*100
-8.5895151

We’ll use cigarette prices as an instrument for cigarette packs smoked per day. We assume that cigarette prices and the error term u are uncorrelated (instrument exogeneity). Note, however, that some states fund health care with cigarette tax revenue, which could violate this assumption. For relevance, we expect cigarette price and quantity of packs smoked to be negatively correlated.

reg packs cigprice
      Source |       SS           df       MS      Number of obs   =     1,388
-------------+----------------------------------   F(1, 1386)      =      0.13
       Model |  .011648626         1  .011648626   Prob > F        =    0.7179
    Residual |  123.684481     1,386  .089238442   R-squared       =    0.0001
-------------+----------------------------------   Adj R-squared   =   -0.0006
       Total |  123.696129     1,387  .089182501   Root MSE        =    .29873

------------------------------------------------------------------------------
       packs |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    cigprice |   .0002829    .000783     0.36   0.718    -.0012531    .0018188
       _cons |   .0674257   .1025384     0.66   0.511    -.1337215    .2685728
------------------------------------------------------------------------------

The result shows that we fail to reject the null hypothesis that \(\beta_{cigprice}\) is equal to 0. Economic theory predicts that a higher price lowers the quantity consumed, so something is wrong: our instrument, the price of cigarettes, is not associated with packs of cigarettes consumed.

We will still check instrument relevance:

test cigprice
 ( 1)  cigprice = 0

       F(  1,  1386) =    0.13
            Prob > F =    0.7179

Our instrument fails the F-test: we have a weak instrument.

Cigarette price is a poor instrument for packs smoked, and the coefficient on predicted packs smoked has the wrong sign.

predict packs_hat
reg lbwght packs_hat
(option xb assumed; fitted values)

      Source |       SS           df       MS      Number of obs   =     1,388
-------------+----------------------------------   F(1, 1386)      =      2.87
       Model |  .104047659         1  .104047659   Prob > F        =    0.0907
    Residual |   50.316286     1,386  .036303237   R-squared       =    0.0021
-------------+----------------------------------   Adj R-squared   =    0.0013
       Total |  50.4203336     1,387  .036352079   Root MSE        =    .19053

------------------------------------------------------------------------------
      lbwght |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   packs_hat |   2.988676   1.765368     1.69   0.091    -.4744067    6.451758
       _cons |   4.448136   .1843027    24.13   0.000     4.086594    4.809679
------------------------------------------------------------------------------

Our results show that something is terribly wrong. An additional pack of cigarettes per day is associated with a massive increase in birth weight, which is not supported empirically. Our biased OLS was a better model than our poor instrument model.

We can and should always test Instrument Relevance. If we have a poor instrument, we should go back to the drawing board.
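A standard large-sample result shows why a weak first stage is so damaging. The notation is assumed here for illustration: \(z\) is the instrument, \(x\) the endogenous regressor, \(u\) the structural error, and \(\sigma_u\), \(\sigma_x\) their standard deviations.

```latex
\[
\text{plim}\; \hat{\beta}_{1}^{IV}
= \beta_1 + \frac{Corr(z, u)}{Corr(z, x)} \cdot \frac{\sigma_u}{\sigma_x}
\]
\[
|Corr(cigprice, packs)| = \sqrt{R^2_{\text{first stage}}} \approx \sqrt{0.0001} = 0.01
\]
```

With a first-stage correlation of roughly 0.01, even a small correlation between cigarette price and the error term is magnified on the order of a hundred times, which is consistent with the implausible second-stage coefficient above.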

Another issue here is that price is a poor instrument, since price and quantity are simultaneously determined. We would need a second set of instruments on price to estimate our first-stage.

1.4 Estimating Returns to Education for Married Women Part 2

We’ll use the Mroz data again on working women. We’ll use both parents’ education as instruments to identify the effect of education on wages for working women. The endogenous variable is overidentified because we have two instruments for it: father’s education and mother’s education.

Potentially Biased OLS

cd "/Users/Sam/Desktop/Econ 645/Data/Wooldridge"
use "mroz.dta", clear
reg lwage educ c.exper##c.exper
/Users/Sam/Desktop/Econ 645/Data/Wooldridge

      Source |       SS           df       MS      Number of obs   =       428
-------------+----------------------------------   F(3, 424)       =     26.29
       Model |  35.0222967         3  11.6740989   Prob > F        =    0.0000
    Residual |  188.305144       424  .444115906   R-squared       =    0.1568
-------------+----------------------------------   Adj R-squared   =    0.1509
       Total |  223.327441       427  .523015084   Root MSE        =    .66642

---------------------------------------------------------------------------------
          lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
           educ |   .1074896   .0141465     7.60   0.000     .0796837    .1352956
          exper |   .0415665   .0131752     3.15   0.002     .0156697    .0674633
                |
c.exper#c.exper |  -.0008112   .0003932    -2.06   0.040    -.0015841   -.0000382
                |
          _cons |  -.5220406   .1986321    -2.63   0.009    -.9124667   -.1316144
---------------------------------------------------------------------------------

Our result is \[ (e^{.1074896}-1)*100\% = 11.3\% \]

display (exp(_b[educ])-1)*100
11.347933

We’ll use two instruments, father’s and mother’s education, for one endogenous variable, again restricting the sample to women in the labor force.

reg educ c.exper##c.exper fathedu mothedu if inlf==1
      Source |       SS           df       MS      Number of obs   =       428
-------------+----------------------------------   F(4, 423)       =     28.36
       Model |  471.620998         4   117.90525   Prob > F        =    0.0000
    Residual |  1758.57526       423  4.15738833   R-squared       =    0.2115
-------------+----------------------------------   Adj R-squared   =    0.2040
       Total |  2230.19626       427  5.22294206   Root MSE        =     2.039

---------------------------------------------------------------------------------
           educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
          exper |   .0452254   .0402507     1.12   0.262    -.0338909    .1243417
                |
c.exper#c.exper |  -.0010091   .0012033    -0.84   0.402    -.0033744    .0013562
                |
       fatheduc |   .1895484   .0337565     5.62   0.000     .1231971    .2558997
       motheduc |    .157597   .0358941     4.39   0.000      .087044    .2281501
          _cons |    9.10264   .4265614    21.34   0.000     8.264196    9.941084
---------------------------------------------------------------------------------

Get the F-Statistic.

test fathedu mothedu
 ( 1)  fatheduc = 0
 ( 2)  motheduc = 0

       F(  2,   423) =   55.40
            Prob > F =    0.0000

The result shows that the instruments are potential candidates for good instruments, since \(F>15\).

Using Father’s Education and Mother’s Education as instruments.

predict edu_hat
reg lwage edu_hat c.exper##c.exper
(option xb assumed; fitted values)

      Source |       SS           df       MS      Number of obs   =       428
-------------+----------------------------------   F(3, 424)       =      7.40
       Model |   11.117828         3  3.70594265   Prob > F        =    0.0001
    Residual |  212.209613       424   .50049437   R-squared       =    0.0498
-------------+----------------------------------   Adj R-squared   =    0.0431
       Total |  223.327441       427  .523015084   Root MSE        =    .70746

---------------------------------------------------------------------------------
          lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
        edu_hat |   .0613966   .0329624     1.86   0.063    -.0033933    .1261866
          exper |   .0441704   .0140844     3.14   0.002     .0164865    .0718543
                |
c.exper#c.exper |   -.000899   .0004212    -2.13   0.033    -.0017268   -.0000711
                |
          _cons |   .0481003   .4197565     0.11   0.909    -.7769624     .873163
---------------------------------------------------------------------------------

One additional year of education increases wages by \[ (e^{0.061}-1)*100\%=6.3\% \]

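This can be done more easily with the ivregress 2sls command, which also reports valid standard errors. The standard errors from the manual second stage above are not valid because they ignore that edu_hat is itself estimated. The general syntax, with depvar, endogvar, and instruments as placeholders, is:

ivregress 2sls depvar (endogvar = instruments) \(x_1, x_2, ..., x_k\)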

ivregress 2sls lwage (educ=fathedu mothedu) c.exper##c.exper
display (exp(_b[educ])-1)*100
Instrumental variables (2SLS) regression          Number of obs   =        428
                                                  Wald chi2(3)    =      24.65
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.1357
                                                  Root MSE        =     .67155

---------------------------------------------------------------------------------
          lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
           educ |   .0613966   .0312895     1.96   0.050     .0000704    .1227228
          exper |   .0441704   .0133696     3.30   0.001     .0179665    .0703742
                |
c.exper#c.exper |   -.000899   .0003998    -2.25   0.025    -.0016826   -.0001154
                |
          _cons |   .0481003    .398453     0.12   0.904    -.7328532    .8290538
---------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   exper c.exper#c.exper fatheduc motheduc

6.3320574
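Because this model has two instruments for one endogenous variable, Stata's postestimation commands can probe the instruments further. The sketch below assumes mroz.dta is in memory and simply reruns the model before calling three standard estat subcommands. It is a starting point for diagnostics, not a proof that the instruments are valid.

```stata
* Sketch: postestimation diagnostics. Assumes mroz.dta is in memory.
ivregress 2sls lwage (educ = fatheduc motheduc) c.exper##c.exper
estat firststage     // first-stage F for the excluded instruments
estat overid         // Sargan and Basmann tests of overidentifying restrictions
estat endogenous     // Durbin and Wu-Hausman tests of whether educ is exogenous
```

The overidentification test is only available here because there are more instruments than endogenous regressors. In the just-identified models earlier in the chapter, instrument exogeneity cannot be tested this way.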