Chapter 2 Testing Assumptions

2.1 Testing for Endogeneity - Returns to Education for Working Women

We’ll keep using Mroz’s data on working women.

Use father’s and mother’s education as instruments for education with the ivregress 2sls command.

cd "/Users/Sam/Desktop/Econ 645/Data/Wooldridge"
use "mroz.dta", clear
ivregress 2sls lwage (educ=fatheduc motheduc) c.exper##c.exper

Instrumental variables (2SLS) regression          Number of obs   =        428
                                                  Wald chi2(3)    =      24.65
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.1357
                                                  Root MSE        =     .67155

---------------------------------------------------------------------------------
          lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
           educ |   .0613966   .0312895     1.96   0.050     .0000704    .1227228
          exper |   .0441704   .0133696     3.30   0.001     .0179665    .0703742
                |
c.exper#c.exper |   -.000899   .0003998    -2.25   0.025    -.0016826   -.0001154
                |
          _cons |   .0481003    .398453     0.12   0.904    -.7328532    .8290538
---------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   exper c.exper#c.exper fatheduc motheduc

If we want to see whether our explanatory variable of interest is correlated with the error term, we can conduct an endogeneity test. There are two ways to test for potential endogeneity in the OLS model:

  1. Manually
  2. estat postestimation command

2.1.1 Manual Test

To calculate the test manually, we first estimate the reduced form for \(\hat{x}_{edu}\) by regressing \(x_{edu}\) on all exogenous variables: the other \(x_{i}\) in the structural model plus the additional IVs \(z_i\).

First, estimate the reduced form equation (first stage).

reg educ c.exper##c.exper fatheduc motheduc if inlf==1
      Source |       SS           df       MS      Number of obs   =       428
-------------+----------------------------------   F(4, 423)       =     28.36
       Model |  471.620998         4   117.90525   Prob > F        =    0.0000
    Residual |  1758.57526       423  4.15738833   R-squared       =    0.2115
-------------+----------------------------------   Adj R-squared   =    0.2040
       Total |  2230.19626       427  5.22294206   Root MSE        =     2.039

---------------------------------------------------------------------------------
           educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
          exper |   .0452254   .0402507     1.12   0.262    -.0338909    .1243417
                |
c.exper#c.exper |  -.0010091   .0012033    -0.84   0.402    -.0033744    .0013562
                |
       fatheduc |   .1895484   .0337565     5.62   0.000     .1231971    .2558997
       motheduc |    .157597   .0358941     4.39   0.000      .087044    .2281501
          _cons |    9.10264   .4265614    21.34   0.000     8.264196    9.941084
---------------------------------------------------------------------------------

Next, obtain the residuals \(\hat{v}_2\).

predict v, residual

Then, add \(\hat{v}_2\) to the structural equation (our OLS model).

reg lwage educ c.exper##c.exper v
      Source |       SS           df       MS      Number of obs   =       428
-------------+----------------------------------   F(4, 423)       =     20.50
       Model |  36.2573098         4  9.06432744   Prob > F        =    0.0000
    Residual |  187.070131       423  .442246173   R-squared       =    0.1624
-------------+----------------------------------   Adj R-squared   =    0.1544
       Total |  223.327441       427  .523015084   Root MSE        =    .66502

---------------------------------------------------------------------------------
          lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
           educ |   .0613966   .0309849     1.98   0.048      .000493    .1223003
          exper |   .0441704   .0132394     3.34   0.001     .0181471    .0701937
                |
c.exper#c.exper |   -.000899   .0003959    -2.27   0.024    -.0016772   -.0001208
                |
              v |   .0581666   .0348073     1.67   0.095    -.0102502    .1265834
          _cons |   .0481003   .3945753     0.12   0.903    -.7274721    .8236727
---------------------------------------------------------------------------------

There is weak evidence of endogeneity, since \(p < .1\) but \(p > .05\). You should report both the OLS and IV estimates.

eststo m1: quietly reg lwage educ c.exper##c.exper
eststo m2: quietly ivregress 2sls lwage (educ=fatheduc motheduc) c.exper##c.exper
esttab m1 m2, mtitle(OLS IV)
                      (1)             (2)   
                      OLS              IV   
--------------------------------------------
educ                0.107***       0.0614*  
                   (7.60)          (1.96)   

exper              0.0416**        0.0442***
                   (3.15)          (3.30)   

c.exper#c.~r    -0.000811*      -0.000899*  
                  (-2.06)         (-2.25)   

_cons              -0.522**        0.0481   
                  (-2.63)          (0.12)   
--------------------------------------------
N                     428             428   
--------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001

2.1.2 Using estat endogenous

We can also use the postestimation command estat endogenous.

ivregress 2sls lwage (educ=fatheduc motheduc) c.exper##c.exper
estat endogenous
Instrumental variables (2SLS) regression          Number of obs   =        428
                                                  Wald chi2(3)    =      24.65
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.1357
                                                  Root MSE        =     .67155

---------------------------------------------------------------------------------
          lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
           educ |   .0613966   .0312895     1.96   0.050     .0000704    .1227228
          exper |   .0441704   .0133696     3.30   0.001     .0179665    .0703742
                |
c.exper#c.exper |   -.000899   .0003998    -2.25   0.025    -.0016826   -.0001154
                |
          _cons |   .0481003    .398453     0.12   0.904    -.7328532    .8290538
---------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   exper c.exper#c.exper fatheduc motheduc


  Tests of endogeneity
  Ho: variables are exogenous

  Durbin (score) chi2(1)          =  2.80707  (p = 0.0938)
  Wu-Hausman F(1,423)             =  2.79259  (p = 0.0954)

We have weak evidence that education is endogenous, since \(p<.1\) but \(p >.05\). Moreover, we know from theory that ability is a confounder of both wages and education. We should report both the OLS and IV models.
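The manual test and estat endogenous agree exactly: with a single endogenous regressor, the Wu-Hausman statistic is the square of the t statistic on the first-stage residual \(\hat{v}_2\). A quick pure-Python check, using the coefficient and standard error on v copied from the regression output above:

```python
# Wu-Hausman F with one endogenous regressor equals the squared
# t statistic on the first-stage residual v in the augmented OLS.
coef_v = 0.0581666   # coefficient on v, copied from the output above
se_v   = 0.0348073   # its standard error

t_v = coef_v / se_v        # t statistic on v, about 1.67
wu_hausman_F = t_v ** 2    # about 2.79, matching estat endogenous

print(round(t_v, 2), round(wu_hausman_F, 3))
```

Squaring the t statistic of 1.67 reproduces the Wu-Hausman F(1,423) of 2.79 reported by estat endogenous, so the two approaches are the same test.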

2.2 Testing Overidentifying Restrictions

We will test overidentifying restrictions using the data from “Returns to Education for Working Women”. When we have one IV for one endogenous explanatory variable, the equation is just identified. When we have two instruments and one endogenous explanatory variable, the equation is overidentified.

When we have multiple IVs, we can test whether some of our instruments are correlated with the structural error term.

We can estimate two 2SLS models (one for each IV) and then compare them; the estimates should differ only by sampling error. If the two coefficients on our explanatory variable of interest are significantly different, we conclude that at least one instrument (possibly both) is correlated with the structural error term.

Adding instrumental variables (overidentification) can increase the efficiency of the 2SLS estimator, but it also raises the risk of violating the instrument exogeneity assumption.
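The logic of comparing two just-identified estimates can be illustrated with a small simulation (pure Python; the data-generating process here is made up for illustration and is not the Mroz data). With a single instrument, the IV estimator is \(\hat{\beta} = \widehat{cov}(z,y)/\widehat{cov}(z,x)\); when both instruments are valid, the two estimates differ only by sampling error:

```python
import random

random.seed(1)

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)

# Hypothetical data: u is an unobserved confounder, so OLS of y on x
# is biased, but z1 and z2 are valid instruments (correlated with x,
# uncorrelated with u). The true structural effect of x on y is 2.0.
n = 20000
z1 = [random.gauss(0, 1) for _ in range(n)]
z2 = [random.gauss(0, 1) for _ in range(n)]
u  = [random.gauss(0, 1) for _ in range(n)]
x  = [0.5 * z1[i] + 0.5 * z2[i] + u[i] + random.gauss(0, 1) for i in range(n)]
y  = [2.0 * x[i] + u[i] + random.gauss(0, 1) for i in range(n)]

# Just-identified IV estimate using each instrument separately:
beta_z1 = cov(z1, y) / cov(z1, x)
beta_z2 = cov(z2, y) / cov(z2, x)
print(beta_z1, beta_z2)  # both close to 2.0, differing only by sampling error
```

If one instrument were correlated with the structural error, its estimate would converge to something other than 2.0 and the gap between the two estimates would not shrink as \(n\) grows; that is the intuition the overidentification test formalizes.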

When we use motheduc and fatheduc as IVs for education, we have a single overidentifying restriction: two IVs and one endogenous explanatory variable. We can calculate this test in two ways: manually and with estat.

2.2.1 Manual Test

Estimate a 2SLS model with mother’s education and father’s education as the two IVs.

cd "/Users/Sam/Desktop/Econ 645/Data/Wooldridge"
use "mroz.dta", clear
ivregress 2sls lwage (educ=motheduc fatheduc) c.exper##c.exper

Instrumental variables (2SLS) regression          Number of obs   =        428
                                                  Wald chi2(3)    =      24.65
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.1357
                                                  Root MSE        =     .67155

---------------------------------------------------------------------------------
          lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
           educ |   .0613966   .0312895     1.96   0.050     .0000704    .1227228
          exper |   .0441704   .0133696     3.30   0.001     .0179665    .0703742
                |
c.exper#c.exper |   -.000899   .0003998    -2.25   0.025    -.0016826   -.0001154
                |
          _cons |   .0481003    .398453     0.12   0.904    -.7328532    .8290538
---------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   exper c.exper#c.exper motheduc fatheduc

Get the residuals \(r\) and regress \(r\) on all exogenous variables.

predict r, resid
reg r motheduc fatheduc c.exper##c.exper
(325 missing values generated)

      Source |       SS           df       MS      Number of obs   =       428
-------------+----------------------------------   F(4, 423)       =      0.09
       Model |  .170503122         4   .04262578   Prob > F        =    0.9845
    Residual |  192.849512       423  .455909012   R-squared       =    0.0009
-------------+----------------------------------   Adj R-squared   =   -0.0086
       Total |  193.020015       427  .452037506   Root MSE        =    .67521

---------------------------------------------------------------------------------
              r |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
       motheduc |  -.0066065   .0118864    -0.56   0.579    -.0299704    .0167573
       fatheduc |   .0057823   .0111786     0.52   0.605    -.0161902    .0277547
          exper |  -.0000183   .0133291    -0.00   0.999    -.0262179    .0261813
                |
c.exper#c.exper |   7.34e-07   .0003985     0.00   0.999    -.0007825     .000784
                |
          _cons |   .0109641   .1412571     0.08   0.938    -.2666892    .2886173
---------------------------------------------------------------------------------

Next, obtain \(R^2\) and \(N\)

ereturn list
local N=`e(N)'
display "`N'"
local rsq=`e(r2)'
display "`rsq'"
local nR=`N'*`rsq'
display "`nR'"
scalars:
                  e(N) =  428
               e(df_m) =  4
               e(df_r) =  423
                  e(F) =  .0934962445404771
                 e(r2) =  .000883344256925
               e(rmse) =  .6752103466059415
                e(mss) =  .1705031219578643
                e(rss) =  192.8495121452517
               e(r2_a) =  -.0085645673813073
                 e(ll) =  -436.7021015142834
               e(ll_0) =  -436.891220726253
               e(rank) =  5

macros:
            e(cmdline) : "regress r motheduc fatheduc c.exper##c.exper"
              e(title) : "Linear regression"
          e(marginsok) : "XB default"
                e(vce) : "ols"
             e(depvar) : "r"
                e(cmd) : "regress"
         e(properties) : "b V"
            e(predict) : "regres_p"
          e(estat_cmd) : "regress_estat"

matrices:
                  e(b) :  1 x 5
                  e(V) :  5 x 5

functions:
             e(sample)   


428


.000883344256925


.3780713419639

Under the null hypothesis that all IVs are uncorrelated with \(u_1\), \(N R^2 \thicksim \chi^2_{q}\), where \(q\) is the number of instruments from outside the model minus the number of endogenous explanatory variables. If \(N R^2\) exceeds the 5% critical value of \(\chi^2_q\), we reject the null hypothesis and conclude that at least some of the IVs are not exogenous.

Here we have \(q=2-1=1\) degree of freedom for the chi-squared test, and we fail to reject the null hypothesis since \(N R^2=0.37807\) is below the 5% critical value of \(\chi^2_1\), which is 3.841.
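The arithmetic of the manual test can be reproduced directly from the stored results shown in the ereturn list above (pure Python; the \(\chi^2_1\) 5% critical value of 3.841 is hard-coded from the text rather than computed):

```python
# Overidentification statistic: N * R-squared from the regression of
# the 2SLS residuals on all exogenous variables.
N  = 428                  # e(N) from the auxiliary regression above
r2 = 0.000883344256925    # e(r2) from the same regression
nr2 = N * r2              # test statistic, ~ chi2(1) under H0

chi2_1_crit_5pct = 3.841  # 5% critical value of chi2 with 1 df
print(round(nr2, 5), nr2 < chi2_1_crit_5pct)  # 0.37807 True: fail to reject H0
```

Since 0.37807 is far below 3.841, we fail to reject the null that the instruments are exogenous.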

2.2.2 Using estat overid

We can also use the postestimation command estat overid.

ivregress 2sls lwage (educ=motheduc fatheduc) c.exper##c.exper
estat overid
Instrumental variables (2SLS) regression          Number of obs   =        428
                                                  Wald chi2(3)    =      24.65
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.1357
                                                  Root MSE        =     .67155

---------------------------------------------------------------------------------
          lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
           educ |   .0613966   .0312895     1.96   0.050     .0000704    .1227228
          exper |   .0441704   .0133696     3.30   0.001     .0179665    .0703742
                |
c.exper#c.exper |   -.000899   .0003998    -2.25   0.025    -.0016826   -.0001154
                |
          _cons |   .0481003    .398453     0.12   0.904    -.7328532    .8290538
---------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   exper c.exper#c.exper motheduc fatheduc


  Tests of overidentifying restrictions:

  Sargan (score) chi2(1) =  .378071  (p = 0.5386)
  Basmann chi2(1)        =  .373985  (p = 0.5408)

The Sargan statistic reported by this postestimation command is the same \(N R^2\) we computed manually.
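The Basmann statistic can also be recovered from the same auxiliary regression's stored results: numerically it equals \((N-k)\,R^2/(1-R^2)\), where \(k=5\) is e(rank), the number of parameters in that regression, so \(N-k=423=\) e(df_r). A pure-Python check using the values above:

```python
# Basmann's variant of the overidentification test, computed from the
# same auxiliary regression: (N - k) * R^2 / (1 - R^2), where k is the
# number of parameters in that regression (e(rank) = 5, so N - k = 423).
N, k = 428, 5
r2 = 0.000883344256925    # e(r2) from the auxiliary regression
basmann = (N - k) * r2 / (1 - r2)
print(round(basmann, 6))  # ~0.373985, matching estat overid
```

Both statistics are small relative to the \(\chi^2_1\) critical value, so the two versions of the test agree here.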

Let’s add husband’s education, so we have two overidentifying restrictions.

ivregress 2sls lwage (educ=motheduc fatheduc huseduc) c.exper##c.exper
estat overid
Instrumental variables (2SLS) regression          Number of obs   =        428
                                                  Wald chi2(3)    =      34.90
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.1495
                                                  Root MSE        =     .66616

---------------------------------------------------------------------------------
          lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
           educ |   .0803918    .021672     3.71   0.000     .0379155    .1228681
          exper |   .0430973   .0132027     3.26   0.001     .0172204    .0689742
                |
c.exper#c.exper |  -.0008628   .0003943    -2.19   0.029    -.0016357   -.0000899
                |
          _cons |  -.1868572   .2840591    -0.66   0.511    -.7436029    .3698885
---------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   exper c.exper#c.exper motheduc fatheduc huseduc


  Tests of overidentifying restrictions:

  Sargan (score) chi2(2) =  1.11504  (p = 0.5726)
  Basmann chi2(2)        =  1.10228  (p = 0.5763)

Notice that we still fail to reject the null hypothesis, so we might consider keeping huseduc as an IV. Also notice that the coefficient and standard error on education have changed.

It is a good idea to report both models in a sensitivity analysis.

est clear
eststo m1: quietly reg lwage educ c.exper##c.exper
eststo m2: quietly ivregress 2sls lwage (educ=motheduc fatheduc) c.exper##c.exper
eststo m3: quietly ivregress 2sls lwage (educ=motheduc fatheduc huseduc) c.exper##c.exper
esttab m1 m2 m3, mtitle(OLS 2IVs 3IVs)
                      (1)             (2)             (3)   
                      OLS            2IVs            3IVs   
------------------------------------------------------------
educ                0.107***       0.0614*         0.0804***
                   (7.60)          (1.96)          (3.71)   

exper              0.0416**        0.0442***       0.0431** 
                   (3.15)          (3.30)          (3.26)   

c.exper#c.~r    -0.000811*      -0.000899*      -0.000863*  
                  (-2.06)         (-2.25)         (-2.19)   

_cons              -0.522**        0.0481          -0.187   
                  (-2.63)          (0.12)         (-0.66)   
------------------------------------------------------------
N                     428             428             428   
------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001