Chapter 2 Testing Assumptions
2.1 Testing for Endogeneity - Returns to Education for Working Women
We’ll keep using Mroz’s data on working women.
Use father’s and mother’s education as instruments for education, estimating the model with the ivregress 2sls command.
cd "/Users/Sam/Desktop/Econ 645/Data/Wooldridge"
use "mroz.dta", clear
ivregress 2sls lwage (educ=fatheduc motheduc) c.exper##c.exper
Instrumental variables (2SLS) regression Number of obs = 428
Wald chi2(3) = 24.65
Prob > chi2 = 0.0000
R-squared = 0.1357
Root MSE = .67155
---------------------------------------------------------------------------------
lwage | Coef. Std. Err. z P>|z| [95% Conf. Interval]
----------------+----------------------------------------------------------------
educ | .0613966 .0312895 1.96 0.050 .0000704 .1227228
exper | .0441704 .0133696 3.30 0.001 .0179665 .0703742
|
c.exper#c.exper | -.000899 .0003998 -2.25 0.025 -.0016826 -.0001154
|
_cons | .0481003 .398453 0.12 0.904 -.7328532 .8290538
---------------------------------------------------------------------------------
Instrumented: educ
Instruments: exper c.exper#c.exper fatheduc motheduc
If we want to see whether our explanatory variable of interest is potentially correlated with the error term, we can conduct an endogeneity test. There are two ways to test for potential endogeneity in the OLS model:
- Manually
- The estat endogenous postestimation command
2.1.1 Manual Test
To calculate the test manually, we first estimate the reduced form for \(\hat{x}_{edu}\) by regressing \(x_{edu}\) on all exogenous variables: the other \(x_i\) in the structural model plus the additional IVs \(z_i\).
First, estimate the reduced form equation (first stage).
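The first-stage command is not shown above; inferred from the output below, it is:

reg educ c.exper##c.exper fatheduc motheduc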
Source | SS df MS Number of obs = 428
-------------+---------------------------------- F(4, 423) = 28.36
Model | 471.620998 4 117.90525 Prob > F = 0.0000
Residual | 1758.57526 423 4.15738833 R-squared = 0.2115
-------------+---------------------------------- Adj R-squared = 0.2040
Total | 2230.19626 427 5.22294206 Root MSE = 2.039
---------------------------------------------------------------------------------
educ | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------------+----------------------------------------------------------------
exper | .0452254 .0402507 1.12 0.262 -.0338909 .1243417
|
c.exper#c.exper | -.0010091 .0012033 -0.84 0.402 -.0033744 .0013562
|
fatheduc | .1895484 .0337565 5.62 0.000 .1231971 .2558997
motheduc | .157597 .0358941 4.39 0.000 .087044 .2281501
_cons | 9.10264 .4265614 21.34 0.000 8.264196 9.941084
---------------------------------------------------------------------------------
Next, obtain the residuals \(\hat{v}_2\).
Then, add \(\hat{v}_2\) to the structural equation (our OLS model).
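These two steps can be run as follows (the residual is named v to match the regression output below):

predict v, resid
reg lwage educ c.exper##c.exper v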
Source | SS df MS Number of obs = 428
-------------+---------------------------------- F(4, 423) = 20.50
Model | 36.2573098 4 9.06432744 Prob > F = 0.0000
Residual | 187.070131 423 .442246173 R-squared = 0.1624
-------------+---------------------------------- Adj R-squared = 0.1544
Total | 223.327441 427 .523015084 Root MSE = .66502
---------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------------+----------------------------------------------------------------
educ | .0613966 .0309849 1.98 0.048 .000493 .1223003
exper | .0441704 .0132394 3.34 0.001 .0181471 .0701937
|
c.exper#c.exper | -.000899 .0003959 -2.27 0.024 -.0016772 -.0001208
|
v | .0581666 .0348073 1.67 0.095 -.0102502 .1265834
_cons | .0481003 .3945753 0.12 0.903 -.7274721 .8236727
---------------------------------------------------------------------------------
There is weak evidence of endogeneity, since the coefficient on \(\hat{v}_2\) has \(p < .1\) but \(p > .05\). You should report both the OLS and IV estimates.
eststo m1: quietly reg lwage educ c.exper##c.exper
eststo m2: quietly ivregress 2sls lwage (educ=fatheduc motheduc) c.exper##c.exper
esttab m1 m2, mtitle(OLS IV)
                      (1)             (2)
                      OLS              IV
--------------------------------------------
educ 0.107*** 0.0614*
(7.60) (1.96)
exper 0.0416** 0.0442***
(3.15) (3.30)
c.exper#c.~r -0.000811* -0.000899*
(-2.06) (-2.25)
_cons -0.522** 0.0481
(-2.63) (0.12)
--------------------------------------------
N 428 428
--------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
2.1.2 Using estat endogenous
We can also use the postestimation command estat endogenous after ivregress.
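Re-estimate the 2SLS model and then run the test (commands inferred from the output below):

ivregress 2sls lwage (educ=fatheduc motheduc) c.exper##c.exper
estat endogenous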
Instrumental variables (2SLS) regression Number of obs = 428
Wald chi2(3) = 24.65
Prob > chi2 = 0.0000
R-squared = 0.1357
Root MSE = .67155
---------------------------------------------------------------------------------
lwage | Coef. Std. Err. z P>|z| [95% Conf. Interval]
----------------+----------------------------------------------------------------
educ | .0613966 .0312895 1.96 0.050 .0000704 .1227228
exper | .0441704 .0133696 3.30 0.001 .0179665 .0703742
|
c.exper#c.exper | -.000899 .0003998 -2.25 0.025 -.0016826 -.0001154
|
_cons | .0481003 .398453 0.12 0.904 -.7328532 .8290538
---------------------------------------------------------------------------------
Instrumented: educ
Instruments: exper c.exper#c.exper fatheduc motheduc
Tests of endogeneity
Ho: variables are exogenous
Durbin (score) chi2(1) = 2.80707 (p = 0.0938)
Wu-Hausman F(1,423) = 2.79259 (p = 0.0954)
We have weak evidence that education is endogenous, since \(p < .1\) but \(p > .05\). However, we know from theory that ability is a confounder of both wages and education, so we should report both the OLS and IV models.
2.2 Testing Overidentifying Restrictions
We will test overidentifying restrictions using the data from “Returns to Education for Working Women”. When we have one IV for one endogenous explanatory variable, the equation is just identified. When we have two instruments and one endogenous explanatory variable, the equation is overidentified.
When we have multiple IVs, we can test whether some of our instruments are correlated with the structural error term.
We can estimate two 2SLS models (one for each IV) and then compare them; the estimates should differ only by sampling error. If the two coefficients on our fitted explanatory variable of interest differ significantly, we conclude that at least one instrument (and possibly both) is correlated with the structural error term.
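As a sketch with our data, each just-identified model would use one parental-education instrument at a time:

ivregress 2sls lwage (educ=motheduc) c.exper##c.exper
ivregress 2sls lwage (educ=fatheduc) c.exper##c.exper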
Adding more instrumental variables (overidentifying the equation) can increase the efficiency of the 2SLS estimator. However, we run a greater risk of violating the instrument exogeneity assumption.
When we use motheduc and fatheduc as IVs for education, we have a single overidentifying restriction: two IVs and one endogenous explanatory variable. There are two ways to calculate this test: manually and with estat.
2.2.1 Manual Test
Estimate a 2SLS model with mother’s education and father’s education as the two IVs.
cd "/Users/Sam/Desktop/Econ 645/Data/Wooldridge"
use "mroz.dta", clear
ivregress 2sls lwage (educ=motheduc fatheduc) c.exper##c.exper
Instrumental variables (2SLS) regression Number of obs = 428
Wald chi2(3) = 24.65
Prob > chi2 = 0.0000
R-squared = 0.1357
Root MSE = .67155
---------------------------------------------------------------------------------
lwage | Coef. Std. Err. z P>|z| [95% Conf. Interval]
----------------+----------------------------------------------------------------
educ | .0613966 .0312895 1.96 0.050 .0000704 .1227228
exper | .0441704 .0133696 3.30 0.001 .0179665 .0703742
|
c.exper#c.exper | -.000899 .0003998 -2.25 0.025 -.0016826 -.0001154
|
_cons | .0481003 .398453 0.12 0.904 -.7328532 .8290538
---------------------------------------------------------------------------------
Instrumented: educ
Instruments: exper c.exper#c.exper motheduc fatheduc
Get the residuals \(r\) and regress \(r\) on all exogenous variables.
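Inferred from the output below, the residuals are obtained with predict and regressed on the exogenous variables (the 325 missing values come from observations outside the estimation sample):

predict r, resid
reg r motheduc fatheduc c.exper##c.exper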
(325 missing values generated)
Source | SS df MS Number of obs = 428
-------------+---------------------------------- F(4, 423) = 0.09
Model | .170503122 4 .04262578 Prob > F = 0.9845
Residual | 192.849512 423 .455909012 R-squared = 0.0009
-------------+---------------------------------- Adj R-squared = -0.0086
Total | 193.020015 427 .452037506 Root MSE = .67521
---------------------------------------------------------------------------------
r | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------------+----------------------------------------------------------------
motheduc | -.0066065 .0118864 -0.56 0.579 -.0299704 .0167573
fatheduc | .0057823 .0111786 0.52 0.605 -.0161902 .0277547
exper | -.0000183 .0133291 -0.00 0.999 -.0262179 .0261813
|
c.exper#c.exper | 7.34e-07 .0003985 0.00 0.999 -.0007825 .000784
|
_cons | .0109641 .1412571 0.08 0.938 -.2666892 .2886173
---------------------------------------------------------------------------------
Next, obtain \(R^2\) and \(N\).
ereturn list
local N=`e(N)'
display "`N'"
local rsq=`e(r2)'
display "`rsq'"
local nR=`N'*`rsq'
display "`nR'"
scalars:
e(N) = 428
e(df_m) = 4
e(df_r) = 423
e(F) = .0934962445404771
e(r2) = .000883344256925
e(rmse) = .6752103466059415
e(mss) = .1705031219578643
e(rss) = 192.8495121452517
e(r2_a) = -.0085645673813073
e(ll) = -436.7021015142834
e(ll_0) = -436.891220726253
e(rank) = 5
macros:
e(cmdline) : "regress r motheduc fatheduc c.exper##c.exper"
e(title) : "Linear regression"
e(marginsok) : "XB default"
e(vce) : "ols"
e(depvar) : "r"
e(cmd) : "regress"
e(properties) : "b V"
e(predict) : "regres_p"
e(estat_cmd) : "regress_estat"
matrices:
e(b) : 1 x 5
e(V) : 5 x 5
functions:
e(sample)
428
.000883344256925
.3780713419639
Under the null hypothesis that all IVs are uncorrelated with \(u_1\), \(NR^2 \sim \chi^2_{q}\), where \(q\) is the number of instruments from outside the model minus the total number of endogenous explanatory variables. If \(NR^2\) exceeds the 5% critical value of \(\chi^2_q\), then we reject the null hypothesis and conclude that at least some of the IVs are not exogenous.
Here we have \(q = 2 - 1 = 1\) degree of freedom for the chi-squared test, and we fail to reject the null hypothesis since \(NR^2 = 0.37807\) is below the 5% critical value of \(\chi^2_1 = 3.841\).
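The 5% critical value and the p-value of the statistic can be checked with Stata’s built-in chi-squared functions:

display invchi2tail(1, .05)
display chi2tail(1, .37807)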
2.2.2 Using estat overid
We can also use the postestimation command estat overid.
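Re-estimate the 2SLS model and then run the test (commands inferred from the output below):

ivregress 2sls lwage (educ=motheduc fatheduc) c.exper##c.exper
estat overid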
Instrumental variables (2SLS) regression Number of obs = 428
Wald chi2(3) = 24.65
Prob > chi2 = 0.0000
R-squared = 0.1357
Root MSE = .67155
---------------------------------------------------------------------------------
lwage | Coef. Std. Err. z P>|z| [95% Conf. Interval]
----------------+----------------------------------------------------------------
educ | .0613966 .0312895 1.96 0.050 .0000704 .1227228
exper | .0441704 .0133696 3.30 0.001 .0179665 .0703742
|
c.exper#c.exper | -.000899 .0003998 -2.25 0.025 -.0016826 -.0001154
|
_cons | .0481003 .398453 0.12 0.904 -.7328532 .8290538
---------------------------------------------------------------------------------
Instrumented: educ
Instruments: exper c.exper#c.exper motheduc fatheduc
Tests of overidentifying restrictions:
Sargan (score) chi2(1) = .378071 (p = 0.5386)
Basmann chi2(1) = .373985 (p = 0.5408)
This postestimation command gives us the same \(NR^2\) statistic (the Sargan test) that we computed manually.
Let’s add husband’s education, so we have two overidentifying restrictions.
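Commands inferred from the output below, with huseduc added as a third instrument:

ivregress 2sls lwage (educ=motheduc fatheduc huseduc) c.exper##c.exper
estat overid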
Instrumental variables (2SLS) regression Number of obs = 428
Wald chi2(3) = 34.90
Prob > chi2 = 0.0000
R-squared = 0.1495
Root MSE = .66616
---------------------------------------------------------------------------------
lwage | Coef. Std. Err. z P>|z| [95% Conf. Interval]
----------------+----------------------------------------------------------------
educ | .0803918 .021672 3.71 0.000 .0379155 .1228681
exper | .0430973 .0132027 3.26 0.001 .0172204 .0689742
|
c.exper#c.exper | -.0008628 .0003943 -2.19 0.029 -.0016357 -.0000899
|
_cons | -.1868572 .2840591 -0.66 0.511 -.7436029 .3698885
---------------------------------------------------------------------------------
Instrumented: educ
Instruments: exper c.exper#c.exper motheduc fatheduc huseduc
Tests of overidentifying restrictions:
Sargan (score) chi2(2) = 1.11504 (p = 0.5726)
Basmann chi2(2) = 1.10228 (p = 0.5763)
Notice that we still fail to reject the null hypothesis, so we might consider keeping husband’s education as an IV. Also, notice that the coefficient and standard error on education have changed as well.
It is a good idea to report both in a sensitivity analysis.
est clear
eststo ols: quietly reg lwage educ c.exper##c.exper
eststo m1: quietly ivregress 2sls lwage (educ=motheduc fatheduc) c.exper##c.exper
eststo m2: quietly ivregress 2sls lwage (educ=motheduc fatheduc huseduc) c.exper##c.exper
esttab ols m1 m2, mtitle(OLS 2IVs 3IVs)
                      (1)             (2)             (3)
                      OLS            2IVs            3IVs
------------------------------------------------------------
educ 0.107*** 0.0614* 0.0804***
(7.60) (1.96) (3.71)
exper 0.0416** 0.0442*** 0.0431**
(3.15) (3.30) (3.26)
c.exper#c.~r -0.000811* -0.000899* -0.000863*
(-2.06) (-2.25) (-2.19)
_cons -0.522** 0.0481 -0.187
(-2.63) (0.12) (-0.66)
------------------------------------------------------------
N 428 428 428
------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001