Chapter 3 Sample Selection Correction

Lesson: As with the Tobit model, if we ignore the truncation of the data, that is, why certain parts of the population are not sampled, we commit a form of omitted variable bias by leaving out the inverse Mills ratio, \(\hat{\lambda}(\mathbf{z'\hat{\gamma}})\). We can use the Heckit method to correct for sample selection.

We want to see if there is sample selection bias due to unobservable wage offers for non-working women.

We need to estimate a probit for labor force participation to test and correct for sample selection bias due to unobserved wage offers for nonworking women. The probit regressors are spousal income, education, experience, experience squared, age, number of kids less than 6, and number of kids 6 or older. The wage offer equation is \[ \ln(wages_{i})=\beta_{0}+ \mathbf{x'\beta} + u_{i} \] where \(\mathbf{x}\) is a vector that includes education, experience, and experience squared.

We use Stata's heckman command with the twostep option to implement the Heckman method (Heckit) for sample selection correction.

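For the selected sample of working women, Heckit adds the inverse Mills ratio to the wage equation: \[ E[\ln(wages_{i}) \mid \mathbf{z}, inlf_{i}=1]=\beta_{0}+ \mathbf{x'\beta} + \rho\sigma\lambda(\mathbf{z'\gamma}) \] where \(\mathbf{z}\) is a vector that includes spousal income, education, experience, experience squared, age, number of kids less than 6, and number of kids 6 or older, and \(\lambda(\cdot)=\phi(\cdot)/\Phi(\cdot)\) is the inverse Mills ratio.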
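
One way to see where the inverse Mills ratio term comes from is the sketch below. It relies on the standard Heckman assumptions; the selection error \(v_{i}\), the unit-variance normalization, and the symbols \(\rho\) and \(\sigma\) are notation introduced here for illustration, not taken from the text above.

\[ inlf_{i}=1[\mathbf{z'\gamma}+v_{i}>0], \qquad v_{i}\sim N(0,1), \qquad Corr(u_{i},v_{i})=\rho, \qquad sd(u_{i})=\sigma \]

\[ E[u_{i} \mid \mathbf{z}, inlf_{i}=1]=\rho\sigma E[v_{i} \mid v_{i}>-\mathbf{z'\gamma}]=\rho\sigma\frac{\phi(\mathbf{z'\gamma})}{\Phi(\mathbf{z'\gamma})}=\rho\sigma\lambda(\mathbf{z'\gamma}) \]

If \(\rho=0\), the term vanishes and OLS on the working sample is consistent. Otherwise, leaving out \(\lambda(\mathbf{z'\gamma})\) acts like an omitted variable.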

3.1 OLS

use "/Users/Sam/Desktop/Econ 645/Data/Wooldridge/mroz.dta", clear
reg lwage educ exper expersq

      Source |       SS           df       MS      Number of obs   =       428
-------------+----------------------------------   F(3, 424)       =     26.29
       Model |  35.0222967         3  11.6740989   Prob > F        =    0.0000
    Residual |  188.305144       424  .444115906   R-squared       =    0.1568
-------------+----------------------------------   Adj R-squared   =    0.1509
       Total |  223.327441       427  .523015084   Root MSE        =    .66642

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1074896   .0141465     7.60   0.000     .0796837    .1352956
       exper |   .0415665   .0131752     3.15   0.002     .0156697    .0674633
     expersq |  -.0008112   .0003932    -2.06   0.040    -.0015841   -.0000382
       _cons |  -.5220406   .1986321    -2.63   0.009    -.9124667   -.1316144
------------------------------------------------------------------------------

Note: we assume an exclusion restriction. Spousal income, age, kids less than 6, and kids 6 or older are assumed to affect labor force participation but are excluded from our main (wage) regression.

display (exp(_b[educ])-1)*100

11.347933
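
As a quick check on the number above, the calculation can be written out by hand using the OLS coefficient on educ from the table (0.1074896). The comparison with the simple approximation \(100\hat{\beta}\) is added here for illustration.

\[ (e^{0.1074896}-1)\times 100 \approx 11.35, \qquad 100\times 0.1074896 \approx 10.75 \]

So one more year of education is associated with roughly an 11.3 percent higher wage, slightly more than the 10.7 percent suggested by reading the coefficient directly.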

3.2 Heckman Method - Heckit

The wage equation uses a subset of all the exogenous variables. We ask two questions:
1. What are the factors correlated with being in the labor force?
2. What is the impact of education and experience on wages?

use "/Users/Sam/Desktop/Econ 645/Data/Wooldridge/mroz.dta", clear
heckman lwage educ exper expersq, select(inlf=nwifeinc educ exper expersq age kidslt6 kidsge6) twostep

Heckman selection model -- two-step estimates   Number of obs     =        753
(regression model with sample selection)        Censored obs      =        325
                                                Uncensored obs    =        428

                                                Wald chi2(3)      =      51.53
                                                Prob > chi2       =     0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
lwage        |
        educ |   .1090655    .015523     7.03   0.000     .0786411      .13949
       exper |   .0438873   .0162611     2.70   0.007     .0120163    .0757584
     expersq |  -.0008591   .0004389    -1.96   0.050    -.0017194    1.15e-06
       _cons |  -.5781032   .3050062    -1.90   0.058    -1.175904     .019698
-------------+----------------------------------------------------------------
inlf         |
    nwifeinc |  -.0120237   .0048398    -2.48   0.013    -.0215096   -.0025378
        educ |   .1309047   .0252542     5.18   0.000     .0814074     .180402
       exper |   .1233476   .0187164     6.59   0.000     .0866641    .1600311
     expersq |  -.0018871      .0006    -3.15   0.002     -.003063   -.0007111
         age |  -.0528527   .0084772    -6.23   0.000    -.0694678   -.0362376
     kidslt6 |  -.8683285   .1185223    -7.33   0.000    -1.100628    -.636029
     kidsge6 |    .036005   .0434768     0.83   0.408     -.049208    .1212179
       _cons |   .2700768    .508593     0.53   0.595    -.7267473    1.266901
-------------+----------------------------------------------------------------
mills        |
      lambda |   .0322619   .1336246     0.24   0.809    -.2296376    .2941613
-------------+----------------------------------------------------------------
         rho |    0.04861
       sigma |  .66362875
------------------------------------------------------------------------------
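
To make the two steps concrete, here is a minimal sketch of doing Heckit by hand. It assumes mroz.dta is already loaded as above; zg and imr are hypothetical variable names chosen for this sketch. The point estimates should match the heckman twostep output, but the second-step standard errors here are not corrected for the estimated inverse Mills ratio, so heckman remains the better tool for inference.

```stata
* Step 1: probit for labor force participation on the full sample
probit inlf nwifeinc educ exper expersq age kidslt6 kidsge6

* Linear index z'gamma-hat, then the inverse Mills ratio phi(.)/Phi(.)
* (zg and imr are hypothetical names used only in this sketch)
predict double zg, xb
generate double imr = normalden(zg) / normal(zg)

* Step 2: OLS on working women (lwage is missing for nonworkers),
* adding the inverse Mills ratio as an extra regressor
regress lwage educ exper expersq imr
```

The coefficient on imr plays the role of lambda in the mills row of the heckman table.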

est clear
eststo OLS: reg lwage educ exper expersq    
eststo Heckman: heckman lwage educ exper expersq, select(inlf=nwifeinc educ exper expersq age kidslt6 kidsge6) twostep 
esttab, mtitle

                      (1)             (2)   
                      OLS         Heckman   
--------------------------------------------
main                                        
educ                0.107***        0.109***
                   (7.60)          (7.03)   

exper              0.0416**        0.0439** 
                   (3.15)          (2.70)   

expersq         -0.000811*      -0.000859   
                  (-2.06)         (-1.96)   

_cons              -0.522**        -0.578   
                  (-2.63)         (-1.90)   
--------------------------------------------
inlf                                        
nwifeinc                          -0.0120*  
                                  (-2.48)   

educ                                0.131***
                                   (5.18)   

exper                               0.123***
                                   (6.59)   

expersq                          -0.00189** 
                                  (-3.15)   

age                               -0.0529***
                                  (-6.23)   

kidslt6                            -0.868***
                                  (-7.33)   

kidsge6                            0.0360   
                                   (0.83)   

_cons                               0.270   
                                   (0.53)   
--------------------------------------------
mills                                       
lambda                             0.0323   
                                   (0.24)   
--------------------------------------------
N                     428             753   
--------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001

There is no real evidence of sample selection bias in the wage offer equation. The coefficient on \(\hat{\lambda}\) is small and not statistically significant (0.032, p = 0.809), so we fail to reject \(H_0: \rho=0\). We also see very little difference between the OLS and Heckman estimates of the wage equation.
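
The pieces of the mills block in the heckman output can be tied together with a little arithmetic. In Stata's two-step output the lambda coefficient is the product of the reported rho and sigma, and the test statistic is the coefficient over its standard error. The numbers below are copied from the output above; the products are approximate because rho is displayed rounded.

\[ \hat{\rho}\hat{\sigma}=0.04861\times 0.66362875\approx 0.0323, \qquad z=\frac{0.0322619}{0.1336246}\approx 0.24 \]

A z statistic of about 0.24 is far below conventional critical values, which is why the selection term is judged insignificant.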

Interpretation: the wage equation coefficients are interpreted the same way as in OLS. We have a log-linear model, so the percentage return to a one-unit increase in a regressor is \((e^{\beta}-1)\times 100\). Our \(\hat{\lambda}\) would be a potential omitted variable, but in this example we fail to reject the null hypothesis, so selection bias does not appear to be problematic.
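
As a small illustration of this interpretation, the same display calculation used after OLS can be applied to the Heckman wage equation. This sketch assumes the heckman model above is the most recent estimation. The lwage: prefix selects the coefficient from the wage equation rather than from the selection equation.

```stata
* Percentage return to one more year of education, Heckman wage equation
* (assumes heckman ... twostep was the last estimation command run)
display (exp(_b[lwage:educ])-1)*100
```

With the reported coefficient of 0.1090655 this comes to roughly 11.5 percent, close to the OLS figure of about 11.3 percent.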