Chapter 1 Censored Models

1.1 Censored Regression Model - Right-Censoring

Lesson: We can use a Tobit estimator for a right-censored model.

We first summarize the dependent variable to find the right-censoring value. Weekly earnings in the Current Population Survey are top-coded (right-censored) at $2884.61. Top-coding may bias our estimates, so we’ll compare an OLS model and a Tobit model for right-censored data. Remember that a Tobit estimator for a censored model is different from a Tobit estimator for a corner solution.
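
To make the setup concrete, the right-censored model can be sketched in standard notation. The symbols $y_i^*$ (latent log earnings), $c$ (the censoring point), and $\sigma$ are introduced here for illustration and do not appear in the original notes.

\[ y_i^* = x_i \beta + u_i, \qquad u_i \mid x_i \sim Normal(0, \sigma^2), \qquad y_i = \min(y_i^*, c) \]

We only observe $y_i$, so everyone whose latent earnings exceed the top-code is recorded at $c$; in logs, $c = \ln(2884.61) \approx 7.967$. In a corner solution model, by contrast, the pile-up value is an actual outcome rather than a limit imposed when the data were recorded.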

use "/Users/Sam/Desktop/Econ 645/Data/CPS/jan2024.dta", clear
sum earnings if prerelg==1, detail
histogram earnings if prerelg==1, normal title(Histogram of Weekly Earnings) caption("Source: Current Population Survey")
graph export "/Users/Sam/Desktop/Econ 645/R Markdown/week9_histogram_earnings.png", replace

                  Weekly Earnings: pternwa
-------------------------------------------------------------
      Percentiles      Smallest
 1%           70              0
 5%          225              0
10%          360              0       Obs              10,666
25%          656              0       Sum of Wgt.      10,666

50%         1000                      Mean           1230.474
                        Largest       Std. Dev.      788.1457
75%         1680        2884.61
90%       2692.3        2884.61       Variance       621173.7
95%      2884.61        2884.61       Skewness       .7779644
99%      2884.61        2884.61       Kurtosis       2.648218

(bin=40, start=0, width=72.11525)

(file /Users/Sam/Desktop/Econ 645/R Markdown/week9_histogram_earnings.png written in PNG format)

Histogram of Earnings
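
The amount of top-coding can be checked directly. The lines below are a minimal sketch, assuming jan2024.dta is still in memory and that the largest observed value of earnings is the top-code; the scalar names topcode and n_all are hypothetical.

```stata
* Sketch: how many earners sit at the top-code?
* Assumes the maximum of earnings equals the top-code (2884.61 in the output above).
quietly summarize earnings if prerelg==1
* Scalars keep the stored value at full precision for the equality test.
scalar topcode = r(max)
scalar n_all   = r(N)

* Count observations recorded exactly at the top-code and show their share.
count if earnings == topcode & prerelg==1
display "Share at top-code: " %6.4f r(N)/n_all
```

Since the 95th percentile in the summary above already equals 2884.61, the share should be at least 5 percent.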

Let us take a look at the natural log of earnings.

use "/Users/Sam/Desktop/Econ 645/Data/CPS/jan2024.dta", clear
sum lnearnings, detail
histogram lnearnings if prerelg==1, normal title(Histogram of LN Weekly Earnings) caption("Source: Current Population Survey")
graph export "/Users/Sam/Desktop/Econ 645/R Markdown/week9_histogram_lnearnings.png", replace

               Natural Log of Weekly Earnings
-------------------------------------------------------------
      Percentiles      Smallest
 1%     4.356709      -3.506558
 5%     5.420535      -3.506558
10%     5.886104      -3.506558       Obs              10,652
25%      6.49224         .48858       Sum of Wgt.      10,652

50%     6.907755                      Mean            6.86502
                        Largest       Std. Dev.      .8181897
75%     7.426549       7.967145
90%     7.898151       7.967145       Variance       .6694343
95%     7.967145       7.967145       Skewness      -1.791008
99%     7.967145       7.967145       Kurtosis       13.92768

(bin=40, start=-3.5065579, width=.28684257)

(file /Users/Sam/Desktop/Econ 645/R Markdown/week9_histogram_lnearnings.png written in PNG format)

Histogram of LN of Weekly Earnings

We’ll estimate the following Mincer equation.

\[ \ln(wwages_{i}) = \beta_{0} + \beta_{1} edu_{i} + \beta_{2} exp_{i} + \beta_{3} exp_{i}^2 + \beta_{4} marital_{i} + \beta_{5} veteran_{i} + \beta_{6} union_{i} + \beta_{7} female_{i} + \beta_{8} race_{i} + u_{i} \]

We’ll need to use the option ul(right-censored-value) with our Tobit estimator.
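
For reference, the log-likelihood that a right-censored Tobit maximizes can be sketched as below, under the usual assumption of normal, homoskedastic errors. Here $y_i$ is observed log earnings, $c$ is the upper limit passed to ul(), $x_i$ collects the regressors, and $\phi$ and $\Phi$ are the standard normal density and distribution function; this notation is assumed for illustration and is not from the original notes.

\[ \ln L(\beta, \sigma) = \sum_{y_i < c} \ln\left[ \frac{1}{\sigma} \phi\left( \frac{y_i - x_i \beta}{\sigma} \right) \right] + \sum_{y_i \ge c} \ln\left[ 1 - \Phi\left( \frac{c - x_i \beta}{\sigma} \right) \right] \]

Uncensored observations contribute a normal density, while top-coded observations contribute only the probability that latent earnings are at or above the limit.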

sum lnearnings, detail
return list
local maxval `r(max)'
tobit lnearnings i.edu exp expsq i.marital i.veteran i.union i.female i.race, ul(`maxval')

               Natural Log of Weekly Earnings
-------------------------------------------------------------
      Percentiles      Smallest
 1%     4.356709      -3.506558
 5%     5.420535      -3.506558
10%     5.886104      -3.506558       Obs              10,652
25%      6.49224         .48858       Sum of Wgt.      10,652

50%     6.907755                      Mean            6.86502
                        Largest       Std. Dev.      .8181897
75%     7.426549       7.967145
90%     7.898151       7.967145       Variance       .6694343
95%     7.967145       7.967145       Skewness      -1.791008
99%     7.967145       7.967145       Kurtosis       13.92768


scalars:
                  r(N) =  10652
              r(sum_w) =  10652
               r(mean) =  6.865020483352663
                r(Var) =  .6694343494891623
                 r(sd) =  .8181896781854207
           r(skewness) =  -1.791007993978133
           r(kurtosis) =  13.92768307319675
                r(sum) =  73126.19818867257
                r(min) =  -3.506557897319982
                r(max) =  7.967144987828557
                 r(p1) =  4.356708826689592
                 r(p5) =  5.420534999272286
                r(p10) =  5.886104031450156
                r(p25) =  6.492239835020471
                r(p50) =  6.907755278982137
                r(p75) =  7.426549072397305
                r(p90) =  7.898151125863075
                r(p95) =  7.967144987828557
                r(p99) =  7.967144987828557



Tobit regression                                Number of obs     =     10,568
                                                LR chi2(17)       =    3843.62
                                                Prob > chi2       =     0.0000
Log likelihood = -11364.619                     Pseudo R2         =     0.1446

---------------------------------------------------------------------------------------------------------
                             lnearnings |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------------------------------+----------------------------------------------------------------
                                    edu |
                                HS/GED  |   .3303562   .0299199    11.04   0.000     .2717075    .3890048
                                    AA  |    .444737   .0351051    12.67   0.000     .3759243    .5135497
                                 BS/BA  |    .831706   .0318545    26.11   0.000     .7692652    .8941468
                              AdDegree  |   1.046186    .034142    30.64   0.000     .9792607     1.11311
                                        |
                                    exp |   .0575487    .002085    27.60   0.000     .0534617    .0616356
                                  expsq |  -.0009113   .0000337   -27.03   0.000    -.0009774   -.0008452
                                        |
                                marital |
            Divorced/Separated/Widowed  |  -.0461827   .0217659    -2.12   0.034    -.0888481   -.0035173
                         Never Married  |   -.129934   .0189855    -6.84   0.000    -.1671492   -.0927189
                                        |
                                veteran |
                               Veteran  |    .042753   .0333981     1.28   0.201    -.0227135    .1082196
                                        |
                                  union |
                                 Union  |   .0448454   .0236468     1.90   0.058    -.0015067    .0911975
                                        |
                                 female |
                                Female  |  -.3347149   .0143948   -23.25   0.000    -.3629314   -.3064983
                                        |
                         race_ethnicity |
                              NH Asian  |   .0550537   .0846217     0.65   0.515    -.1108209    .2209282
                              NH Black  |  -.0576148   .0829774    -0.69   0.487    -.2202662    .1050365
NH Native Hawaiian or Pacific Islander  |   .1314362   .1374275     0.96   0.339    -.1379477      .40082
                  Latino/a or Hispanic  |   .0166878   .0818777     0.20   0.839     -.143808    .1771836
                        NH Multiracial  |   .0902634   .0954831     0.95   0.345    -.0969015    .2774284
                              NH White  |   .0553029   .0803465     0.69   0.491    -.1021914    .2127972
                                        |
                                  _cons |   5.817179   .0897475    64.82   0.000     5.641257    5.993101
----------------------------------------+----------------------------------------------------------------
                                 /sigma |   .7138391   .0052104                      .7036258    .7240524
---------------------------------------------------------------------------------------------------------
             0  left-censored observations
         9,690     uncensored observations
           878 right-censored observations at lnearnings >= 7.967145

We will compare OLS to the Censored Regression Model, using the same covariates in both.

est clear
eststo OLS: quietly reg lnearnings i.edu exp expsq i.marital i.veteran i.union i.female i.race
eststo Tobit: quietly tobit lnearnings i.edu exp expsq i.marital i.veteran i.union i.female i.race, ul(`maxval')

esttab OLS Tobit, drop(0.* 1.race* 1.mar* 1.edu) mtitle("OLS" "Tobit")

                      (1)             (2)   
                      OLS           Tobit   
--------------------------------------------
main                                        
2.edu               0.330***        0.330***
                  (11.77)         (11.04)   

3.edu               0.442***        0.445***
                  (13.48)         (12.67)   

4.edu               0.788***        0.832***
                  (26.50)         (26.11)   

5.edu               0.951***        1.046***
                  (30.01)         (30.64)   

exp                0.0558***       0.0575***
                  (28.73)         (27.60)   

expsq           -0.000887***    -0.000911***
                 (-28.27)        (-27.03)   

2.marital         -0.0389         -0.0462*  
                  (-1.92)         (-2.12)   

3.marital          -0.118***       -0.130***
                  (-6.66)         (-6.84)   

1.veteran          0.0412          0.0428   
                   (1.34)          (1.28)   

1.union            0.0670**        0.0448   
                   (3.05)          (1.90)   

1.female           -0.304***       -0.335***
                 (-22.81)        (-23.25)   

2.race_eth~y       0.0325          0.0551   
                   (0.41)          (0.65)   

3.race_eth~y      -0.0466         -0.0576   
                  (-0.60)         (-0.69)   

4.race_eth~y        0.135           0.131   
                   (1.05)          (0.96)   

5.race_eth~y       0.0232          0.0167   
                   (0.30)          (0.20)   

6.race_eth~y       0.0730          0.0903   
                   (0.82)          (0.95)   

7.race_eth~y       0.0549          0.0553   
                   (0.73)          (0.69)   

_cons               5.813***        5.817***
                  (69.47)         (64.82)   
--------------------------------------------
sigma                                       
_cons                               0.714***
                                 (137.00)   
--------------------------------------------
N                   10568           10568   
--------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001

Interpretation: The coefficients are interpreted much like OLS coefficients, but the Tobit estimator needs normality and homoskedasticity of the error term to be consistent. Because this is a censored model rather than a corner solution, we can use the log-linear interpretation without a scaling factor, so the returns to education only require the $(e^\beta -1)*100$ transformation.
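
As a worked example of the $(e^\beta -1)*100$ transformation, the lines below are a minimal sketch, assuming the Tobit model above is the active estimation result and that 4.edu is the BS/BA level, as in the esttab table. The label pct_ba is a hypothetical name.

```stata
* Sketch: percent difference in earnings for BS/BA relative to the omitted education group.
display (exp(_b[4.edu]) - 1) * 100

* nlcom reports the same transformation with a delta-method standard error.
nlcom (pct_ba: (exp(_b[4.edu]) - 1) * 100)
```

With the reported Tobit coefficient of .831706, this works out to roughly 130 percent.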

1.2 Censored Regression Model - Duration analysis

Lesson: We can analyze censored duration data, similar to Wooldridge’s example.

This differs from top-coded data, for which we can use the Tobit analysis that we just saw.

We look at the duration of time, in months, until rearrest for inmates released from a North Carolina prison. We want to evaluate whether a work program is effective in increasing the duration before recidivism occurs.

Note: 893 inmates were not rearrested during the period they were followed. These observations are censored. The censoring times differ among inmates, ranging from 70 to 81 months.

Our dependent variable, duration (time in months), is transformed by the natural logarithm. Most observations with a duration between 70 and 81 months are censored rather than observed recidivism.

use "/Users/Sam/Desktop/Econ 645/Data/Wooldridge/recid.dta", clear
tab durat

From 70 months onward most observations are censored, but not all: 8 inmates were rearrested between 70 and 77 months.

use "/Users/Sam/Desktop/Econ 645/Data/Wooldridge/recid.dta", clear
tab durat cens

           |         cens
     durat |         0          1 |     Total
-----------+----------------------+----------
         1 |         8          0 |         8 
         2 |        15          0 |        15 
         3 |        14          0 |        14 
         4 |        13          0 |        13 
         5 |        16          0 |        16 
         6 |        18          0 |        18 
         7 |        18          0 |        18 
         8 |        16          0 |        16 
         9 |        18          0 |        18 
        10 |        22          0 |        22 
        11 |        11          0 |        11 
        12 |        14          0 |        14 
        13 |        15          0 |        15 
        14 |        16          0 |        16 
        15 |        23          0 |        23 
        16 |        11          0 |        11 
        17 |         9          0 |         9 
        18 |        16          0 |        16 
        19 |         9          0 |         9 
        20 |         8          0 |         8 
        21 |        13          0 |        13 
        22 |         7          0 |         7 
        23 |        16          0 |        16 
        24 |        12          0 |        12 
        25 |        13          0 |        13 
        26 |         8          0 |         8 
        27 |        11          0 |        11 
        28 |         9          0 |         9 
        29 |         8          0 |         8 
        30 |         7          0 |         7 
        31 |         6          0 |         6 
        32 |         6          0 |         6 
        33 |         6          0 |         6 
        34 |         4          0 |         4 
        35 |         5          0 |         5 
        36 |         6          0 |         6 
        37 |         6          0 |         6 
        38 |         4          0 |         4 
        39 |         4          0 |         4 
        40 |         2          0 |         2 
        41 |         7          0 |         7 
        42 |         5          0 |         5 
        43 |         5          0 |         5 
        44 |         4          0 |         4 
        45 |         4          0 |         4 
        46 |         7          0 |         7 
        47 |         4          0 |         4 
        48 |         1          0 |         1 
        49 |         4          0 |         4 
        50 |         5          0 |         5 
        51 |         2          0 |         2 
        52 |         2          0 |         2 
        53 |         8          0 |         8 
        54 |         3          0 |         3 
        55 |         5          0 |         5 
        56 |         2          0 |         2 
        57 |         4          0 |         4 
        58 |         1          0 |         1 
        59 |         5          0 |         5 
        60 |         3          0 |         3 
        62 |         3          0 |         3 
        63 |         2          0 |         2 
        64 |         1          0 |         1 
        65 |         2          0 |         2 
        66 |         3          0 |         3 
        67 |         3          0 |         3 
        68 |         4          0 |         4 
        69 |         2          0 |         2 
        70 |         2        103 |       105 
        71 |         2         88 |        90 
        72 |         1         84 |        85 
        73 |         1        107 |       108 
        74 |         1         71 |        72 
        75 |         0         44 |        44 
        76 |         0        105 |       105 
        77 |         1         60 |        61 
        78 |         0         54 |        54 
        79 |         0         60 |        60 
        80 |         0         69 |        69 
        81 |         0         48 |        48 
-----------+----------------------+----------
     Total |       552        893 |     1,445
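
The censoring pattern described in the note can be checked with a few summary commands. This is a minimal sketch, assuming recid.dta is in memory and that cens==1 marks a censored spell, as in the table above.

```stata
* Sketch: how many spells are censored, and over what range of months?
tabulate cens

* Censored spells: the table above implies a range of 70 to 81 months.
summarize durat if cens==1

* Completed spells (observed rearrests) for comparison.
summarize durat if cens==0
```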

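We’ll use the stset command to declare the data as survival-time data, with ldurat as the analysis time and failure defined as cens==0. Note that the 8 inmates with durat equal to 1 have ldurat equal to 0, so stset excludes them (they "end on or before enter()"), leaving 1,437 observations and 544 failures.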

use "/Users/Sam/Desktop/Econ 645/Data/Wooldridge/recid.dta", clear
stset ldurat, failure(cens==0)

     failure event:  cens == 0
obs. time interval:  (0, ldurat]
 exit on or before:  failure

------------------------------------------------------------------------------
       1445  total observations
          8  observations end on or before enter()
------------------------------------------------------------------------------
       1437  observations remaining, representing
        544  failures in single-record/single-failure data
   5411.742  total analysis time at risk and under observation
                                                at risk from t =         0
                                     earliest observed entry t =         0
                                          last observed exit t =  4.394449

We’ll use the streg command and set the distribution to Weibull.

\[ ldurat_i = \alpha + \delta workprg_i + \beta_1 priors_i + \beta_2 tserved_i + \beta_3 felon_i + \beta_4 alcohol_i + \beta_5 drugs_i + \beta_6 educ_i + x'_i \gamma + \varepsilon_i \]

where $x_i$ contains the demographic controls for race, marital status, and age.

streg workprg priors tserved felon alcohol drugs black married educ age, dist(weibull) nohr

         failure _d:  cens == 0
   analysis time _t:  ldurat

Fitting constant-only model:

Iteration 0:   log likelihood = -1254.5107
Iteration 1:   log likelihood = -1100.1536
Iteration 2:   log likelihood = -1079.1128
Iteration 3:   log likelihood = -1078.7957
Iteration 4:   log likelihood = -1078.7957

Fitting full model:

Iteration 0:   log likelihood = -1078.7957  
Iteration 1:   log likelihood = -1034.1821  
Iteration 2:   log likelihood = -1001.9186  
Iteration 3:   log likelihood = -1000.5996  
Iteration 4:   log likelihood = -1000.5919  
Iteration 5:   log likelihood = -1000.5919  

Weibull regression -- log relative-hazard form 

No. of subjects =        1,437                  Number of obs    =       1,437
No. of failures =          544
Time at risk    =  5411.742317
                                                LR chi2(10)      =      156.41
Log likelihood  =   -1000.5919                  Prob > chi2      =      0.0000

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     workprg |   .0780722    .091463     0.85   0.393    -.1011921    .2573364
      priors |   .0834203   .0139155     5.99   0.000     .0561465    .1106941
     tserved |   .0134545   .0016851     7.98   0.000     .0101519    .0167572
       felon |   -.287139    .106697    -2.69   0.007    -.4962614   -.0780167
     alcohol |   .4399456     .10658     4.13   0.000     .2310527    .6488385
       drugs |   .2920932   .0983695     2.97   0.003     .0992926    .4848938
       black |   .4515388   .0889883     5.07   0.000     .2771249    .6259527
     married |   -.146192   .1098131    -1.33   0.183    -.3614216    .0690377
        educ |   -.023948    .019578    -1.22   0.221    -.0623202    .0144241
         age |  -.0036431   .0005284    -6.89   0.000    -.0046788   -.0026073
       _cons |  -3.639277   .3077568   -11.83   0.000     -4.24247   -3.036085
-------------+----------------------------------------------------------------
       /ln_p |   .9214587   .0396737    23.23   0.000     .8436997    .9992178
-------------+----------------------------------------------------------------
           p |   2.512953   .0996982                      2.324953    2.716156
         1/p |   .3979381   .0157877                      .3681673    .4301163
------------------------------------------------------------------------------
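
The shape parameter in the output can be read through the Weibull hazard. As a sketch, in the log relative-hazard form the hazard is commonly written as below, where $t$ is the analysis time (here ldurat), $p$ is the shape parameter, and $x_i \beta$ is the linear index including the constant; the notation is assumed for illustration.

\[ h(t \mid x_i) = p \, t^{p-1} \exp(x_i \beta), \qquad p = \exp(0.9214587) \approx 2.513 \]

Since the estimated $p$ is greater than 1, the fitted hazard rises with analysis time, and a one-unit increase in a regressor $x_j$ multiplies the hazard by $\exp(\beta_j)$.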

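Interpretation: The output header reads "log relative-hazard form", so with the nohr option the coefficients are log hazard ratios rather than effects on log duration. We can easily convert them into the estimated percent change in the hazard of criminal recidivism. A positive coefficient means a higher hazard and therefore a shorter expected duration before recidivism.

    display (exp(.0780722)-1)*100

8.1200719

Being a part of the work program is associated with an 8.1% higher hazard of recidivism, so there is no evidence that the program lengthens the time before recidivism, and the estimate is not statistically significant.

    display (exp(-.287139)-1)*100

-24.959259

Being a felon is associated with about a 25% lower hazard of recidivism, which implies a longer duration before recidivism.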
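
The streg header above reports the log relative-hazard form, so the coefficients act on the hazard. To read coefficients as proportional changes in duration itself, one option is the accelerated failure-time (AFT) metric, which streg provides through its time option. The lines below are a minimal sketch, not the specification used in these notes: they assume recid.dta is in memory and use durat in months as the analysis time instead of ldurat.

```stata
* Sketch: Weibull model in the accelerated failure-time (AFT) metric.
* Using durat (months) as the analysis time is an assumption made here.
stset durat, failure(cens==0)
streg workprg priors tserved felon alcohol drugs black married educ age, dist(weibull) time

* Percent change in duration before recidivism associated with the work program.
display (exp(_b[workprg]) - 1) * 100
```

In this metric a negative coefficient means a shorter duration before recidivism, so the signs are expected to flip relative to the hazard form.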