Econ 645: Week 8

Wooldridge Chapter 17.1-17.4 and Mitchell: Chapter 9

Samuel Rowe - Adapted from Wooldridge and Mitchell

. clear

. set more off

Wooldridge

Set Working Directory

. cd "/Users/Sam/Desktop/Econ 645/Data/Wooldridge"
/Users/Sam/Desktop/Econ 645/Data/Wooldridge

Married Women’s Labor Force Participation

. use mroz.dta, clear

We’ll use the data from Mroz (1987) to look at the probability of a married woman being in the labor force. Labor force participation is a binary response: \[ y \in \{0, 1\} \] We will estimate the coefficients of the linear probability model (LPM), the logit estimator, and the probit estimator. Then, we’ll compare the marginal effects of all three estimators.
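For reference, the three models differ only in how a linear index is mapped to a probability. The notation below is the standard textbook form and is an assumption of this sketch, since the notes do not define these symbols: \(\mathbf{x}\) is the row of explanatory variables, \(\boldsymbol{\beta}\) the coefficient vector, \(\Lambda\) the logistic CDF, and \(\Phi\) the standard normal CDF.

\[ \text{LPM: } P(y=1 \mid \mathbf{x}) = \mathbf{x}\boldsymbol{\beta}, \qquad \text{Logit: } P(y=1 \mid \mathbf{x}) = \Lambda(\mathbf{x}\boldsymbol{\beta}) = \frac{e^{\mathbf{x}\boldsymbol{\beta}}}{1+e^{\mathbf{x}\boldsymbol{\beta}}}, \qquad \text{Probit: } P(y=1 \mid \mathbf{x}) = \Phi(\mathbf{x}\boldsymbol{\beta}) \]

Because \(\Lambda\) and \(\Phi\) are nonlinear, logit and probit coefficients are not themselves changes in probability, which is why the comparison is made on marginal effects.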

Summarize labor force participation

. tabulate inlf

       inlf │      Freq.     Percent        Cum.
────────────┼───────────────────────────────────
          0 │        325       43.16       43.16
          1 │        428       56.84      100.00
────────────┼───────────────────────────────────
      Total │        753      100.00

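Of the 753 women, 325 are not in the labor force and 428 are participating. Our explanatory variables are non-wife income, education, experience, experience squared, age, number of kids younger than 6, and number of kids aged 6 to 18. Note that the first logit and probit below are fit without age; age is included in the marginal-effects comparisons that follow.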

. est clear

Logit

. eststo Logit: logit inlf nwifeinc educ exper expersq kidslt6 kidsge6

Iteration 0:   log likelihood =  -514.8732  
Iteration 1:   log likelihood = -422.78042  
Iteration 2:   log likelihood = -421.73851  
Iteration 3:   log likelihood = -421.73502  
Iteration 4:   log likelihood = -421.73502  

Logistic regression                             Number of obs     =        753
                                                LR chi2(6)        =     186.28
                                                Prob > chi2       =     0.0000
Log likelihood = -421.73502                     Pseudo R2         =     0.1809

─────────────┬────────────────────────────────────────────────────────────────
        inlf │      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
    nwifeinc │  -.0301171   .0082431    -3.65   0.000    -.0462734   -.0139609
        educ │   .2520038   .0425492     5.92   0.000      .168609    .3353987
       exper │   .2057387   .0310518     6.63   0.000     .1448784     .266599
     expersq │   -.003913   .0009994    -3.92   0.000    -.0058718   -.0019541
     kidslt6 │  -.9175126   .1742458    -5.27   0.000    -1.259028   -.5759971
     kidsge6 │   .2226164   .0683456     3.26   0.001     .0886616    .3565713
       _cons │  -3.739707    .543217    -6.88   0.000    -4.804392   -2.675021
─────────────┴────────────────────────────────────────────────────────────────

Probit

. eststo Probit: probit inlf nwifeinc educ exper expersq kidslt6 kidsge6

Iteration 0:   log likelihood =  -514.8732  
Iteration 1:   log likelihood = -422.36847  
Iteration 2:   log likelihood = -421.80202  
Iteration 3:   log likelihood = -421.80161  
Iteration 4:   log likelihood = -421.80161  

Probit regression                               Number of obs     =        753
                                                LR chi2(6)        =     186.14
                                                Prob > chi2       =     0.0000
Log likelihood = -421.80161                     Pseudo R2         =     0.1808

─────────────┬────────────────────────────────────────────────────────────────
        inlf │      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
    nwifeinc │   -.017188     .00474    -3.63   0.000    -.0264782   -.0078978
        educ │   .1501412     .02471     6.08   0.000     .1017105    .1985719
       exper │   .1240105   .0183233     6.77   0.000     .0880975    .1599236
     expersq │  -.0023694   .0005913    -4.01   0.000    -.0035284   -.0012103
     kidslt6 │  -.5543317   .1038244    -5.34   0.000    -.7578238   -.3508395
     kidsge6 │   .1307901   .0399186     3.28   0.001     .0525511    .2090292
       _cons │  -2.244553   .3146254    -7.13   0.000    -2.861207   -1.627899
─────────────┴────────────────────────────────────────────────────────────────

. esttab Logit Probit, mtitle

────────────────────────────────────────────
                      (1)             (2)   
                    Logit          Probit   
────────────────────────────────────────────
inlf                                        
nwifeinc          -0.0301***      -0.0172***
                  (-3.65)         (-3.63)   

educ                0.252***        0.150***
                   (5.92)          (6.08)   

exper               0.206***        0.124***
                   (6.63)          (6.77)   

expersq          -0.00391***     -0.00237***
                  (-3.92)         (-4.01)   

kidslt6            -0.918***       -0.554***
                  (-5.27)         (-5.34)   

kidsge6             0.223**         0.131** 
                   (3.26)          (3.28)   

_cons              -3.740***       -2.245***
                  (-6.88)         (-7.13)   
────────────────────────────────────────────
N                     753             753   
────────────────────────────────────────────
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
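The footer above says "t statistics" even though logit and probit report z statistics. The sketch below shows one possible way to reproduce the table with a z label. It assumes mroz.dta is in the working directory, that the user-written estout package (which provides eststo and esttab) can be installed from SSC, and that the z option of esttab, which changes only the label, is available in the installed version.

```stata
* Assumes mroz.dta is in the current working directory, as in the notes.
* eststo and esttab come from the user-written estout package;
* install it only if esttab is not found.
capture which esttab
if _rc ssc install estout

use mroz.dta, clear
estimates clear

* Same specifications as the first logit and probit in the notes (no age)
quietly logit inlf nwifeinc educ exper expersq kidslt6 kidsge6
estimates store Logit
quietly probit inlf nwifeinc educ exper expersq kidslt6 kidsge6
estimates store Probit

* Same table, with the footer labelled as z statistics
esttab Logit Probit, mtitle z
```

The coefficients and statistics are unchanged; only the footnote label differs.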

Average Marginal Effects (AME)

. est clear

LPM

. quietly reg inlf nwifeinc educ exper expersq age kidslt6 kidsge6

. eststo LPM: margins, dydx(*) post 

Average marginal effects                        Number of obs     =        753
Model VCE    : OLS

Expression   : Linear prediction, predict()
dy/dx w.r.t. : nwifeinc educ exper expersq age kidslt6 kidsge6

─────────────┬────────────────────────────────────────────────────────────────
             │            Delta-method
             │      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
    nwifeinc │  -.0034052   .0014485    -2.35   0.019    -.0062488   -.0005616
        educ │   .0379953    .007376     5.15   0.000      .023515    .0524756
       exper │   .0394924   .0056727     6.96   0.000     .0283561    .0506287
     expersq │  -.0005963   .0001848    -3.23   0.001    -.0009591   -.0002335
         age │  -.0160908   .0024847    -6.48   0.000    -.0209686    -.011213
     kidslt6 │  -.2618105   .0335058    -7.81   0.000    -.3275875   -.1960335
     kidsge6 │   .0130122    .013196     0.99   0.324    -.0128935    .0389179
─────────────┴────────────────────────────────────────────────────────────────

Logit

. quietly logit inlf nwifeinc educ exper expersq age kidslt6 kidsge6

. eststo Logit: margins, dydx(*) post

Average marginal effects                        Number of obs     =        753
Model VCE    : OIM

Expression   : Pr(inlf), predict()
dy/dx w.r.t. : nwifeinc educ exper expersq age kidslt6 kidsge6

─────────────┬────────────────────────────────────────────────────────────────
             │            Delta-method
             │      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
    nwifeinc │  -.0038118   .0014824    -2.57   0.010    -.0067172   -.0009064
        educ │   .0394965   .0072947     5.41   0.000     .0251992    .0537939
       exper │   .0367641     .00515     7.14   0.000     .0266702     .046858
     expersq │  -.0005633   .0001774    -3.18   0.001    -.0009109   -.0002156
         age │  -.0157194   .0023808    -6.60   0.000    -.0203856   -.0110532
     kidslt6 │  -.2577537   .0319416    -8.07   0.000    -.3203581   -.1951492
     kidsge6 │   .0107348    .013333     0.81   0.421    -.0153974    .0368671
─────────────┴────────────────────────────────────────────────────────────────

Probit

. quietly probit inlf nwifeinc educ exper expersq age kidslt6 kidsge6

. eststo Probit: margins, dydx(*) post

Average marginal effects                        Number of obs     =        753
Model VCE    : OIM

Expression   : Pr(inlf), predict()
dy/dx w.r.t. : nwifeinc educ exper expersq age kidslt6 kidsge6

─────────────┬────────────────────────────────────────────────────────────────
             │            Delta-method
             │      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
    nwifeinc │  -.0036162   .0014414    -2.51   0.012    -.0064413   -.0007911
        educ │   .0393703   .0072216     5.45   0.000     .0252161    .0535244
       exper │   .0370974   .0051522     7.20   0.000     .0269993    .0471956
     expersq │  -.0005675   .0001771    -3.20   0.001    -.0009146   -.0002204
         age │  -.0158957   .0023587    -6.74   0.000    -.0205186   -.0112728
     kidslt6 │  -.2611542   .0318597    -8.20   0.000    -.3235982   -.1987103
     kidsge6 │   .0108287   .0130584     0.83   0.407    -.0147654    .0364227
─────────────┴────────────────────────────────────────────────────────────────

. esttab LPM Logit Probit, mtitle 

────────────────────────────────────────────────────────────
                      (1)             (2)             (3)   
                      LPM           Logit          Probit   
────────────────────────────────────────────────────────────
nwifeinc         -0.00341*       -0.00381*       -0.00362*  
                  (-2.35)         (-2.57)         (-2.51)   

educ               0.0380***       0.0395***       0.0394***
                   (5.15)          (5.41)          (5.45)   

exper              0.0395***       0.0368***       0.0371***
                   (6.96)          (7.14)          (7.20)   

expersq         -0.000596**     -0.000563**     -0.000568** 
                  (-3.23)         (-3.18)         (-3.20)   

age               -0.0161***      -0.0157***      -0.0159***
                  (-6.48)         (-6.60)         (-6.74)   

kidslt6            -0.262***       -0.258***       -0.261***
                  (-7.81)         (-8.07)         (-8.20)   

kidsge6            0.0130          0.0107          0.0108   
                   (0.99)          (0.81)          (0.83)   
────────────────────────────────────────────────────────────
N                     753             753             753   
────────────────────────────────────────────────────────────
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
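To see what the average marginal effect is doing, the logit AME for education can be rebuilt by hand. This is a minimal sketch, assuming mroz.dta is in the working directory; the variable names phat and me_educ are hypothetical and introduced only for illustration. It uses the fact that, for a logit, the derivative of the fitted probability with respect to a continuous regressor is the coefficient times p(1 - p).

```stata
* Assumes mroz.dta is in the current working directory, as in the notes.
use mroz.dta, clear
quietly logit inlf nwifeinc educ exper expersq age kidslt6 kidsge6

* Fitted probability for each woman (phat is a hypothetical name)
predict double phat, pr

* Logit marginal effect of educ for each observation: beta * p * (1 - p)
generate double me_educ = _b[educ] * phat * (1 - phat)

* The sample mean of the individual effects is the AME
summarize me_educ

* Built-in equivalent, for comparison
margins, dydx(educ)
```

The mean of me_educ should match the educ row of the logit margins output above (about 0.0395).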

Marginal Effects at the Average (MEA)

. est clear

LPM

. quietly reg inlf nwifeinc educ exper expersq age kidslt6 kidsge6

. eststo LPM: margins, dydx(*) atmeans post

Conditional marginal effects                    Number of obs     =        753
Model VCE    : OLS

Expression   : Linear prediction, predict()
dy/dx w.r.t. : nwifeinc educ exper expersq age kidslt6 kidsge6
at           : nwifeinc        =    20.12896 (mean)
               educ            =    12.28685 (mean)
               exper           =    10.63081 (mean)
               expersq         =    178.0385 (mean)
               age             =    42.53785 (mean)
               kidslt6         =    .2377158 (mean)
               kidsge6         =    1.353254 (mean)

─────────────┬────────────────────────────────────────────────────────────────
             │            Delta-method
             │      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
    nwifeinc │  -.0034052   .0014485    -2.35   0.019    -.0062488   -.0005616
        educ │   .0379953    .007376     5.15   0.000      .023515    .0524756
       exper │   .0394924   .0056727     6.96   0.000     .0283561    .0506287
     expersq │  -.0005963   .0001848    -3.23   0.001    -.0009591   -.0002335
         age │  -.0160908   .0024847    -6.48   0.000    -.0209686    -.011213
     kidslt6 │  -.2618105   .0335058    -7.81   0.000    -.3275875   -.1960335
     kidsge6 │   .0130122    .013196     0.99   0.324    -.0128935    .0389179
─────────────┴────────────────────────────────────────────────────────────────

Logit

. quietly logit inlf nwifeinc educ exper expersq age kidslt6 kidsge6

. eststo Logit: margins, dydx(*) atmeans post

Conditional marginal effects                    Number of obs     =        753
Model VCE    : OIM

Expression   : Pr(inlf), predict()
dy/dx w.r.t. : nwifeinc educ exper expersq age kidslt6 kidsge6
at           : nwifeinc        =    20.12896 (mean)
               educ            =    12.28685 (mean)
               exper           =    10.63081 (mean)
               expersq         =    178.0385 (mean)
               age             =    42.53785 (mean)
               kidslt6         =    .2377158 (mean)
               kidsge6         =    1.353254 (mean)

─────────────┬────────────────────────────────────────────────────────────────
             │            Delta-method
             │      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
    nwifeinc │  -.0051901   .0020482    -2.53   0.011    -.0092045   -.0011756
        educ │   .0537773   .0105608     5.09   0.000     .0330785    .0744761
       exper │   .0500569   .0078247     6.40   0.000     .0347209     .065393
     expersq │  -.0007669   .0002477    -3.10   0.002    -.0012524   -.0002815
         age │   -.021403   .0035398    -6.05   0.000    -.0283408   -.0144652
     kidslt6 │  -.3509498   .0496395    -7.07   0.000    -.4482414   -.2536583
     kidsge6 │   .0146162   .0181884     0.80   0.422    -.0210324    .0502649
─────────────┴────────────────────────────────────────────────────────────────

Probit

. quietly probit inlf nwifeinc educ exper expersq age kidslt6 kidsge6

. eststo Probit: margins, dydx(*) atmeans post

Conditional marginal effects                    Number of obs     =        753
Model VCE    : OIM

Expression   : Pr(inlf), predict()
dy/dx w.r.t. : nwifeinc educ exper expersq age kidslt6 kidsge6
at           : nwifeinc        =    20.12896 (mean)
               educ            =    12.28685 (mean)
               exper           =    10.63081 (mean)
               expersq         =    178.0385 (mean)
               age             =    42.53785 (mean)
               kidslt6         =    .2377158 (mean)
               kidsge6         =    1.353254 (mean)

─────────────┬────────────────────────────────────────────────────────────────
             │            Delta-method
             │      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
    nwifeinc │  -.0046962   .0018903    -2.48   0.013    -.0084012   -.0009913
        educ │   .0511287   .0098592     5.19   0.000     .0318051    .0704523
       exper │   .0481771   .0073278     6.57   0.000     .0338149    .0625392
     expersq │  -.0007371   .0002347    -3.14   0.002     -.001197   -.0002771
         age │  -.0206432   .0033079    -6.24   0.000    -.0271265   -.0141598
     kidslt6 │  -.3391514   .0463581    -7.32   0.000    -.4300117   -.2482911
     kidsge6 │   .0140628   .0169852     0.83   0.408    -.0192275    .0473531
─────────────┴────────────────────────────────────────────────────────────────

Compare

. esttab LPM Logit Probit, mtitle 

────────────────────────────────────────────────────────────
                      (1)             (2)             (3)   
                      LPM           Logit          Probit   
────────────────────────────────────────────────────────────
nwifeinc         -0.00341*       -0.00519*       -0.00470*  
                  (-2.35)         (-2.53)         (-2.48)   

educ               0.0380***       0.0538***       0.0511***
                   (5.15)          (5.09)          (5.19)   

exper              0.0395***       0.0501***       0.0482***
                   (6.96)          (6.40)          (6.57)   

expersq         -0.000596**     -0.000767**     -0.000737** 
                  (-3.23)         (-3.10)         (-3.14)   

age               -0.0161***      -0.0214***      -0.0206***
                  (-6.48)         (-6.05)         (-6.24)   

kidslt6            -0.262***       -0.351***       -0.339***
                  (-7.81)         (-7.07)         (-7.32)   

kidsge6            0.0130          0.0146          0.0141   
                   (0.99)          (0.80)          (0.83)   
────────────────────────────────────────────────────────────
N                     753             753             753   
────────────────────────────────────────────────────────────
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001

The average marginal effects are fairly close across the linear probability model, logit, and probit. One additional year of education is associated with an increase in the probability of being in the labor force of 0.038 to 0.0395, that is, 3.8 to 3.95 percentage points. One additional child under six is associated with a drop in that probability of 0.258 to 0.262 (25.8 to 26.2 percentage points). The marginal effects at the average are further apart: for education, 0.038 for the LPM against 0.054 for logit and 0.051 for probit.

Please note that around the means, the fitted probabilities from the linear probability model, logit, and probit should be fairly similar. However, the marginal effects for the linear probability model are constant and do not vary with x, which is why its AME and MEA columns are identical.
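The two summaries differ only in where the derivative is evaluated. As a sketch in standard notation (the symbols are assumptions introduced here, not defined in the notes): let \(g\) be the derivative of the response function, so \(g = \Lambda(1-\Lambda)\) for logit, \(g = \phi\) (the standard normal density) for probit, and \(g \equiv 1\) for the LPM. For a continuous regressor \(x_j\):

\[ \widehat{AME}_j = \frac{1}{n}\sum_{i=1}^{n} g(\mathbf{x}_i\hat{\boldsymbol{\beta}})\,\hat{\beta}_j, \qquad \widehat{MEA}_j = g(\bar{\mathbf{x}}\hat{\boldsymbol{\beta}})\,\hat{\beta}_j \]

With \(g \equiv 1\) both expressions reduce to \(\hat{\beta}_j\), so the LPM columns cannot differ. One caveat visible in the output above: because expersq is entered as a separate variable, margins holds it at its own mean (178.04) rather than at the square of mean experience (10.63), so the experience effects are computed as if the two variables were unrelated.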

Odds Ratios

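We can add the or option to logit to report odds ratios instead of coefficients. For an increase in an explanatory variable from 0 to 1, where p(x) is the probability of being in the labor force at that value:

\[ OR = \frac{\text{Odds}(1)}{\text{Odds}(0)} = \frac{p(1)/(1-p(1))}{p(0)/(1-p(0))} \]

In a logit, this ratio equals exp(beta) for any one-unit increase, not only the step from 0 to 1.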

. logit inlf nwifeinc educ exper expersq age kidslt6 kidsge6, or

Iteration 0:   log likelihood =  -514.8732  
Iteration 1:   log likelihood = -402.38502  
Iteration 2:   log likelihood = -401.76569  
Iteration 3:   log likelihood = -401.76515  
Iteration 4:   log likelihood = -401.76515  

Logistic regression                             Number of obs     =        753
                                                LR chi2(7)        =     226.22
                                                Prob > chi2       =     0.0000
Log likelihood = -401.76515                     Pseudo R2         =     0.2197

─────────────┬────────────────────────────────────────────────────────────────
        inlf │ Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
    nwifeinc │    .978881   .0082436    -2.53   0.011     .9628565    .9951723
        educ │   1.247536   .0541925     5.09   0.000     1.145717    1.358404
       exper │   1.228593   .0393849     6.42   0.000     1.153775    1.308263
     expersq │   .9968509   .0010129    -3.10   0.002     .9948676    .9988381
         age │   .9157386   .0133451    -6.04   0.000     .8899527    .9422715
     kidslt6 │   .2361344   .0480734    -7.09   0.000      .158441    .3519257
     kidsge6 │   1.061956   .0794234     0.80   0.422     .9171603     1.22961
       _cons │   1.530283   1.316609     0.49   0.621     .2834155    8.262655
─────────────┴────────────────────────────────────────────────────────────────

One additional year of education multiplies the odds of being in the labor force by about 1.25 (an increase of roughly 25%), holding all other variables constant. One additional child under six multiplies the odds of being in the labor force by about 0.24 (a decrease of roughly 76%), holding all other variables constant.
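The percent changes quoted for the odds ratios follow from (OR - 1) x 100. The short sketch below reproduces that arithmetic from the values printed in the table; the scalar names or_educ and or_kidslt6 are hypothetical and used only for illustration.

```stata
* Odds ratios copied from the logit table above
scalar or_educ    = 1.247536
scalar or_kidslt6 = .2361344

* Percent change in the odds for a one-unit increase: (OR - 1) * 100
display "educ:    " (or_educ - 1) * 100 " percent"
display "kidslt6: " (or_kidslt6 - 1) * 100 " percent"

* The log of the odds ratio recovers the logit coefficient
display "logit coefficient on educ: " ln(or_educ)
```

These are changes in the odds, not in the probability; the probability changes are the marginal effects reported earlier.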

Predicted Probabilities at Different Levels of Education

LPM

. est clear

. quietly reg inlf nwifeinc educ exper expersq age kidslt6 kidsge6

. eststo lpm: margins, at(educ=(0(2)20)) post

Predictive margins                              Number of obs     =        753
Model VCE    : OLS

Expression   : Linear prediction, predict()

1._at        : educ            =           0

2._at        : educ            =           2

3._at        : educ            =           4

4._at        : educ            =           6

5._at        : educ            =           8

6._at        : educ            =          10

7._at        : educ            =          12

8._at        : educ            =          14

9._at        : educ            =          16

10._at       : educ            =          18

11._at       : educ            =          20

─────────────┬────────────────────────────────────────────────────────────────
             │            Delta-method
             │     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
         _at │
          1  │   .1015504    .091955     1.10   0.270    -.0789714    .2820723
          2  │    .177541   .0774562     2.29   0.022     .0254827    .3295993
          3  │   .2535316   .0630748     4.02   0.000     .1297062    .3773571
          4  │   .3295222   .0489147     6.74   0.000     .2334952    .4255492
          5  │   .4055128   .0352435    11.51   0.000     .3363244    .4747013
          6  │   .4815034   .0229524    20.98   0.000     .4364444    .5265625
          7  │    .557494   .0157087    35.49   0.000     .5266554    .5883327
          8  │   .6334846    .020049    31.60   0.000     .5941254    .6728439
          9  │   .7094753   .0315024    22.52   0.000     .6476311    .7713194
         10  │   .7854659   .0449232    17.48   0.000     .6972748    .8736569
         11  │   .8614565   .0589832    14.61   0.000     .7456633    .9772496
─────────────┴────────────────────────────────────────────────────────────────

. marginsplot, yline(0)

  Variables that uniquely identify margins: educ

Logit

. quietly logit inlf nwifeinc educ exper expersq kidslt6 kidsge6

. eststo logit1: margins, at(educ=(0(2)20)) post

Predictive margins                              Number of obs     =        753
Model VCE    : OIM

Expression   : Pr(inlf), predict()

1._at        : educ            =           0

2._at        : educ            =           2

3._at        : educ            =           4

4._at        : educ            =           6

5._at        : educ            =           8

6._at        : educ            =          10

7._at        : educ            =          12

8._at        : educ            =          14

9._at        : educ            =          16

10._at       : educ            =          18

11._at       : educ            =          20

─────────────┬────────────────────────────────────────────────────────────────
             │            Delta-method
             │     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
         _at │
          1  │   .0862997   .0378791     2.28   0.023     .0120581    .1605413
          2  │   .1307839     .04457     2.93   0.003     .0434283    .2181396
          3  │   .1911164   .0475215     4.02   0.000      .097976    .2842567
          4  │   .2675469   .0448132     5.97   0.000     .1797146    .3553791
          5  │   .3575117   .0362904     9.85   0.000     .2863838    .4286396
          6  │    .455911    .024513    18.60   0.000     .4078663    .5039556
          7  │   .5562468   .0165243    33.66   0.000     .5238597    .5886338
          8  │   .6519779   .0204676    31.85   0.000     .6118622    .6920936
          9  │   .7376484   .0281232    26.23   0.000     .6825279    .7927688
         10  │   .8096698   .0325919    24.84   0.000     .7457909    .8735486
         11  │   .8666826   .0327653    26.45   0.000     .8024638    .9309015
─────────────┴────────────────────────────────────────────────────────────────

. marginsplot, yline(0)

  Variables that uniquely identify margins: educ

Probit

. quietly probit inlf nwifeinc educ exper expersq age kidslt6 kidsge6

. eststo probit1: margins, at(educ=(0(2)20)) post

Predictive margins                              Number of obs     =        753
Model VCE    : OIM

Expression   : Pr(inlf), predict()

1._at        : educ            =           0

2._at        : educ            =           2

3._at        : educ            =           4

4._at        : educ            =           6

5._at        : educ            =           8

6._at        : educ            =          10

7._at        : educ            =          12

8._at        : educ            =          14

9._at        : educ            =          16

10._at       : educ            =          18

11._at       : educ            =          20

─────────────┬────────────────────────────────────────────────────────────────
             │            Delta-method
             │     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
         _at │
          1  │   .1315922    .055059     2.39   0.017     .0236785    .2395059
          2  │   .1833441   .0570249     3.22   0.001     .0715773    .2951109
          3  │   .2455272   .0543798     4.52   0.000     .1389447    .3521097
          4  │   .3168216   .0470756     6.73   0.000     .2245552     .409088
          5  │   .3949415   .0360827    10.95   0.000     .3242206    .4656623
          6  │   .4768675   .0237722    20.06   0.000     .4302749    .5234602
          7  │   .5592018   .0159547    35.05   0.000     .5279311    .5904724
          8  │   .6385748   .0200316    31.88   0.000     .5993136     .677836
          9  │   .7120285   .0288142    24.71   0.000     .6555538    .7685032
         10  │   .7773109   .0357596    21.74   0.000     .7072234    .8473984
         11  │   .8330432   .0391944    21.25   0.000     .7562235    .9098629
─────────────┴────────────────────────────────────────────────────────────────

. marginsplot, yline(0)

  Variables that uniquely identify margins: educ

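In the probit model, the predicted probability that a married woman is in the labor force rises from 55.9% at 12 years of education to 71.2% at 16 years of education. Note that the logit above was fit without age, so its predictive margins do not come from the same specification as the LPM and probit.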

Coefficient Plot

. coefplot lpm logit1, at recast(line) ciopts(recast(rline) lpattern(dash))

. coefplot lpm probit1, at recast(line) ciopts(recast(rline) lpattern(dash))

. coefplot logit1 probit1, at recast(line) ciopts(recast(rline) lpattern(dash))

Marginal Effects of Children Under 6 at Different Levels of Education

LPM

. quietly reg inlf nwifeinc educ exper expersq age kidslt6 kidsge6

. margins, dydx(kidslt6) at(educ=(0(2)20))

Average marginal effects                        Number of obs     =        753
Model VCE    : OLS

Expression   : Linear prediction, predict()
dy/dx w.r.t. : kidslt6

1._at        : educ            =           0

2._at        : educ            =           2

3._at        : educ            =           4

4._at        : educ            =           6

5._at        : educ            =           8

6._at        : educ            =          10

7._at        : educ            =          12

8._at        : educ            =          14

9._at        : educ            =          16

10._at       : educ            =          18

11._at       : educ            =          20

─────────────┬────────────────────────────────────────────────────────────────
             │            Delta-method
             │      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
kidslt6      │
         _at │
          1  │  -.2618105   .0335058    -7.81   0.000    -.3275875   -.1960335
          2  │  -.2618105   .0335058    -7.81   0.000    -.3275875   -.1960335
          3  │  -.2618105   .0335058    -7.81   0.000    -.3275875   -.1960335
          4  │  -.2618105   .0335058    -7.81   0.000    -.3275875   -.1960335
          5  │  -.2618105   .0335058    -7.81   0.000    -.3275875   -.1960335
          6  │  -.2618105   .0335058    -7.81   0.000    -.3275875   -.1960335
          7  │  -.2618105   .0335058    -7.81   0.000    -.3275875   -.1960335
          8  │  -.2618105   .0335058    -7.81   0.000    -.3275875   -.1960335
          9  │  -.2618105   .0335058    -7.81   0.000    -.3275875   -.1960335
         10  │  -.2618105   .0335058    -7.81   0.000    -.3275875   -.1960335
         11  │  -.2618105   .0335058    -7.81   0.000    -.3275875   -.1960335
─────────────┴────────────────────────────────────────────────────────────────

. marginsplot, yline(0)

  Variables that uniquely identify margins: educ

Logit

. quietly logit inlf nwifeinc educ exper expersq kidslt6 kidsge6

. margins, dydx(kidslt6) at(educ=(0(2)20))

Average marginal effects                        Number of obs     =        753
Model VCE    : OIM

Expression   : Pr(inlf), predict()
dy/dx w.r.t. : kidslt6

1._at        : educ            =           0

2._at        : educ            =           2

3._at        : educ            =           4

4._at        : educ            =           6

5._at        : educ            =           8

6._at        : educ            =          10

7._at        : educ            =          12

8._at        : educ            =          14

9._at        : educ            =          16

10._at       : educ            =          18

11._at       : educ            =          20

─────────────┬────────────────────────────────────────────────────────────────
             │            Delta-method
             │      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
kidslt6      │
         _at │
          1  │  -.0677567   .0262236    -2.58   0.010    -.1191539   -.0163594
          2  │  -.0949468   .0281434    -3.37   0.001    -.1501069   -.0397867
          3  │  -.1248173   .0285395    -4.37   0.000    -.1807537   -.0688808
          4  │  -.1526688   .0293293    -5.21   0.000    -.2101531   -.0951844
          5  │  -.1732958   .0313347    -5.53   0.000    -.2347106    -.111881
          6  │  -.1829484   .0327734    -5.58   0.000    -.2471831   -.1187137
          7  │  -.1803418   .0319215    -5.65   0.000    -.2429069   -.1177768
          8  │  -.1665232   .0289231    -5.76   0.000    -.2232115   -.1098349
          9  │  -.1442912   .0254362    -5.67   0.000    -.1941452   -.0944372
         10  │  -.1175221   .0231237    -5.08   0.000    -.1628437   -.0722005
         11  │  -.0902871   .0218973    -4.12   0.000    -.1332051   -.0473691
─────────────┴────────────────────────────────────────────────────────────────

. marginsplot, yline(0)

  Variables that uniquely identify margins: educ

. graph export "/Users/Sam/Desktop/Econ 645/Stata/week8_logitinlf.png", replace
(file /Users/Sam/Desktop/Econ 645/Stata/week8_logitinlf.png written in PNG format)

Probit

. quietly probit inlf nwifeinc educ exper expersq age kidslt6 kidsge6

. margins, dydx(kidslt6) at(educ=(0(2)20))

Average marginal effects                        Number of obs     =        753
Model VCE    : OIM

Expression   : Pr(inlf), predict()
dy/dx w.r.t. : kidslt6

1._at        : educ            =           0

2._at        : educ            =           2

3._at        : educ            =           4

4._at        : educ            =           6

5._at        : educ            =           8

6._at        : educ            =          10

7._at        : educ            =          12

8._at        : educ            =          14

9._at        : educ            =          16

10._at       : educ            =          18

11._at       : educ            =          20

─────────────┬────────────────────────────────────────────────────────────────
             │            Delta-method
             │      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
kidslt6      │
         _at │
          1  │   -.153748   .0455045    -3.38   0.001    -.2429352   -.0645607
          2  │  -.1893831   .0408614    -4.63   0.000    -.2694699   -.1092962
          3  │  -.2223621   .0360335    -6.17   0.000    -.2929864   -.1517377
          4  │  -.2492771    .033371    -7.47   0.000    -.3146829   -.1838712
          5  │  -.2672253   .0331963    -8.05   0.000    -.3322889   -.2021617
          6  │  -.2743121   .0335951    -8.17   0.000    -.3401572    -.208467
          7  │  -.2699501    .032832    -8.22   0.000    -.3342997   -.2056006
          8  │   -.254902    .030869    -8.26   0.000    -.3154042   -.1943998
          9  │  -.2310833   .0292123    -7.91   0.000    -.2883384   -.1738282
         10  │  -.2011899   .0295232    -6.81   0.000    -.2590542   -.1433256
         11  │  -.1682404   .0317255    -5.30   0.000    -.2304211   -.1060596
─────────────┴────────────────────────────────────────────────────────────────

. marginsplot, yline(0)

  Variables that uniquely identify margins: educ

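In the probit model, the average marginal effect of an additional child less than 6 is -15.4 percentage points at educ=0, strengthens to -27.4 percentage points at educ=10, and weakens again to -16.8 percentage points at educ=20. The 95% confidence intervals overlap across education levels, so the differences do not appear to be statistically significant.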
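
Overlapping confidence intervals are only an informal check. A minimal sketch of a formal comparison follows, assuming mroz.dta is still in memory; the three education levels (0, 10, 20) are chosen here purely for illustration, and the contrast() suboptions are used as I understand them from the margins documentation.

```stata
* Refit the probit so that margins uses the intended estimates.
quietly probit inlf nwifeinc educ exper expersq age kidslt6 kidsge6

* Marginal effect of kidslt6 at three education levels, with each later
* level contrasted against the first (educ = 0). The effects suboption
* requests a table with standard errors, p-values, and confidence intervals.
margins, dydx(kidslt6) at(educ=(0 10 20)) contrast(atcontrast(r) effects)
```

A contrast whose confidence interval excludes zero would indicate that the marginal effect of kidslt6 differs between that education level and educ=0.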

Multinomial Logit

. use "/Users/Sam/Desktop/Econ 645/Data/CPS/mlogit_example.dta", clear

Our multinomial logit model

Lesson: Interpreting a nominal categorical dependent variable

We will use Current Population Survey data from September 2023 and 2024 to estimate the following model for labor force status: \[ lfs_{i}=\beta_{0}+\beta_1 edu_{i} + \beta_2 exper_i + ... + u_i \] There are three alternatives: Employed, Unemployed, and Not in the Labor Force (NILF).

. tab laborforce

Labor Force │
     Status │      Freq.     Percent        Cum.
────────────┼───────────────────────────────────
   Employed │     23,760       57.12       57.12
 Unemployed │        832        2.00       59.12
       NILF │     17,007       40.88      100.00
────────────┼───────────────────────────────────
      Total │     41,599      100.00

. tab laborforce, nolabel

Labor Force │
     Status │      Freq.     Percent        Cum.
────────────┼───────────────────────────────────
          1 │     23,760       57.12       57.12
          2 │        832        2.00       59.12
          3 │     17,007       40.88      100.00
────────────┼───────────────────────────────────
      Total │     41,599      100.00

Our dependent variable has three nominal categories. \[ y=[1,2,3] \] or \[ y=[Employed, Unemployed, NILF] \]

We use the mlogit command to estimate a multinomial logit. With J outcomes, it reports J-1 sets of log-odds coefficients, each relative to the base outcome (Employed here).
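
For reference, the standard textbook form of the multinomial logit with Employed (outcome 1) as the base can be written as below. The notation \(x_i \beta_j\) for the linear index of outcome \(j\) is an assumption of this sketch, and the base-outcome coefficients are normalized to zero.

```latex
\[
P(y_i = j \mid x_i) = \frac{\exp(x_i \beta_j)}{1 + \sum_{h=2}^{3} \exp(x_i \beta_h)}, \quad j = 2, 3,
\qquad
P(y_i = 1 \mid x_i) = \frac{1}{1 + \sum_{h=2}^{3} \exp(x_i \beta_h)}
\]
\[
\log\left( \frac{P(y_i = j \mid x_i)}{P(y_i = 1 \mid x_i)} \right) = x_i \beta_j, \quad j = 2, 3
\]
```

The second line is why each reported coefficient is a log odds relative to being employed: there is one equation for Unemployed and one for NILF, which gives J-1 = 2 logits.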

. mlogit laborforce i.educat exp exp2 i.race_ethnicity i.female i.metroarea i.union i.marital i.hryear4

Iteration 0:   log likelihood = -31774.039  
Iteration 1:   log likelihood = -22975.759  
Iteration 2:   log likelihood = -22620.368  
Iteration 3:   log likelihood = -22573.244  
Iteration 4:   log likelihood = -22565.789  
Iteration 5:   log likelihood = -22564.343  
Iteration 6:   log likelihood = -22564.014  
Iteration 7:   log likelihood = -22563.941  
Iteration 8:   log likelihood = -22563.925  
Iteration 9:   log likelihood = -22563.922  
Iteration 10:  log likelihood = -22563.922  
Iteration 11:  log likelihood = -22563.922  
Iteration 12:  log likelihood = -22563.922  

Multinomial logistic regression                 Number of obs     =     41,599
                                                LR chi2(38)       =   18420.23
                                                Prob > chi2       =     0.0000
Log likelihood = -22563.922                     Pseudo R2         =     0.2899

──────────────────────────┬────────────────────────────────────────────────────────────────
         laborforcestatus │      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
──────────────────────────┼────────────────────────────────────────────────────────────────
Employed                  │  (base outcome)
──────────────────────────┼────────────────────────────────────────────────────────────────
Unemployed                │
                   educat │
                     HSD  │  -.2279524   .1119685    -2.04   0.042    -.4474066   -.0084982
            Some College  │  -.4312492   .1303621    -3.31   0.001    -.6867542   -.1757442
           AA/Vocational  │  -.6931682    .165487    -4.19   0.000    -1.017517   -.3688197
                   BS/BA  │  -.6706784   .1331892    -5.04   0.000    -.9317245   -.4096323
Graduate or Professional  │  -1.063383   .1745188    -6.09   0.000    -1.405433   -.7213323
                          │
                      exp │  -.0291051   .0088759    -3.28   0.001    -.0465016   -.0117087
                     exp2 │   .0002524   .0001575     1.60   0.109    -.0000562    .0005611
                          │
           race_ethnicity │
  Asian/Pacific Islander  │   -.813795   .3012644    -2.70   0.007    -1.404262   -.2233275
                   Black  │  -.3951124   .2756281    -1.43   0.152    -.9353337    .1451088
         Hispanic/Latino  │  -.6504449   .2691511    -2.42   0.016    -1.177971   -.1229184
                   White  │  -.7855686   .2616547    -3.00   0.003    -1.298402   -.2727347
             Multiracial  │  -.4798449   .3366208    -1.43   0.154     -1.13961    .1799197
                          │
                   female │
                  Female  │  -.0004137   .0721979    -0.01   0.995     -.141919    .1410915
                          │
                metroarea │
           Nonmetro Area  │  -.1010417    .097797    -1.03   0.302    -.2927204    .0906369
          Not Identified  │   .0635435    .346152     0.18   0.854     -.614902     .741989
                          │
                    union │
                   Union  │  -19.66515   2339.277    -0.01   0.993    -4604.564    4565.234
                          │
                  marital │
    Divorced/Sep/Widowed  │   .6132556   .1175463     5.22   0.000     .3828691     .843642
           Never Married  │   .7526873   .0988405     7.62   0.000     .5589634    .9464112
                          │
                  hryear4 │
                    2024  │   .0228376    .071126     0.32   0.748    -.1165667    .1622419
                          │
                    _cons │  -2.023219   .3012462    -6.72   0.000    -2.613651   -1.432787
──────────────────────────┼────────────────────────────────────────────────────────────────
NILF                      │
                   educat │
                     HSD  │  -.7898216   .0425444   -18.56   0.000    -.8732071   -.7064361
            Some College  │  -.8941042   .0479568   -18.64   0.000    -.9880978   -.8001105
           AA/Vocational  │  -1.143176   .0559087   -20.45   0.000    -1.252755   -1.033597
                   BS/BA  │  -1.493236   .0488335   -30.58   0.000    -1.588948   -1.397524
Graduate or Professional  │  -1.746904   .0561914   -31.09   0.000    -1.857038   -1.636771
                          │
                      exp │  -.1490983    .003087   -48.30   0.000    -.1551487    -.143048
                     exp2 │   .0031412   .0000469    66.92   0.000     .0030492    .0032332
                          │
           race_ethnicity │
  Asian/Pacific Islander  │  -.1995846   .1336975    -1.49   0.135    -.4616269    .0624578
                   Black  │  -.1053158   .1293382    -0.81   0.415     -.358814    .1481824
         Hispanic/Latino  │  -.4785722   .1270357    -3.77   0.000    -.7275575   -.2295869
                   White  │  -.3378929   .1237786    -2.73   0.006    -.5804944   -.0952913
             Multiracial  │  -.1802636   .1537778    -1.17   0.241    -.4816625    .1211354
                          │
                   female │
                  Female  │   .5781835   .0258942    22.33   0.000     .5274318    .6289351
                          │
                metroarea │
           Nonmetro Area  │  -.0069684   .0328081    -0.21   0.832    -.0712712    .0573344
          Not Identified  │  -.0021328   .1282129    -0.02   0.987    -.2534254    .2491598
                          │
                    union │
                   Union  │  -20.25621   566.9973    -0.04   0.972     -1131.55    1091.038
                          │
                  marital │
    Divorced/Sep/Widowed  │  -.0531382   .0373157    -1.42   0.154    -.1262757    .0199993
           Never Married  │   .1464745   .0379587     3.86   0.000     .0720769    .2208722
                          │
                  hryear4 │
                    2024  │  -.0398405   .0253899    -1.57   0.117    -.0896037    .0099228
                          │
                    _cons │   1.072448   .1365064     7.86   0.000     .8049003    1.339996
──────────────────────────┴────────────────────────────────────────────────────────────────
Note: 2044 observations completely determined.  Standard errors questionable.
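
The note above, together with Union coefficients near -20 and very large standard errors in both equations, suggests that union membership perfectly predicts being employed. The count is consistent with this reading: the margins output further down reports a union share of .0491358, and .0491358 × 41,599 ≈ 2,044, the number of observations Stata flags. A quick way to check, sketched under the assumption that union is coded 0/1 as the factor-variable output indicates:

```stata
* Cross-tabulate labor force status against union membership.
* Empty Unemployed and NILF cells for union members would explain the note.
tabulate laborforce union

* Count union members; compare with the 2,044 observations Stata flags.
count if union == 1
```

If the Unemployed and NILF cells are empty for union members, the Union coefficients and their standard errors should not be interpreted.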

Next, we test the Independence of Irrelevant Alternatives (IIA) assumption with a Hausman test.

Estimate the unrestricted model with all three alternatives. We’ll remove unemployed in the restricted model below.

. quietly mlogit laborforce i.educat exp exp2 i.race_ethnicity i.female i.metroarea i.union i.marital i.hryear4

. estimates store unrestricted

The coefficients are the log odds of being unemployed or not in the labor force relative to being employed. Because log odds are hard to interpret directly, we’ll need to convert them to odds ratios (relative-risk ratios in Stata’s terminology) or find the marginal effects.
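
One way to make that conversion is the rrr option of mlogit, which reports exponentiated coefficients. The sketch below assumes it is run from a do-file (because of the /// line continuation) with mlogit_example.dta in memory.

```stata
* Refit the model and report relative-risk ratios instead of log odds.
mlogit laborforce i.educat exp exp2 i.race_ethnicity i.female ///
    i.metroarea i.union i.marital i.hryear4, rrr

* The same conversion by hand for one coefficient
* (HSD in the Unemployed equation of the output above).
display exp(-.2279524)
```

The hand calculation gives roughly 0.80, meaning the relative odds of being unemployed rather than employed are about 20% lower for HSD than for the omitted education category, holding the other covariates fixed.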

Estimate a restricted model that drops the unemployed alternative to test the IIA assumption

. quietly mlogit laborforce i.educat exp exp2 i.race_ethnicity i.female i.metroarea i.union i.marital if laborforce
>  !=2

. estimate store restricted

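Use the hausman command to compare the two sets of coefficients. The restricted command above omits i.hryear4, so the test covers only the 19 coefficients that the two models share.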

. hausman restricted unrestricted, alleqs constant

Note: the rank of the differenced variance matrix (2) does not equal the number of coefficients being tested (19);
        be sure this is what you expect, or there may be problems computing the test.  Examine the output of your
        estimators for anything unexpected and possibly consider scaling your variables so that the coefficients
        are on a similar scale.

                 ──── Coefficients ────
             │      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             │   restricted  unrestricted    Difference          S.E.
─────────────┼────────────────────────────────────────────────────────────────
      educat │
          2  │   -.7860864    -.7898216        .0037352        .0035785
          3  │     -.89149    -.8941042        .0026142        .0032032
          4  │   -1.140012    -1.143176         .003164         .003669
          5  │   -1.486705    -1.493236        .0065306        .0036976
          6  │   -1.737118    -1.746904        .0097867        .0035815
         exp │   -.1486206    -.1490983        .0004777        .0002218
        exp2 │    .0031335     .0031412       -7.78e-06        2.91e-06
race_ethni~y │
          2  │   -.1712505    -.1995846        .0283341        .0120289
          3  │   -.0717454    -.1053158        .0335704        .0121412
          4  │   -.4427331    -.4785722        .0358391        .0115676
          5  │   -.3060009    -.3378929         .031892        .0117006
          6  │   -.1491388    -.1802636        .0311247        .0136923
    1.female │    .5793374     .5781835        .0011539        .0018992
   metroarea │
          2  │   -.0070074    -.0069684        -.000039         .002025
          3  │   -.0064057    -.0021328       -.0042729        .0112617
     1.union │   -20.20846    -20.25621        .0477553               .
     marital │
          2  │   -.0524486    -.0531382        .0006896        .0022895
          3  │    .1495841     .1464745        .0031096        .0023193
       _cons │     1.00979     1.072448       -.0626578               .
─────────────┴────────────────────────────────────────────────────────────────
                          b = consistent under Ho and Ha; obtained from mlogit
           B = inconsistent under Ha, efficient under Ho; obtained from mlogit

    Test:  Ho:  difference in coefficients not systematic

                  chi2(2) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =        8.92
                Prob>chi2 =      0.0116
                (V_b-V_B is not positive definite)

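Our chi-squared statistic is 8.92 with a p-value of 0.0116, so we reject the null hypothesis, and with it the IIA assumption, at the 5% level. Stata warns that the differenced variance matrix has rank 2 and is not positive definite, so this result should be read with caution. We should consider a binary response here.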
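
A minimal sketch of one possible binary response follows. The choice of outcome (in the labor force versus NILF) and the variable name inlf_cps are assumptions made for illustration, not the author’s specification. i.union is left out because union membership appears to perfectly predict the outcome in the mlogit output above.

```stata
* Hypothetical indicator: 1 if in the labor force (Employed or Unemployed),
* 0 if NILF, missing if labor force status is missing.
generate byte inlf_cps = inlist(laborforce, 1, 2) if !missing(laborforce)

* Binary logit with the same covariates, minus i.union.
logit inlf_cps i.educat exp exp2 i.race_ethnicity i.female ///
    i.metroarea i.marital i.hryear4

* Average marginal effects on Pr(in the labor force).
margins, dydx(*)
```

With only two alternatives there is no IIA assumption to test, at the cost of no longer distinguishing employment from unemployment.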

Predicted Probabilities

Next, we use margins to estimate the predicted probability of each outcome with the covariates held at their means

. quietly mlogit laborforce i.educat exp exp2 i.race_ethnicity i.female i.metroarea i.union i.marital, base(1)

. margins, atmeans predict(outcome(1))

Adjusted predictions                            Number of obs     =     41,599
Model VCE    : OIM

Expression   : Pr(laborforcestatus==Employed), predict(outcome(1))
at           : 1.educat        =    .1261088 (mean)
               2.educat        =     .286954 (mean)
               3.educat        =    .1488497 (mean)
               4.educat        =    .0969254 (mean)
               5.educat        =    .2079617 (mean)
               6.educat        =    .1332003 (mean)
               exp             =    32.89247 (mean)
               exp2            =    1467.595 (mean)
               1.race_eth~y    =    .0098079 (mean)
               2.race_eth~y    =    .0622611 (mean)
               3.race_eth~y    =    .0931032 (mean)
               4.race_eth~y    =    .1487776 (mean)
               5.race_eth~y    =    .6685738 (mean)
               6.race_eth~y    =    .0174764 (mean)
               0.female        =    .4804923 (mean)
               1.female        =    .5195077 (mean)
               1.metroarea     =    .8000673 (mean)
               2.metroarea     =    .1904132 (mean)
               3.metroarea     =    .0095195 (mean)
               0.union         =    .9508642 (mean)
               1.union         =    .0491358 (mean)
               1.marital       =    .5163586 (mean)
               2.marital       =    .1819515 (mean)
               3.marital       =    .3016899 (mean)

─────────────┬────────────────────────────────────────────────────────────────
             │            Delta-method
             │     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
       _cons │   .7684672   4.830264     0.16   0.874    -8.698677    10.23561
─────────────┴────────────────────────────────────────────────────────────────

. margins, atmeans predict(outcome(2))

Adjusted predictions                            Number of obs     =     41,599
Model VCE    : OIM

Expression   : Pr(laborforcestatus==Unemployed), predict(outcome(2))
at           : 1.educat        =    .1261088 (mean)
               2.educat        =     .286954 (mean)
               3.educat        =    .1488497 (mean)
               4.educat        =    .0969254 (mean)
               5.educat        =    .2079617 (mean)
               6.educat        =    .1332003 (mean)
               exp             =    32.89247 (mean)
               exp2            =    1467.595 (mean)
               1.race_eth~y    =    .0098079 (mean)
               2.race_eth~y    =    .0622611 (mean)
               3.race_eth~y    =    .0931032 (mean)
               4.race_eth~y    =    .1487776 (mean)
               5.race_eth~y    =    .6685738 (mean)
               6.race_eth~y    =    .0174764 (mean)
               0.female        =    .4804923 (mean)
               1.female        =    .5195077 (mean)
               1.metroarea     =    .8000673 (mean)
               2.metroarea     =    .1904132 (mean)
               3.metroarea     =    .0095195 (mean)
               0.union         =    .9508642 (mean)
               1.union         =    .0491358 (mean)
               1.marital       =    .5163586 (mean)
               2.marital       =    .1819515 (mean)
               3.marital       =    .3016899 (mean)

─────────────┬────────────────────────────────────────────────────────────────
             │            Delta-method
             │     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
       _cons │   .0090578    1.03352     0.01   0.993    -2.016605     2.03472
─────────────┴────────────────────────────────────────────────────────────────

. margins, atmeans predict(outcome(3))

Adjusted predictions                            Number of obs     =     41,599
Model VCE    : OIM

Expression   : Pr(laborforcestatus==NILF), predict(outcome(3))
at           : 1.educat        =    .1261088 (mean)
               2.educat        =     .286954 (mean)
               3.educat        =    .1488497 (mean)
               4.educat        =    .0969254 (mean)
               5.educat        =    .2079617 (mean)
               6.educat        =    .1332003 (mean)
               exp             =    32.89247 (mean)
               exp2            =    1467.595 (mean)
               1.race_eth~y    =    .0098079 (mean)
               2.race_eth~y    =    .0622611 (mean)
               3.race_eth~y    =    .0931032 (mean)
               4.race_eth~y    =    .1487776 (mean)
               5.race_eth~y    =    .6685738 (mean)
               6.race_eth~y    =    .0174764 (mean)
               0.female        =    .4804923 (mean)
               1.female        =    .5195077 (mean)
               1.metroarea     =    .8000673 (mean)
               2.metroarea     =    .1904132 (mean)
               3.metroarea     =    .0095195 (mean)
               0.union         =    .9508642 (mean)
               1.union         =    .0491358 (mean)
               1.marital       =    .5163586 (mean)
               2.marital       =    .1819515 (mean)
               3.marital       =    .3016899 (mean)

─────────────┬────────────────────────────────────────────────────────────────
             │            Delta-method
             │     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
       _cons │   .2224751   4.825216     0.05   0.963    -9.234774    9.679725
─────────────┴────────────────────────────────────────────────────────────────
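
The three predicted probabilities at the means sum to one up to rounding: .7684672 + .0090578 + .2224751 = 1.0000001. The same adding-up property holds for every observation, which can be checked with predict after the mlogit above. The variable names p_emp, p_unemp, p_nilf, and p_total are hypothetical.

```stata
* Predicted probability of each outcome for every observation,
* listed in outcome order (1 = Employed, 2 = Unemployed, 3 = NILF).
predict p_emp p_unemp p_nilf, pr

* The three probabilities should sum to one for each observation.
generate double p_total = p_emp + p_unemp + p_nilf
summarize p_emp p_unemp p_nilf p_total
```

The means reported by summarize are average predicted probabilities, which correspond to margins with predict(outcome(#)) but without atmeans, so they will generally differ from the at-means values above.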

Average Marginal Effects

. margins, dydx(*) predict(outcome(1))

Average marginal effects                        Number of obs     =     41,599
Model VCE    : OIM

Expression   : Pr(laborforcestatus==Employed), predict(outcome(1))
dy/dx w.r.t. : 2.educat 3.educat 4.educat 5.educat 6.educat exp exp2 2.race_ethnicity 3.race_ethnicity
               4.race_ethnicity 5.race_ethnicity 6.race_ethnicity 1.female 2.metroarea 3.metroarea 1.union
               2.marital 3.marital

──────────────────────────┬────────────────────────────────────────────────────────────────
                          │            Delta-method
                          │      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
──────────────────────────┼────────────────────────────────────────────────────────────────
                   educat │
                     HSD  │   .1309012   .0072675    18.01   0.000     .1166572    .1451452
            Some College  │   .1499942   .0080928    18.53   0.000     .1341326    .1658558
           AA/Vocational  │   .1911979   .0091537    20.89   0.000     .1732569    .2091389
                   BS/BA  │   .2407492   .0078942    30.50   0.000     .2252768    .2562215
Graduate or Professional  │   .2789487    .008527    32.71   0.000     .2622362    .2956613
                          │
                      exp │   .0219808   .0004288    51.26   0.000     .0211403    .0228213
                     exp2 │  -.0004585   5.91e-06   -77.62   0.000      -.00047   -.0004469
                          │
           race_ethnicity │
  Asian/Pacific Islander  │   .0417403   .0212777     1.96   0.050     .0000368    .0834438
                   Black  │   .0224391   .0206352     1.09   0.277    -.0180052    .0628834
         Hispanic/Latino  │   .0802357   .0202145     3.97   0.000     .0406159    .1198555
                   White  │   .0618266   .0197636     3.13   0.002     .0230907    .1005625
             Multiracial  │   .0348149    .024377     1.43   0.153    -.0129632     .082593
                          │
                   female │
                  Female  │  -.0839096   .0039186   -21.41   0.000    -.0915899   -.0762292
                          │
                metroarea │
           Nonmetro Area  │   .0022776   .0050425     0.45   0.652    -.0076056    .0121607
          Not Identified  │  -.0007629   .0197304    -0.04   0.969    -.0394337    .0379079
                          │
                    union │
                   Union  │   .4414329   .0020464   215.71   0.000     .4374219    .4454438
                          │
                  marital │
    Divorced/Sep/Widowed  │   .0001626   .0057357     0.03   0.977    -.0110791    .0114043
           Never Married  │   -.030733    .005813    -5.29   0.000    -.0421263   -.0193397
──────────────────────────┴────────────────────────────────────────────────────────────────
Note: dy/dx for factor levels is the discrete change from the base level.

. margins, dydx(*) predict(outcome(2))

Average marginal effects                        Number of obs     =     41,599
Model VCE    : OIM

Expression   : Pr(laborforcestatus==Unemployed), predict(outcome(2))
dy/dx w.r.t. : 2.educat 3.educat 4.educat 5.educat 6.educat exp exp2 2.race_ethnicity 3.race_ethnicity
               4.race_ethnicity 5.race_ethnicity 6.race_ethnicity 1.female 2.metroarea 3.metroarea 1.union
               2.marital 3.marital

──────────────────────────┬────────────────────────────────────────────────────────────────
                          │            Delta-method
                          │      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
──────────────────────────┼────────────────────────────────────────────────────────────────
                   educat │
                     HSD  │   .0021475   .0023962     0.90   0.370     -.002549     .006844
            Some College  │  -.0014486   .0026367    -0.55   0.583    -.0066165    .0037194
           AA/Vocational  │  -.0048071   .0030039    -1.60   0.110    -.0106946    .0010805
                   BS/BA  │  -.0028255   .0026516    -1.07   0.287    -.0080226    .0023715
Graduate or Professional  │  -.0081376   .0028311    -2.87   0.004    -.0136864   -.0025888
                          │
                      exp │   .0004047   .0001592     2.54   0.011     .0000928    .0007167
                     exp2 │  -.0000155   2.78e-06    -5.57   0.000    -.0000209     -.00001
                          │
           race_ethnicity │
  Asian/Pacific Islander  │  -.0176565   .0086621    -2.04   0.042     -.034634   -.0006791
                   Black  │  -.0099588   .0085704    -1.16   0.245    -.0267565    .0068389
         Hispanic/Latino  │  -.0128404   .0084135    -1.53   0.127    -.0293307    .0036498
                   White  │  -.0163498   .0082939    -1.97   0.049    -.0326055   -.0000942
             Multiracial  │  -.0112902   .0095517    -1.18   0.237    -.0300113    .0074308
                          │
                   female │
                  Female  │  -.0037756   .0013734    -2.75   0.006    -.0064674   -.0010838
                          │
                metroarea │
           Nonmetro Area  │  -.0018498   .0017658    -1.05   0.295    -.0053107    .0016111
          Not Identified  │   .0012508   .0071177     0.18   0.861    -.0126997    .0152012
                          │
                    union │
                   Union  │  -.0210849   .0007187   -29.34   0.000    -.0224935   -.0196762
                          │
                  marital │
    Divorced/Sep/Widowed  │   .0110753   .0024351     4.55   0.000     .0063027    .0158479
           Never Married  │   .0129138   .0018379     7.03   0.000     .0093115    .0165161
──────────────────────────┴────────────────────────────────────────────────────────────────
Note: dy/dx for factor levels is the discrete change from the base level.

. margins, dydx(*) predict(outcome(3))

Average marginal effects                        Number of obs     =     41,599
Model VCE    : OIM

Expression   : Pr(laborforcestatus==NILF), predict(outcome(3))
dy/dx w.r.t. : 2.educat 3.educat 4.educat 5.educat 6.educat exp exp2 2.race_ethnicity 3.race_ethnicity
               4.race_ethnicity 5.race_ethnicity 6.race_ethnicity 1.female 2.metroarea 3.metroarea 1.union
               2.marital 3.marital

──────────────────────────┬────────────────────────────────────────────────────────────────
                          │            Delta-method
                          │      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
──────────────────────────┼────────────────────────────────────────────────────────────────
                   educat │
                     HSD  │  -.1330487   .0071954   -18.49   0.000    -.1471514    -.118946
            Some College  │  -.1485456   .0079934   -18.58   0.000    -.1642124   -.1328789
           AA/Vocational  │  -.1863908   .0090294   -20.64   0.000    -.2040881   -.1686934
                   BS/BA  │  -.2379236   .0077818   -30.57   0.000    -.2531757   -.2226716
Graduate or Professional  │  -.2708111   .0083689   -32.36   0.000    -.2872139   -.2544084
                          │
                      exp │  -.0223855   .0004141   -54.06   0.000    -.0231971   -.0215739
                     exp2 │   .0004739   5.54e-06    85.61   0.000     .0004631    .0004848
                          │
           race_ethnicity │
  Asian/Pacific Islander  │  -.0240837   .0207242    -1.16   0.245    -.0647024    .0165349
                   Black  │  -.0124803   .0200649    -0.62   0.534    -.0518068    .0268463
         Hispanic/Latino  │  -.0673952     .01964    -3.43   0.001    -.1058889   -.0289016
                   White  │  -.0454768   .0191993    -2.37   0.018    -.0831067   -.0078468
             Multiracial  │  -.0235247   .0237295    -0.99   0.322    -.0700337    .0229843
                          │
                   female │
                  Female  │   .0876852   .0038386    22.84   0.000     .0801616    .0952088
                          │
                metroarea │
           Nonmetro Area  │  -.0004278   .0049292    -0.09   0.931    -.0100888    .0092332
          Not Identified  │  -.0004879   .0192391    -0.03   0.980    -.0381959    .0372201
                          │
                    union │
                   Union  │   -.420348   .0020016  -210.01   0.000     -.424271   -.4164251
                          │
                  marital │
    Divorced/Sep/Widowed  │  -.0112379   .0055186    -2.04   0.042    -.0220542   -.0004216
           Never Married  │   .0178192   .0057407     3.10   0.002     .0065676    .0290707
──────────────────────────┴────────────────────────────────────────────────────────────────
Note: dy/dx for factor levels is the discrete change from the base level.
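
Because the three outcome probabilities sum to one, the marginal effects of any covariate must sum to zero across outcomes. The worked check below uses the average marginal effects reported in the three tables above; the symbol \(x_k\) for a generic covariate is notation assumed for this sketch.

```latex
\[
\sum_{j=1}^{3} P(y = j \mid x) = 1
\quad \Longrightarrow \quad
\sum_{j=1}^{3} \frac{\partial P(y = j \mid x)}{\partial x_k} = 0
\]
\[
\text{HSD: } 0.1309012 + 0.0021475 - 0.1330487 = 0
\]
\[
\text{exp: } 0.0219808 + 0.0004047 - 0.0223855 = 0
\]
\[
\text{Female: } -0.0839096 - 0.0037756 + 0.0876852 = 0
\]
```

The same holds for the discrete changes reported for factor levels, since each is a difference of probabilities that sum to one. A gain in the probability of being employed is therefore exactly offset by losses in the probabilities of being unemployed or not in the labor force.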

Marginal Effects at the Average

. margins, dydx(*) atmeans predict(outcome(1))

Conditional marginal effects                    Number of obs     =     41,599
Model VCE    : OIM

Expression   : Pr(laborforcestatus==Employed), predict(outcome(1))
dy/dx w.r.t. : 2.educat 3.educat 4.educat 5.educat 6.educat exp exp2 2.race_ethnicity 3.race_ethnicity
               4.race_ethnicity 5.race_ethnicity 6.race_ethnicity 1.female 2.metroarea 3.metroarea 1.union
               2.marital 3.marital
at           : 1.educat        =    .1261088 (mean)
               2.educat        =     .286954 (mean)
               3.educat        =    .1488497 (mean)
               4.educat        =    .0969254 (mean)
               5.educat        =    .2079617 (mean)
               6.educat        =    .1332003 (mean)
               exp             =    32.89247 (mean)
               exp2            =    1467.595 (mean)
               1.race_eth~y    =    .0098079 (mean)
               2.race_eth~y    =    .0622611 (mean)
               3.race_eth~y    =    .0931032 (mean)
               4.race_eth~y    =    .1487776 (mean)
               5.race_eth~y    =    .6685738 (mean)
               6.race_eth~y    =    .0174764 (mean)
               0.female        =    .4804923 (mean)
               1.female        =    .5195077 (mean)
               1.metroarea     =    .8000673 (mean)
               2.metroarea     =    .1904132 (mean)
               3.metroarea     =    .0095195 (mean)
               0.union         =    .9508642 (mean)
               1.union         =    .0491358 (mean)
               1.marital       =    .5163586 (mean)
               2.marital       =    .1819515 (mean)
               3.marital       =    .3016899 (mean)

──────────────────────────┬────────────────────────────────────────────────────────────────
                          │            Delta-method
                          │      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
──────────────────────────┼────────────────────────────────────────────────────────────────
                   educat │
                     HSD  │   .1755775   1.437386     0.12   0.903    -2.641647    2.992802
            Some College  │    .196493    1.66501     0.12   0.906    -3.066867    3.459853
           AA/Vocational  │    .240734   2.297249     0.10   0.917     -4.26179    4.743258
                   BS/BA  │   .2905963   3.204011     0.09   0.928     -5.98915    6.570342
Graduate or Professional  │   .3223929   3.782688     0.09   0.932     -7.09154    7.736326
                          │
                      exp │   .0256925   .3928606     0.07   0.948    -.7443001    .7956851
                     exp2 │  -.0005388   .0083342    -0.06   0.948    -.0168735    .0157959
                          │
           race_ethnicity │
  Asian/Pacific Islander  │   .0449909   .7906855     0.06   0.955    -1.504724    1.594706
                   Black  │   .0244574   .4314249     0.06   0.955    -.8211199    .8700347
         Hispanic/Latino  │   .0912758   1.237833     0.07   0.941    -2.334832    2.517383
                   White  │     .06928   .9827052     0.07   0.944    -1.856787    1.995347
             Multiracial  │   .0392659   .5722016     0.07   0.945    -1.082229    1.160761
                          │
                   female │
                  Female  │  -.0981776   1.521079    -0.06   0.949    -3.079437    2.883082
                          │
                metroarea │
           Nonmetro Area  │   .0018837   .0756332     0.02   0.980    -.1463546     .150122
          Not Identified  │  -.0003745   .0562184    -0.01   0.995    -.1105606    .1098116
                          │
                    union │
                   Union  │    .448807   .0032032   140.11   0.000     .4425288    .4550853
                          │
                  marital │
    Divorced/Sep/Widowed  │   .0044561   .5402485     0.01   0.993    -1.054411    1.063324
           Never Married  │  -.0309979   .6401718    -0.05   0.961    -1.285712    1.223716
──────────────────────────┴────────────────────────────────────────────────────────────────
Note: dy/dx for factor levels is the discrete change from the base level.

. margins, dydx(*) atmeans predict(outcome(2))

Conditional marginal effects                    Number of obs     =     41,599
Model VCE    : OIM

Expression   : Pr(laborforcestatus==Unemployed), predict(outcome(2))
dy/dx w.r.t. : 2.educat 3.educat 4.educat 5.educat 6.educat exp exp2 2.race_ethnicity 3.race_ethnicity
               4.race_ethnicity 5.race_ethnicity 6.race_ethnicity 1.female 2.metroarea 3.metroarea 1.union
               2.marital 3.marital
at           : 1.educat        =    .1261088 (mean)
               2.educat        =     .286954 (mean)
               3.educat        =    .1488497 (mean)
               4.educat        =    .0969254 (mean)
               5.educat        =    .2079617 (mean)
               6.educat        =    .1332003 (mean)
               exp             =    32.89247 (mean)
               exp2            =    1467.595 (mean)
               1.race_eth~y    =    .0098079 (mean)
               2.race_eth~y    =    .0622611 (mean)
               3.race_eth~y    =    .0931032 (mean)
               4.race_eth~y    =    .1487776 (mean)
               5.race_eth~y    =    .6685738 (mean)
               6.race_eth~y    =    .0174764 (mean)
               0.female        =    .4804923 (mean)
               1.female        =    .5195077 (mean)
               1.metroarea     =    .8000673 (mean)
               2.metroarea     =    .1904132 (mean)
               3.metroarea     =    .0095195 (mean)
               0.union         =    .9508642 (mean)
               1.union         =    .0491358 (mean)
               1.marital       =    .5163586 (mean)
               2.marital       =    .1819515 (mean)
               3.marital       =    .3016899 (mean)

──────────────────────────┬────────────────────────────────────────────────────────────────
                          │            Delta-method
                          │      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
──────────────────────────┼────────────────────────────────────────────────────────────────
                   educat │
                     HSD  │   .0005216   .0755305     0.01   0.994    -.1475154    .1485587
            Some College  │  -.0012432   .1546272    -0.01   0.994    -.3043069    .3018206
           AA/Vocational  │  -.0029475   .3436986    -0.01   0.993    -.6765844    .6706894
                   BS/BA  │  -.0022892   .2747643    -0.01   0.993    -.5408173    .5362389
Graduate or Professional  │  -.0047363   .5466531    -0.01   0.993    -1.076157    1.066684
                          │
                      exp │   .0000394   .0076697     0.01   0.996     -.014993    .0150718
                     exp2 │  -4.07e-06   .0004725    -0.01   0.993    -.0009301     .000922
                          │
           race_ethnicity │
  Asian/Pacific Islander  │  -.0089668   1.008282    -0.01   0.993    -1.985164     1.96723
                   Black  │  -.0051358   .5753938    -0.01   0.993    -1.132887    1.122615
         Hispanic/Latino  │  -.0069558   .7820821    -0.01   0.993    -1.539809    1.525897
                   White  │  -.0084662   .9520543    -0.01   0.993    -1.874458    1.857526
             Multiracial  │  -.0058748   .6589403    -0.01   0.993    -1.297374    1.285625
                          │
                   female │
                  Female  │  -.0011625   .1324367    -0.01   0.993    -.2607337    .2584087
                          │
                metroarea │
           Nonmetro Area  │  -.0008667   .0980645    -0.01   0.993    -.1930696    .1913361
          Not Identified  │   .0005835   .0659971     0.01   0.993    -.1287684    .1299354
                          │
                    union │
                   Union  │   -.017075   .0009387   -18.19   0.000    -.0189148   -.0152352
                          │
                  marital │
    Divorced/Sep/Widowed  │   .0055891    .631376     0.01   0.993    -1.231885    1.243063
           Never Married  │   .0067698   .7645739     0.01   0.993    -1.491768    1.505307
──────────────────────────┴────────────────────────────────────────────────────────────────
Note: dy/dx for factor levels is the discrete change from the base level.

. margins, dydx(*) atmeans predict(outcome(3))

Conditional marginal effects                    Number of obs     =     41,599
Model VCE    : OIM

Expression   : Pr(laborforcestatus==NILF), predict(outcome(3))
dy/dx w.r.t. : 2.educat 3.educat 4.educat 5.educat 6.educat exp exp2 2.race_ethnicity 3.race_ethnicity
               4.race_ethnicity 5.race_ethnicity 6.race_ethnicity 1.female 2.metroarea 3.metroarea 1.union
               2.marital 3.marital
at           : 1.educat        =    .1261088 (mean)
               2.educat        =     .286954 (mean)
               3.educat        =    .1488497 (mean)
               4.educat        =    .0969254 (mean)
               5.educat        =    .2079617 (mean)
               6.educat        =    .1332003 (mean)
               exp             =    32.89247 (mean)
               exp2            =    1467.595 (mean)
               1.race_eth~y    =    .0098079 (mean)
               2.race_eth~y    =    .0622611 (mean)
               3.race_eth~y    =    .0931032 (mean)
               4.race_eth~y    =    .1487776 (mean)
               5.race_eth~y    =    .6685738 (mean)
               6.race_eth~y    =    .0174764 (mean)
               0.female        =    .4804923 (mean)
               1.female        =    .5195077 (mean)
               1.metroarea     =    .8000673 (mean)
               2.metroarea     =    .1904132 (mean)
               3.metroarea     =    .0095195 (mean)
               0.union         =    .9508642 (mean)
               1.union         =    .0491358 (mean)
               1.marital       =    .5163586 (mean)
               2.marital       =    .1819515 (mean)
               3.marital       =    .3016899 (mean)

──────────────────────────┬────────────────────────────────────────────────────────────────
                          │            Delta-method
                          │      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
──────────────────────────┼────────────────────────────────────────────────────────────────
                   educat │
                     HSD  │  -.1760991   1.475242    -0.12   0.905     -3.06752    2.715321
            Some College  │  -.1952498   1.746298    -0.11   0.911    -3.617932    3.227432
           AA/Vocational  │  -.2377865   2.409215    -0.10   0.921    -4.959762    4.484189
                   BS/BA  │  -.2883071   3.317729    -0.09   0.931    -6.790936    6.214322
Graduate or Professional  │  -.3176566   3.916079    -0.08   0.935    -7.993031    7.357718
                          │
                      exp │  -.0257319   .3987801    -0.06   0.949    -.8073266    .7558628
                     exp2 │   .0005428   .0084077     0.06   0.949    -.0159359    .0170216
                          │
           race_ethnicity │
  Asian/Pacific Islander  │  -.0360241   .5758254    -0.06   0.950    -1.164621    1.092573
                   Black  │  -.0193216   .3135138    -0.06   0.951    -.6337973    .5951541
         Hispanic/Latino  │    -.08432   1.269837    -0.07   0.947    -2.573155    2.404515
                   White  │  -.0608138   .9093883    -0.07   0.947    -1.843182    1.721554
             Multiracial  │  -.0333911     .49753    -0.07   0.946    -1.008532    .9417497
                          │
                   female │
                  Female  │   .0993401   1.526768     0.07   0.948     -2.89307    3.091751
                          │
                metroarea │
           Nonmetro Area  │   -.001017   .0285739    -0.04   0.972    -.0570207    .0549868
          Not Identified  │  -.0002091   .0267367    -0.01   0.994     -.052612    .0521939
                          │
                    union │
                   Union  │   -.431732   .0031813  -135.71   0.000    -.4379673   -.4254967
                          │
                  marital │
    Divorced/Sep/Widowed  │  -.0100452   .2043758    -0.05   0.961    -.4106144     .390524
           Never Married  │   .0242282    .419673     0.06   0.954    -.7983157     .846772
──────────────────────────┴────────────────────────────────────────────────────────────────
Note: dy/dx for factor levels is the discrete change from the base level.

Change the Base Alternative

We can also change the base (reference) alternative with the base() option. For example, base(3) makes NILF (outcome 3) the reference alternative:

mlogit laborforce i.educat exp exp2 i.race_ethnicity i.female i.metroarea i.union i.marital, base(3)
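
To see why the choice of base only relabels the model, it may help to write out the multinomial logit probabilities. The notation below (x for the regressors, \beta_j for the coefficient vector of alternative j, b for the base alternative) is introduced here for illustration and follows the standard textbook form rather than anything printed in the output above.

```latex
\[ P(y=j \mid x) = \frac{\exp(x\beta_j)}{\sum_{h=1}^{3}\exp(x\beta_h)}, \qquad \beta_b = 0 \text{ for the base alternative } b \]
\[ \ln\frac{P(y=j \mid x)}{P(y=b \mid x)} = x\beta_j \]
```

Switching from base(1) to base(3) changes the reported coefficients, because each one is now a log-odds contrast with NILF rather than with Employed, but the predicted probabilities and the marginal effects from margins are unchanged.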

Marginal Effects and Marginsplot

Next, we will estimate and graph the average marginal effects of education and experience, with Employed (outcome 1) as the base alternative.

. quietly mlogit laborforce i.educat exp exp2 i.race_ethnicity i.female i.metroarea i.union i.marital, base(1)

Education

For Employed

. margins, dydx(educat) predict(outcome(1))

Average marginal effects                        Number of obs     =     41,599
Model VCE    : OIM

Expression   : Pr(laborforcestatus==Employed), predict(outcome(1))
dy/dx w.r.t. : 2.educat 3.educat 4.educat 5.educat 6.educat

──────────────────────────┬────────────────────────────────────────────────────────────────
                          │            Delta-method
                          │      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
──────────────────────────┼────────────────────────────────────────────────────────────────
                   educat │
                     HSD  │   .1309012   .0072675    18.01   0.000     .1166572    .1451452
            Some College  │   .1499942   .0080928    18.53   0.000     .1341326    .1658558
           AA/Vocational  │   .1911979   .0091537    20.89   0.000     .1732569    .2091389
                   BS/BA  │   .2407492   .0078942    30.50   0.000     .2252768    .2562215
Graduate or Professional  │   .2789487    .008527    32.71   0.000     .2622362    .2956613
──────────────────────────┴────────────────────────────────────────────────────────────────
Note: dy/dx for factor levels is the discrete change from the base level.

. marginsplot, allsimplelabels horizontal recast(scatter) name(Employed) yscale(reverse) ytitle("Effect on Pr(Emplo
> yed)") ///
> xtitle("Average Marginal Effects") xline(0) xlabel(-.3(.05).3)

  Variables that uniquely identify margins: _deriv

Individuals with a high school degree are 13.1 percentage points more likely to be employed than high school dropouts. Individuals with some college are 15.0 percentage points more likely to be employed, and individuals with an associate or vocational degree are 19.1 percentage points more likely. Individuals with a bachelor’s degree are 24.1 percentage points more likely to be employed, and individuals with a graduate or professional degree are 27.9 percentage points more likely. All comparisons are relative to high school dropouts.

For Unemployed

. margins, dydx(educat) predict(outcome(2))

Average marginal effects                        Number of obs     =     41,599
Model VCE    : OIM

Expression   : Pr(laborforcestatus==Unemployed), predict(outcome(2))
dy/dx w.r.t. : 2.educat 3.educat 4.educat 5.educat 6.educat

──────────────────────────┬────────────────────────────────────────────────────────────────
                          │            Delta-method
                          │      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
──────────────────────────┼────────────────────────────────────────────────────────────────
                   educat │
                     HSD  │   .0021475   .0023962     0.90   0.370     -.002549     .006844
            Some College  │  -.0014486   .0026367    -0.55   0.583    -.0066165    .0037194
           AA/Vocational  │  -.0048071   .0030039    -1.60   0.110    -.0106946    .0010805
                   BS/BA  │  -.0028255   .0026516    -1.07   0.287    -.0080226    .0023715
Graduate or Professional  │  -.0081376   .0028311    -2.87   0.004    -.0136864   -.0025888
──────────────────────────┴────────────────────────────────────────────────────────────────
Note: dy/dx for factor levels is the discrete change from the base level.

. marginsplot, allsimplelabels horizontal recast(scatter) name(Unemployed) yscale(reverse) ytitle("Effect on Pr(Une
> mployed)") ///
> xtitle("Average Marginal Effects") xline(0) xlabel(-.3(.05).3)

  Variables that uniquely identify margins: _deriv

For Not in the Labor Force

. margins, dydx(educat) predict(outcome(3))

Average marginal effects                        Number of obs     =     41,599
Model VCE    : OIM

Expression   : Pr(laborforcestatus==NILF), predict(outcome(3))
dy/dx w.r.t. : 2.educat 3.educat 4.educat 5.educat 6.educat

──────────────────────────┬────────────────────────────────────────────────────────────────
                          │            Delta-method
                          │      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
──────────────────────────┼────────────────────────────────────────────────────────────────
                   educat │
                     HSD  │  -.1330487   .0071954   -18.49   0.000    -.1471514    -.118946
            Some College  │  -.1485456   .0079934   -18.58   0.000    -.1642124   -.1328789
           AA/Vocational  │  -.1863908   .0090294   -20.64   0.000    -.2040881   -.1686934
                   BS/BA  │  -.2379236   .0077818   -30.57   0.000    -.2531757   -.2226716
Graduate or Professional  │  -.2708111   .0083689   -32.36   0.000    -.2872139   -.2544084
──────────────────────────┴────────────────────────────────────────────────────────────────
Note: dy/dx for factor levels is the discrete change from the base level.

. marginsplot, allsimplelabels horizontal recast(scatter) name(NILF) yscale(reverse) ytitle("Effect on Pr(NILF)") /
> //
> xtitle("Average Marginal Effects") xline(0) xlabel(-.3(.05).3)

  Variables that uniquely identify margins: _deriv
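
Because the three predicted probabilities sum to one for every observation, the average marginal effects of any regressor must sum to zero across the three outcomes. As a quick check, using only the estimates reported in the three tables above:

```latex
\[ \text{HSD:}\quad 0.1309012 + 0.0021475 - 0.1330487 = 0 \]
\[ \text{Graduate or Professional:}\quad 0.2789487 - 0.0081376 - 0.2708111 = 0 \]
```

Almost all of the gain in the probability of employment from additional education therefore comes out of NILF rather than out of unemployment.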

Combine graphs

. graph combine Employed Unemployed NILF, ycommon title("AME of Education") ///
> note("Source: Current Population Survey; 2 is HSD, 3 is Some College, 4 is AA, 5 is BA/BS, and 6 is Graduate Degr
> ee")

. graph export "/Users/Sam/Desktop/Econ 645/Stata/week8_mnlmeducation.png", replace
(file /Users/Sam/Desktop/Econ 645/Stata/week8_mnlmeducation.png written in PNG format)

. graph drop Employed Unemployed NILF

Potential Experience

For Employed

. margins, dydx(exp) at(exp=(0(2)60)) predict(outcome(1)) 

Average marginal effects                        Number of obs     =     41,599
Model VCE    : OIM

Expression   : Pr(laborforcestatus==Employed), predict(outcome(1))
dy/dx w.r.t. : exp

1._at        : exp             =           0

2._at        : exp             =           2

3._at        : exp             =           4

4._at        : exp             =           6

5._at        : exp             =           8

6._at        : exp             =          10

7._at        : exp             =          12

8._at        : exp             =          14

9._at        : exp             =          16

10._at       : exp             =          18

11._at       : exp             =          20

12._at       : exp             =          22

13._at       : exp             =          24

14._at       : exp             =          26

15._at       : exp             =          28

16._at       : exp             =          30

17._at       : exp             =          32

18._at       : exp             =          34

19._at       : exp             =          36

20._at       : exp             =          38

21._at       : exp             =          40

22._at       : exp             =          42

23._at       : exp             =          44

24._at       : exp             =          46

25._at       : exp             =          48

26._at       : exp             =          50

27._at       : exp             =          52

28._at       : exp             =          54

29._at       : exp             =          56

30._at       : exp             =          58

31._at       : exp             =          60

─────────────┬────────────────────────────────────────────────────────────────
             │            Delta-method
             │      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
exp          │
         _at │
          1  │   .0119199   .0001127   105.73   0.000     .0116989    .0121408
          2  │   .0130781   .0001155   113.28   0.000     .0128519    .0133044
          3  │    .013976   .0001461    95.64   0.000     .0136896    .0142625
          4  │   .0145643   .0001796    81.10   0.000     .0142123    .0149163
          5  │   .0148298   .0002035    72.86   0.000     .0144308    .0152287
          6  │   .0147942    .000215    68.83   0.000     .0143729    .0152155
          7  │   .0145067   .0002152    67.40   0.000     .0140849    .0149285
          8  │   .0140316   .0002075    67.61   0.000     .0136249    .0144384
          9  │   .0134367   .0001953    68.81   0.000      .013054    .0138195
         10  │   .0127838   .0001811    70.60   0.000     .0124289    .0131387
         11  │   .0121231   .0001667    72.73   0.000     .0117965    .0124498
         12  │   .0114913    .000153    75.12   0.000     .0111915    .0117912
         13  │   .0109121   .0001403    77.76   0.000     .0106371    .0111872
         14  │   .0103981   .0001289    80.69   0.000     .0101455    .0106507
         15  │   .0099532   .0001186    83.95   0.000     .0097208    .0101855
         16  │   .0095751   .0001094    87.52   0.000     .0093606    .0097895
         17  │   .0092574   .0001014    91.30   0.000     .0090586    .0094561
         18  │   .0089913   .0000945    95.12   0.000      .008806    .0091766
         19  │   .0087672   .0000887    98.80   0.000     .0085933    .0089411
         20  │   .0085751   .0000839   102.22   0.000     .0084107    .0087395
         21  │   .0084059   .0000798   105.34   0.000     .0082495    .0085623
         22  │   .0082513   .0000762   108.24   0.000     .0081019    .0084007
         23  │   .0081038   .0000729   111.13   0.000     .0079609    .0082468
         24  │   .0079576   .0000697   114.23   0.000      .007821    .0080941
         25  │   .0078075   .0000663   117.83   0.000     .0076776    .0079373
         26  │   .0076498   .0000626   122.19   0.000     .0075271    .0077725
         27  │   .0074817   .0000586   127.62   0.000     .0073668    .0075967
         28  │   .0073016   .0000543   134.46   0.000     .0071952    .0074081
         29  │   .0071087   .0000497   143.16   0.000     .0070113     .007206
         30  │   .0069031   .0000447   154.27   0.000     .0068154    .0069908
         31  │   .0066862   .0000397   168.35   0.000     .0066084    .0067641
─────────────┴────────────────────────────────────────────────────────────────

. marginsplot, allsimplelabels name(Employed) ytitle("Effect on Pr(Employed)") xtitle("Average Marginal Effects")

  Variables that uniquely identify margins: exp

For Not in the Labor Force

. margins, dydx(exp) at(exp=(0(2)60)) predict(outcome(3))

Average marginal effects                        Number of obs     =     41,599
Model VCE    : OIM

Expression   : Pr(laborforcestatus==NILF), predict(outcome(3))
dy/dx w.r.t. : exp

1._at        : exp             =           0

2._at        : exp             =           2

3._at        : exp             =           4

4._at        : exp             =           6

5._at        : exp             =           8

6._at        : exp             =          10

7._at        : exp             =          12

8._at        : exp             =          14

9._at        : exp             =          16

10._at       : exp             =          18

11._at       : exp             =          20

12._at       : exp             =          22

13._at       : exp             =          24

14._at       : exp             =          26

15._at       : exp             =          28

16._at       : exp             =          30

17._at       : exp             =          32

18._at       : exp             =          34

19._at       : exp             =          36

20._at       : exp             =          38

21._at       : exp             =          40

22._at       : exp             =          42

23._at       : exp             =          44

24._at       : exp             =          46

25._at       : exp             =          48

26._at       : exp             =          50

27._at       : exp             =          52

28._at       : exp             =          54

29._at       : exp             =          56

30._at       : exp             =          58

31._at       : exp             =          60

─────────────┬────────────────────────────────────────────────────────────────
             │            Delta-method
             │      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
exp          │
         _at │
          1  │  -.0125848   .0001155  -108.97   0.000    -.0128111   -.0123584
          2  │  -.0137412   .0001266  -108.55   0.000    -.0139893    -.013493
          3  │  -.0146118    .000159   -91.88   0.000    -.0149235   -.0143001
          4  │  -.0151489   .0001885   -80.35   0.000    -.0155185   -.0147794
          5  │  -.0153436   .0002048   -74.91   0.000     -.015745   -.0149421
          6  │  -.0152237   .0002061   -73.87   0.000    -.0156276   -.0148198
          7  │  -.0148454   .0001949   -76.17   0.000    -.0152274   -.0144634
          8  │  -.0142796   .0001756   -81.30   0.000    -.0146238   -.0139353
          9  │  -.0135992   .0001528   -89.00   0.000    -.0138986   -.0132997
         10  │  -.0128695   .0001299   -99.06   0.000    -.0131241   -.0126148
         11  │  -.0121426   .0001093  -111.07   0.000    -.0123569   -.0119284
         12  │  -.0114558   .0000922  -124.29   0.000    -.0116365   -.0112752
         13  │  -.0108322   .0000787  -137.62   0.000    -.0109865   -.0106779
         14  │  -.0102833   .0000686  -149.81   0.000    -.0104178   -.0101487
         15  │  -.0098117   .0000614  -159.70   0.000    -.0099321   -.0096913
         16  │  -.0094137   .0000566  -166.42   0.000    -.0095246   -.0093028
         17  │  -.0090816   .0000536  -169.49   0.000    -.0091866   -.0089766
         18  │  -.0088055   .0000521  -169.01   0.000    -.0089076   -.0087033
         19  │  -.0085745   .0000517  -165.70   0.000    -.0086759   -.0084731
         20  │   -.008378   .0000521  -160.73   0.000    -.0084802   -.0082759
         21  │  -.0082063   .0000528  -155.31   0.000    -.0083098   -.0081027
         22  │  -.0080503   .0000535  -150.38   0.000    -.0081552   -.0079453
         23  │  -.0079024   .0000539  -146.55   0.000    -.0080081   -.0077967
         24  │  -.0077563   .0000538  -144.13   0.000    -.0078617   -.0076508
         25  │  -.0076067   .0000531  -143.27   0.000    -.0077108   -.0075027
         26  │  -.0074499   .0000517  -144.07   0.000    -.0075512   -.0073485
         27  │  -.0072829   .0000497  -146.60   0.000    -.0073802   -.0071855
         28  │   -.007104    .000047  -151.07   0.000    -.0071962   -.0070118
         29  │  -.0069124   .0000438  -157.78   0.000    -.0069983   -.0068266
         30  │  -.0067085   .0000401  -167.14   0.000    -.0067872   -.0066298
         31  │  -.0064935   .0000362  -179.60   0.000    -.0065643   -.0064226
─────────────┴────────────────────────────────────────────────────────────────

. marginsplot, allsimplelabels name(NILF) ytitle("Effect on Pr(NILF)") xtitle("Average Marginal Effects")

  Variables that uniquely identify margins: exp

Combine Graphs

. graph combine Employed NILF, ycommon title("AME of Potential Experience") 

. graph export "/Users/Sam/Desktop/Econ 645/Stata/week8_mnlmexp.png", replace
(file /Users/Sam/Desktop/Econ 645/Stata/week8_mnlmexp.png written in PNG format)

Marginal Effects of Potential Experience

. graph drop Employed NILF
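
One caveat on the experience profiles: because exp2 enters the model as a separate variable, margins holds it fixed when it differentiates with respect to exp, so the effects above do not include the quadratic term. A minimal sketch of one alternative, assuming the same variables as above, is to let Stata build the square with factor-variable notation so that margins tracks both terms. This is an illustration, not the specification estimated in these notes.

```stata
* Sketch: c.exp##c.exp expands to exp and exp squared,
* so margins knows the two terms move together
quietly mlogit laborforce i.educat c.exp##c.exp i.race_ethnicity i.female i.metroarea i.union i.marital, base(1)

* Average marginal effect of experience on Pr(Employed) at selected experience levels
margins, dydx(exp) at(exp=(0(10)60)) predict(outcome(1))
```

The same margins call with predict(outcome(3)) would give the corresponding profile for NILF.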

We can also plot margins results with coefplot after storing them with eststo.

. est clear

. eststo mnlm: quietly mlogit laborforce i.educat exp exp2 i.race_ethnicity i.female i.metroarea i.union i.marital

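Estimate the average marginal effects. Note the post option: it replaces the active estimation results with the margins results so that eststo can store them, which is why we restore the mnlm estimates before each subsequent margins call.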

. eststo Employed: quietly margins, dydx(educat) predict(outcome(1)) post

. estimates restore mnlm
(results mnlm are active now)

. eststo Unemployed: quietly margins, dydx(educat) predict(outcome(2)) post

. estimates restore mnlm
(results mnlm are active now)

. eststo NILF: quietly margins, dydx(educat) predict(outcome(3)) post
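
Once the three sets of margins are stored, they can also be tabulated side by side before plotting. A minimal sketch, assuming the estout package (which provides both eststo and esttab) is installed:

```stata
* Tabulate the stored average marginal effects with standard errors,
* using the stored names (Employed, Unemployed, NILF) as column titles
esttab Employed Unemployed NILF, b(4) se(4) mtitles
```

Each column should reproduce the dy/dx values from the corresponding margins table for education shown earlier.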

Use coefplot

. coefplot Employed Unemployed NILF, ///
> recast(bar) barw(0.15) vertical ///
> ciopts(recast(rcap) color(gs8)) citop ///
> xlab(1 "High School" 2 "Some College" 3 "Associates" 4 "Bachelor" 5 "Graduate") ///
> ytitle("Average Marginal Effect") ///
> xtitle("High Level of Education Attained") ///
> title("MNLM: Probability of Labor Force Status", size(*0.7))    ///
> subtitle("By Education Relative to High School Dropout")    ///
> caption("Source: Current Population Survey", size(*0.75)) ///
> name(coefplot3)

. graph export "/Users/Sam/Desktop/Econ 645/Stata/week8_mnlmcoefplot.png", replace
(file /Users/Sam/Desktop/Econ 645/Stata/week8_mnlmcoefplot.png written in PNG format)

. graph drop coefplot3

Tobit Estimator

We’ll look at hours of labor being supplied.

Married Women’s Annual Labor Supply

Lessons: 1) Tobit and OLS coefficient estimates have the same sign; 2) Tobit and OLS magnitudes are not directly comparable. To compare them, we need to scale the Tobit coefficients by an adjustment factor or use marginal effects.
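
The adjustment factor comes from the standard Tobit expressions. As a sketch in notation not used elsewhere in these notes, let hours = max(0, hours*), where hours* = x\beta + u with u distributed Normal(0, \sigma^2), and let \Phi and \phi denote the standard normal cdf and pdf. For a continuous regressor x_j:

```latex
\[ E(hours \mid x) = \Phi\!\left(\frac{x\beta}{\sigma}\right) x\beta + \sigma\,\phi\!\left(\frac{x\beta}{\sigma}\right) \]
\[ \frac{\partial E(hours \mid x)}{\partial x_j} = \beta_j\,\Phi\!\left(\frac{x\beta}{\sigma}\right) \]
```

Since \Phi(x\beta/\sigma) lies between 0 and 1, the marginal effect on expected hours is smaller in absolute value than the Tobit coefficient \beta_j, and it is this scaled quantity, not \beta_j itself, that is comparable to the OLS slope.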

We have data on married women’s annual labor supply, measured as hours worked for a wage. There are 428 women with positive hours and 325 women with zero hours. Since a sizable share of observations are at 0 (a corner solution), we can use a Tobit model.

. use mroz.dta, clear

Summarize hours

. sum hours

    Variable │        Obs        Mean    Std. Dev.       Min        Max
─────────────┼─────────────────────────────────────────────────────────
       hours │        753    740.5764    871.3142          0       4950

. tab hours if hours == 0

      hours │      Freq.     Percent        Cum.
────────────┼───────────────────────────────────
          0 │        325      100.00      100.00
────────────┼───────────────────────────────────
      Total │        325      100.00

tab hours inlf

. tabstat hours, by(inlf) stat(mean median sd)

Summary for variables: hours
     by categories of: inlf 

    inlf │      mean       p50        sd
─────────┼──────────────────────────────
       0 │         0         0         0
       1 │   1302.93    1365.5  776.2744
─────────┼──────────────────────────────
   Total │  740.5764       288  871.3142
─────────┴──────────────────────────────

We have 325 women who worked 0 hours.

. histogram hours
(bin=27, start=0, width=183.33333)

. graph export "/Users/Sam/Desktop/Econ 645/Stata/week8_hourhist.png", replace
(file /Users/Sam/Desktop/Econ 645/Stata/week8_hourhist.png written in PNG format)

Histogram of Annual Hours Worked

We have a corner solution: the 325 women who are not working supply 0 hours of labor.

Among women with positive hours, annual hours range from 12 to 4950.

. sum hours if hours > 0

    Variable │        Obs        Mean    Std. Dev.       Min        Max
─────────────┼─────────────────────────────────────────────────────────
       hours │        428     1302.93    776.2744         12       4950

OLS Model

. est clear

. eststo OLS: reg hours nwifeinc educ exper expersq age kidslt6 kidsge6

      Source │       SS           df       MS      Number of obs   =       753
─────────────┼──────────────────────────────────   F(7, 745)       =     38.50
       Model │   151647606         7  21663943.7   Prob > F        =    0.0000
    Residual │   419262118       745  562767.944   R-squared       =    0.2656
─────────────┼──────────────────────────────────   Adj R-squared   =    0.2587
       Total │   570909724       752  759188.463   Root MSE        =    750.18

─────────────┬────────────────────────────────────────────────────────────────
       hours │      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
    nwifeinc │  -3.446636      2.544    -1.35   0.176    -8.440898    1.547626
        educ │   28.76112   12.95459     2.22   0.027     3.329283    54.19297
       exper │   65.67251   9.962983     6.59   0.000     46.11365    85.23138
     expersq │  -.7004939   .3245501    -2.16   0.031    -1.337635   -.0633524
         age │  -30.51163   4.363868    -6.99   0.000    -39.07858   -21.94469
     kidslt6 │  -442.0899    58.8466    -7.51   0.000    -557.6148    -326.565
     kidsge6 │  -32.77923   23.17622    -1.41   0.158     -78.2777    12.71924
       _cons │   1330.482   270.7846     4.91   0.000     798.8906    1862.074
─────────────┴────────────────────────────────────────────────────────────────

. margins

Predictive margins                              Number of obs     =        753
Model VCE    : OLS

Expression   : Linear prediction, predict()

─────────────┬────────────────────────────────────────────────────────────────
             │            Delta-method
             │     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
       _cons │   740.5764   27.33803    27.09   0.000     686.9076    794.2451
─────────────┴────────────────────────────────────────────────────────────────

Tobit Model

. eststo TOBIT: tobit hours nwifeinc educ exper expersq age kidslt6 kidsge6, ll(0)

Tobit regression                                Number of obs     =        753
                                                LR chi2(7)        =     271.59
                                                Prob > chi2       =     0.0000
Log likelihood = -3819.0946                     Pseudo R2         =     0.0343

─────────────┬────────────────────────────────────────────────────────────────
       hours │      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
    nwifeinc │  -8.814243   4.459096    -1.98   0.048    -17.56811   -.0603724
        educ │   80.64561   21.58322     3.74   0.000     38.27453    123.0167
       exper │   131.5643   17.27938     7.61   0.000     97.64231    165.4863
     expersq │  -1.864158   .5376615    -3.47   0.001    -2.919667   -.8086479
         age │  -54.40501   7.418496    -7.33   0.000    -68.96862    -39.8414
     kidslt6 │  -894.0217   111.8779    -7.99   0.000    -1113.655   -674.3887
     kidsge6 │    -16.218   38.64136    -0.42   0.675    -92.07675    59.64075
       _cons │   965.3053   446.4358     2.16   0.031     88.88528    1841.725
─────────────┼────────────────────────────────────────────────────────────────
      /sigma │   1122.022   41.57903                      1040.396    1203.647
─────────────┴────────────────────────────────────────────────────────────────
           325  left-censored observations at hours <= 0
           428     uncensored observations
             0 right-censored observations

. quietly sum exper

. local exp2=r(mean)^2

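Using predict(ystar(0,.)) asks for the expected value of hours censored from below at zero, so the marginal effects account for the mass of observations at 0 hours; the default linear prediction would instead describe the latent variable as if there were no censoring. See Statalist discussion 1531196. The at() option holds expersq at the square of mean experience.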
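
A minimal sketch of how the scale factor behind these marginal effects can be recovered, assuming the Tobit model above: Pr(hours* > 0 | x) equals Phi(xb/sigma), and predict(pr(0,.)) asks margins for that probability. Evaluating it at the means, with expersq set to the squared mean of exper as in the commands here, gives the factor that multiplies each Tobit coefficient in the marginal effects at the average.

```stata
use mroz.dta, clear
quietly tobit hours nwifeinc educ exper expersq age kidslt6 kidsge6, ll(0)

* Hold expersq at the square of mean experience, as in the notes
quietly summarize exper
local exp2 = r(mean)^2

* Pr(hours* > 0) at the means, i.e. the adjustment factor Phi(xb/sigma)
margins, predict(pr(0,.)) at(expersq=`exp2') atmeans
```

From the reported estimates, the factor implied at the means is about 0.645: the education coefficient of 80.64561 times 0.645 is roughly 52.0, in line with the marginal effect at the average for educ (52.03649).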

. margins, dydx(*) predict(ystar(0,.)) at(expersq=`exp2')

Average marginal effects                        Number of obs     =        753
Model VCE    : OIM

Expression   : E(hours*|hours>0), predict(ystar(0,.))
dy/dx w.r.t. : nwifeinc educ exper expersq age kidslt6 kidsge6
at           : expersq         =    113.0141

─────────────┬────────────────────────────────────────────────────────────────
             │            Delta-method
             │      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
    nwifeinc │  -5.223903   2.639553    -1.98   0.048    -10.39733   -.0504745
        educ │   47.79592    12.7368     3.75   0.000     22.83224     72.7596
       exper │    77.9737   9.900685     7.88   0.000     58.56872    97.37869
     expersq │  -1.104823    .316282    -3.49   0.000    -1.724724   -.4849218
         age │  -32.24401   4.348403    -7.42   0.000    -40.76672   -23.72129
     kidslt6 │  -529.8564   65.40462    -8.10   0.000    -658.0471   -401.6657
     kidsge6 │  -9.611857   22.90535    -0.42   0.675    -54.50553    35.28181
─────────────┴────────────────────────────────────────────────────────────────

Average Marginal Effects

. eststo AME: margins, dydx(*) predict(ystar(0,.)) at(expersq=`exp2') post

Average marginal effects                        Number of obs     =        753
Model VCE    : OIM

Expression   : E(hours*|hours>0), predict(ystar(0,.))
dy/dx w.r.t. : nwifeinc educ exper expersq age kidslt6 kidsge6
at           : expersq         =    113.0141

─────────────┬────────────────────────────────────────────────────────────────
             │            Delta-method
             │      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
    nwifeinc │  -5.223903   2.639553    -1.98   0.048    -10.39733   -.0504745
        educ │   47.79592    12.7368     3.75   0.000     22.83224     72.7596
       exper │    77.9737   9.900685     7.88   0.000     58.56872    97.37869
     expersq │  -1.104823    .316282    -3.49   0.000    -1.724724   -.4849218
         age │  -32.24401   4.348403    -7.42   0.000    -40.76672   -23.72129
     kidslt6 │  -529.8564   65.40462    -8.10   0.000    -658.0471   -401.6657
     kidsge6 │  -9.611857   22.90535    -0.42   0.675    -54.50553    35.28181
─────────────┴────────────────────────────────────────────────────────────────

Marginal Effects at the Average

. quietly tobit hours nwifeinc educ exper expersq age kidslt6 kidsge6, ll(0)

. eststo MEA: margins, dydx(*) predict(ystar(0,.)) at(expersq=`exp2') atmeans post

Conditional marginal effects                    Number of obs     =        753
Model VCE    : OIM

Expression   : E(hours*|hours>0), predict(ystar(0,.))
dy/dx w.r.t. : nwifeinc educ exper expersq age kidslt6 kidsge6
at           : nwifeinc        =    20.12896 (mean)
               educ            =    12.28685 (mean)
               exper           =    10.63081 (mean)
               expersq         =    113.0141
               age             =    42.53785 (mean)
               kidslt6         =    .2377158 (mean)
               kidsge6         =    1.353254 (mean)

─────────────┬────────────────────────────────────────────────────────────────
             │            Delta-method
             │      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
    nwifeinc │  -5.687381   2.877882    -1.98   0.048    -11.32793   -.0468358
        educ │   52.03649   13.82013     3.77   0.000     24.94954    79.12345
       exper │   84.89173   12.39757     6.85   0.000     60.59293    109.1905
     expersq │  -1.202846   .3666136    -3.28   0.001    -1.921395   -.4842964
         age │  -35.10478   4.669466    -7.52   0.000    -44.25676   -25.95279
     kidslt6 │  -576.8666   70.92986    -8.13   0.000    -715.8866   -437.8466
     kidsge6 │  -10.46465   24.93972    -0.42   0.675    -59.34561    38.41632
─────────────┴────────────────────────────────────────────────────────────────

Compare our results. Remember that we cannot directly compare the OLS and Tobit coefficients because of the Tobit scale factor.

. esttab OLS TOBIT, mtitle

────────────────────────────────────────────
                      (1)             (2)   
                      OLS           TOBIT   
────────────────────────────────────────────
main                                        
nwifeinc           -3.447          -8.814*  
                  (-1.35)         (-1.98)   

educ                28.76*          80.65***
                   (2.22)          (3.74)   

exper               65.67***        131.6***
                   (6.59)          (7.61)   

expersq            -0.700*         -1.864***
                  (-2.16)         (-3.47)   

age                -30.51***       -54.41***
                  (-6.99)         (-7.33)   

kidslt6            -442.1***       -894.0***
                  (-7.51)         (-7.99)   

kidsge6            -32.78          -16.22   
                  (-1.41)         (-0.42)   

_cons              1330.5***        965.3*  
                   (4.91)          (2.16)   
────────────────────────────────────────────
sigma                                       
_cons                              1122.0***
                                  (26.99)   
────────────────────────────────────────────
N                     753             753   
────────────────────────────────────────────
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001

Compare OLS and Average Marginal Effects and Marginal Effects at the Average.

. esttab OLS AME MEA, mtitle("OLS" "AME" "MEA")

────────────────────────────────────────────────────────────
                      (1)             (2)             (3)   
                      OLS             AME             MEA   
────────────────────────────────────────────────────────────
nwifeinc           -3.447          -5.224*         -5.687*  
                  (-1.35)         (-1.98)         (-1.98)   

educ                28.76*          47.80***        52.04***
                   (2.22)          (3.75)          (3.77)   

exper               65.67***        77.97***        84.89***
                   (6.59)          (7.88)          (6.85)   

expersq            -0.700*         -1.105***       -1.203** 
                  (-2.16)         (-3.49)         (-3.28)   

age                -30.51***       -32.24***       -35.10***
                  (-6.99)         (-7.42)         (-7.52)   

kidslt6            -442.1***       -529.9***       -576.9***
                  (-7.51)         (-8.10)         (-8.13)   

kidsge6            -32.78          -9.612          -10.46   
                  (-1.41)         (-0.42)         (-0.42)   

_cons              1330.5***                                
                   (4.91)                                   
────────────────────────────────────────────────────────────
N                     753             753             753   
────────────────────────────────────────────────────────────
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001

When we compare marginal effects, we see that an additional year of education increases annual hours by 48 to 52 hours with the Tobit estimator. We see that an additional child less than 6 reduces the annual hours by a range of 530 to 577 hours.
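One way to see where the gap between the Tobit coefficients and these marginal effects comes from is the standard Tobit result for the expected value of a response censored at zero. This is a sketch that assumes normal errors and treats expersq as its own regressor, the way margins does above:

\[ \frac{\partial E(hours|x)}{\partial x_j}=\beta_j \Phi\left(\frac{x\beta}{\sigma}\right) \]

Dividing the AME column by the Tobit coefficients gives an implied average scale factor of about 0.59 (for example, 47.80/80.65 for educ), and the MEA column gives about 0.65 (52.04/80.65). The same ratios hold for the other regressors, such as -5.224/-8.814 for nwifeinc.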

Compare Graphs

. est clear

OLS Model

. quietly reg hours nwifeinc educ exper expersq age kidslt6 kidsge6

. eststo OLS: margins, at(educ=(0(2)20)) post

Predictive margins                              Number of obs     =        753
Model VCE    : OLS

Expression   : Linear prediction, predict()

1._at        : educ            =           0

2._at        : educ            =           2

3._at        : educ            =           4

4._at        : educ            =           6

5._at        : educ            =           8

6._at        : educ            =          10

7._at        : educ            =          12

8._at        : educ            =          14

9._at        : educ            =          16

10._at       : educ            =          18

11._at       : educ            =          20

─────────────┬────────────────────────────────────────────────────────────────
             │            Delta-method
             │     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
         _at │
          1  │   387.1927   161.5017     2.40   0.017        70.14    704.2453
          2  │   444.7149   136.0372     3.27   0.001     177.6531    711.7767
          3  │   502.2372    110.779     4.53   0.000      284.761    719.7133
          4  │   559.7594    85.9094     6.52   0.000     391.1061    728.4127
          5  │   617.2817   61.89861     9.97   0.000     495.7652    738.7981
          6  │   674.8039   40.31156    16.74   0.000     595.6661    753.9417
          7  │   732.3262   27.58943    26.54   0.000     678.1639    786.4884
          8  │   789.8484   35.21225    22.43   0.000     720.7214    858.9754
          9  │   847.3707    55.3281    15.32   0.000     738.7531    955.9882
         10  │   904.8929   78.89908    11.47   0.000     750.0019    1059.784
         11  │   962.4152    103.593     9.29   0.000     759.0463    1165.784
─────────────┴────────────────────────────────────────────────────────────────

Tobit Model

. quietly tobit hours nwifeinc educ exper expersq age kidslt6 kidsge6, ll(0)

. eststo Tobit: margins, at(educ=(0(2)20)) predict(e(0,6000)) post

Predictive margins                              Number of obs     =        753
Model VCE    : OIM

Expression   : E(hours|0<hours<6000), predict(e(0,6000))

1._at        : educ            =           0

2._at        : educ            =           2

3._at        : educ            =           4

4._at        : educ            =           6

5._at        : educ            =           8

6._at        : educ            =          10

7._at        : educ            =          12

8._at        : educ            =          14

9._at        : educ            =          16

10._at       : educ            =          18

11._at       : educ            =          20

─────────────┬────────────────────────────────────────────────────────────────
             │            Delta-method
             │     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
         _at │
          1  │   727.4644   76.74259     9.48   0.000     577.0517    877.8772
          2  │   773.4964   71.05237    10.89   0.000     634.2364    912.7565
          3  │   823.5401   63.95927    12.88   0.000     698.1822    948.8979
          4  │   877.8933   55.47809    15.82   0.000     769.1582    986.6283
          5  │   936.8485   45.95104    20.39   0.000     846.7861    1026.911
          6  │   1000.685   36.67591    27.28   0.000     928.8013    1072.568
          7  │   1069.659   31.51807    33.94   0.000     1007.885    1131.433
          8  │   1143.997   36.60999    31.25   0.000     1072.242    1215.751
          9  │   1223.883   52.37388    23.37   0.000     1121.232    1326.534
         10  │   1309.452   75.03347    17.45   0.000      1162.39    1456.515
         11  │   1400.783   102.4283    13.68   0.000     1200.027    1601.539
─────────────┴────────────────────────────────────────────────────────────────

. coefplot (OLS, ciopts(recast(rline) lpattern(solid))) ///
> (Tobit, ciopts(recast(rline) lpattern(dash))), at recast(line)

. graph export "/Users/Sam/Desktop/Econ 645/Stata/week8_tobitols.png", replace
(file /Users/Sam/Desktop/Econ 645/Stata/week8_tobitols.png written in PNG format)

Mitchell

Chapter 8: Processing observations across subgroups

Set Working Directory

. cd "/Users/Sam/Desktop/Econ 645/Data/Mitchell"
/Users/Sam/Desktop/Econ 645/Data/Mitchell

Stata provides convenient commands and options for subgroup analysis. We can use the by prefix command to create and analyze subgroups or cross-sectional units in a panel data set.

8.2 Obtaining separate results for subgroups

Tabulate is a very helpful command for analyzing categorical variables, or occasionally for looking through continuous variables (as long as there aren’t too many values). The tabulate command has an option to summarize a continuous variable when tabulating categorical variables.

. use wws2, clear
(Working Women Survey w/fixes)

. tabulate married, summarize(wage)

            │       Summary of hourly wage
    married │        Mean   Std. Dev.       Freq.
────────────┼────────────────────────────────────
          0 │   8.0920006    6.354849         804
          1 │   7.6319496   5.5017864       1,440
────────────┼────────────────────────────────────
      Total │   7.7967807   5.8245895       2,244

Another option is using the bysort prefix command

. bysort married: summarize wage

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
-> married = 0

    Variable │        Obs        Mean    Std. Dev.       Min        Max
─────────────┼─────────────────────────────────────────────────────────
        wage │        804    8.092001    6.354849          0   40.19808

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
-> married = 1

    Variable │        Obs        Mean    Std. Dev.       Min        Max
─────────────┼─────────────────────────────────────────────────────────
        wage │      1,440     7.63195    5.501786   1.004952   40.74659

We can also correlate data within groups with bysort instead of using if qualifiers and additional statements. First, with qualifiers:

. correlate wage age if married == 0
(obs=804)

             │     wage      age
─────────────┼──────────────────
        wage │   1.0000
         age │  -0.0185   1.0000


. correlate wage age if married == 1
(obs=1,440)

             │     wage      age
─────────────┼──────────────────
        wage │   1.0000
         age │   0.0049   1.0000

Using bysort accomplishes this in one command

. bysort married: correlate wage age

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
-> married = 0
(obs=804)

             │     wage      age
─────────────┼──────────────────
        wage │   1.0000
         age │  -0.0185   1.0000


───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
-> married = 1
(obs=1,440)

             │     wage      age
─────────────┼──────────────────
        wage │   1.0000
         age │   0.0049   1.0000

Using bysort saves even more typing when we have a categorical variable with many categories

. bysort race: correlate wage age

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
-> race = 1
(obs=1,637)

             │     wage      age
─────────────┼──────────────────
        wage │   1.0000
         age │   0.0017   1.0000


───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
-> race = 2
(obs=581)

             │     wage      age
─────────────┼──────────────────
        wage │   1.0000
         age │  -0.0331   1.0000


───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
-> race = 3
(obs=26)

             │     wage      age
─────────────┼──────────────────
        wage │   1.0000
         age │  -0.2194   1.0000

8.3 Computing values separately by subgroups

The by prefix command and the egen command are a powerful combination that makes aggregating group statistics much easier than in many other statistical software packages. R has aggregate, which is flexible and powerful, but it requires more coding.

I’m not the biggest fan of Mitchell’s examples with bysort var: egen, but they get the job done. I would like us to use some CPS examples with bysort and egen.

With bysort var: egen we can calculate subgroup statistics, counts, summations with one easy line of code

. use tv1, clear

. list, sepby(kidid)

     ┌────────────────────────────────────────────┐
     │ kidid          dt   female   wt   tv   vac │
     ├────────────────────────────────────────────┤
  1. │     1   07jan2002        1   53    1     1 │
  2. │     1   08jan2002        1   55    3     1 │
     ├────────────────────────────────────────────┤
  3. │     2   16jan2002        1   58    8     1 │
     ├────────────────────────────────────────────┤
  4. │     3   18jan2002        0   60    2     0 │
  5. │     3   19jan2002        0   63    5     1 │
  6. │     3   21jan2002        0   66    1     1 │
  7. │     3   22jan2002        0   64    6     0 │
     ├────────────────────────────────────────────┤
  8. │     4   10jan2002        1   62    7     0 │
  9. │     4   11jan2002        1   58    1     0 │
 10. │     4   13jan2002        1   55    4     0 │
     └────────────────────────────────────────────┘

If we want to calculate the average tv time for each kid, we first sort by the child id and then use egen. When the data have multiple levels, we can list several identifiers after bysort, such as id, year, and month of year

. bysort kidid: egen avgtv = mean(tv)

. sort kidid

. list kidid tv avgtv, sepby(kidid)

     ┌────────────────────┐
     │ kidid   tv   avgtv │
     ├────────────────────┤
  1. │     1    1       2 │
  2. │     1    3       2 │
     ├────────────────────┤
  3. │     2    8       8 │
     ├────────────────────┤
  4. │     3    2     3.5 │
  5. │     3    5     3.5 │
  6. │     3    1     3.5 │
  7. │     3    6     3.5 │
     ├────────────────────┤
  8. │     4    7       4 │
  9. │     4    1       4 │
 10. │     4    4       4 │
     └────────────────────┘
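As a sketch of grouping on more than one level, the block below re-enters kidid, tv, and vac from the tv1 listing above and computes one mean per kid-by-vacation-status cell. The variable name avgtv_vac is made up for illustration, and the block clears the data in memory, so it is meant to be run on its own.

```stata
* Standalone sketch: re-enter kidid, tv, and vac from the tv1 listing
clear
input kidid tv vac
1 1 1
1 3 1
2 8 1
3 2 0
3 5 1
3 1 1
3 6 0
4 7 0
4 1 0
4 4 0
end

* Two grouping variables: one mean for each kidid-by-vac combination
bysort kidid vac: egen avgtv_vac = mean(tv)

list kidid vac tv avgtv_vac, sepby(kidid)
```

Every variable listed before the colon defines the groups, so kid 3 gets one average for vacation days and another for non-vacation days.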

If we want the standard deviation of the child’s tv watching

. bysort kidid: egen sdtv = sd(tv)
(1 missing value generated)

Let’s generate some z-scores

. generate ztv = (tv-avgtv)/sdtv
(1 missing value generated)

Let’s look at our statistics

. list 

     ┌─────────────────────────────────────────────────────────────────────────────┐
     │ kidid          dt   female   wt   tv   vac   avgtv        sdtv          ztv │
     ├─────────────────────────────────────────────────────────────────────────────┤
  1. │     1   07jan2002        1   53    1     1       2   1.4142136   -.70710678 │
  2. │     1   08jan2002        1   55    3     1       2   1.4142136    .70710678 │
  3. │     2   16jan2002        1   58    8     1       8           .            . │
  4. │     3   18jan2002        0   60    2     0     3.5   2.3804761   -.63012604 │
  5. │     3   19jan2002        0   63    5     1     3.5   2.3804761    .63012604 │
     ├─────────────────────────────────────────────────────────────────────────────┤
  6. │     3   21jan2002        0   66    1     1     3.5   2.3804761   -1.0502101 │
  7. │     3   22jan2002        0   64    6     0     3.5   2.3804761    1.0502101 │
  8. │     4   10jan2002        1   62    7     0       4           3            1 │
  9. │     4   11jan2002        1   58    1     0       4           3           -1 │
 10. │     4   13jan2002        1   55    4     0       4           3            0 │
     └─────────────────────────────────────────────────────────────────────────────┘

Let’s generate some subgroup statistics for the binary variable vac, where vac=0 if the kid was not on vacation and vac=1 if the kid was on vacation

. bysort kidid: egen vac_total = total(vac)

. bysort kidid: egen vac_sd = sd(vac)
(1 missing value generated)

. bysort kidid: egen vac_min = min(vac)

. bysort kidid: egen vac_max = max(vac)

. list kidid vac*, sepby(kidid) abb(10)

     ┌─────────────────────────────────────────────────────────┐
     │ kidid   vac   vac_total      vac_sd   vac_min   vac_max │
     ├─────────────────────────────────────────────────────────┤
  1. │     1     1           2           0         1         1 │
  2. │     1     1           2           0         1         1 │
     ├─────────────────────────────────────────────────────────┤
  3. │     2     1           1           .         1         1 │
     ├─────────────────────────────────────────────────────────┤
  4. │     3     0           2   .57735027         0         1 │
  5. │     3     1           2   .57735027         0         1 │
  6. │     3     1           2   .57735027         0         1 │
  7. │     3     0           2   .57735027         0         1 │
     ├─────────────────────────────────────────────────────────┤
  8. │     4     0           0           0         0         0 │
  9. │     4     0           0           0         0         0 │
 10. │     4     0           0           0         0         0 │
     └─────────────────────────────────────────────────────────┘

Let’s see if some kids watch less than 4 hours of tv per day. We’ll generate a binary/dummy variable equal to 1 if the kid watched less than 4 hours that day and 0 if the kid watched 4 or more hours.

. generate tvlo = (tv < 4) if !missing(tv)
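The if !missing(tv) qualifier matters because Stata treats a missing value as larger than any number. The sketch below uses a tiny made-up data set (the values and the names tvlo_naive and tvlo_safe are for illustration only) and should be run on its own since it clears the data in memory.

```stata
* Standalone sketch with one missing tv value
clear
input tv
1
4
.
end

* Without the qualifier, the missing row is coded 0 (missing counts as large)
generate tvlo_naive = (tv < 4)

* With the qualifier, the dummy stays missing when tv is missing
generate tvlo_safe = (tv < 4) if !missing(tv)

list
```

The tv1 data have no missing tv values, so both versions agree there, but keeping the qualifier is a cheap safeguard.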

We can generate individual level subgroup analysis with bysort and egen on binary variables

. bysort kidid: egen tvlocnt = count(tvlo)

. bysort kidid: egen tvlototal = total(tvlo)

. bysort kidid: egen tvlosum = sum(tvlo)

. bysort kidid: gen tvlosum2 = sum(tvlo)

. bysort kidid: egen tvlosame = sd(tvlo)
(1 missing value generated)

. bysort kidid: egen tvloall = min(tvlo)

. bysort kidid: egen tvloever = max(tvlo)

. list kidid tv tvlo*, sepby(kidid) abb(20)

     ┌───────────────────────────────────────────────────────────────────────────────────────────────┐
     │ kidid   tv   tvlo   tvlocnt   tvlototal   tvlosum   tvlosum2    tvlosame   tvloall   tvloever │
     ├───────────────────────────────────────────────────────────────────────────────────────────────┤
  1. │     1    1      1         2           2         2          1           0         1          1 │
  2. │     1    3      1         2           2         2          2           0         1          1 │
     ├───────────────────────────────────────────────────────────────────────────────────────────────┤
  3. │     2    8      0         1           0         0          0           .         0          0 │
     ├───────────────────────────────────────────────────────────────────────────────────────────────┤
  4. │     3    2      1         4           2         2          1   .57735027         0          1 │
  5. │     3    5      0         4           2         2          1   .57735027         0          1 │
  6. │     3    1      1         4           2         2          2   .57735027         0          1 │
  7. │     3    6      0         4           2         2          2   .57735027         0          1 │
     ├───────────────────────────────────────────────────────────────────────────────────────────────┤
  8. │     4    7      0         3           1         1          0   .57735027         0          1 │
  9. │     4    1      1         3           1         1          1   .57735027         0          1 │
 10. │     4    4      0         3           1         1          1   .57735027         0          1 │
     └───────────────────────────────────────────────────────────────────────────────────────────────┘

Notice how count() returns the number of nonmissing observations for each kid, while total() returns a constant equal to the within-kid sum of all values, and so does egen sum(), which is an older name for total(). The potential confusion is that the generate function sum() returns a running sum instead, as we see in tvlosum2. I usually use egen sum(), but I think egen total() is the more appropriate function to use when returning a constant.

We have our central tendencies functions with mean(), median(), and mode(). We can find percentiles with pctile(var), p(#). We have other egen functions that may be of help, such as iqr(), Median Absolute Deviation mad(), Mean Absolute Deviation mdev(), kurt(), skew(), etc.

. help egen
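As a small sketch of the percentile functions, the block below re-enters kidid and tv from the tv1 listing and computes the within-kid median two ways, plus a 75th percentile. The names tvmed, tvp50, and tvp75 are made up for illustration, and the block clears the data in memory, so run it on its own.

```stata
* Standalone sketch: re-enter kidid and tv from the tv1 listing
clear
input kidid tv
1 1
1 3
2 8
3 2
3 5
3 1
3 6
4 7
4 1
4 4
end

* median() and pctile() with p(50) should agree within each kid
bysort kidid: egen tvmed = median(tv)
bysort kidid: egen tvp50 = pctile(tv), p(50)

* Any other percentile works the same way through the p() option
bysort kidid: egen tvp75 = pctile(tv), p(75)

list, sepby(kidid)
```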

Mitchell has a good note here: egen mean() takes an expression, not a varlist. If you type bysort idvar: egen meanvars1_5=mean(var1-var5), mean() will not return the mean of var1 through var5, but the mean of var1 minus var5.
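To make the difference concrete, here is a minimal sketch with three made-up variables. It contrasts mean(), which reads var1-var3 as a subtraction, with rowmean(), which accepts a varlist and averages across variables within each row. The names m_exp and m_row are for illustration only, and the block clears the data in memory.

```stata
* Standalone sketch with made-up values
clear
input var1 var2 var3
1 2 3
4 5 9
end

* mean(exp): var1-var3 is var1 minus var3, averaged over observations
egen m_exp = mean(var1-var3)

* rowmean(varlist): var1-var3 is a range of variables, averaged within a row
egen m_row = rowmean(var1-var3)

list
```

Here m_exp is -3.5 in both rows (the average of -2 and -5), while m_row is 2 in the first row and 6 in the second.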

8.4 Computing values within subgroups: Subscripting or Indexing

Unsolicited Opinion Alert: Subscripting (or I may accidentally call it indexing) is a very powerful tool that I personally think puts Stata at the top of paid statistical software (I do think that R is more powerful and more flexible, but Stata balances power, flexibility, and ease of learning).

. use tv1, clear

Each variable is a vector \[ x_1=[x_{11}, x_{12}, x_{13},\dots,x_{1N}] \] with one element for each of the i=1,…,N observations. We can use a subscript or index to call which part of the vector we want to return.

. list, sepby(kidid)

     ┌────────────────────────────────────────────┐
     │ kidid          dt   female   wt   tv   vac │
     ├────────────────────────────────────────────┤
  1. │     1   07jan2002        1   53    1     1 │
  2. │     1   08jan2002        1   55    3     1 │
     ├────────────────────────────────────────────┤
  3. │     2   16jan2002        1   58    8     1 │
     ├────────────────────────────────────────────┤
  4. │     3   18jan2002        0   60    2     0 │
  5. │     3   19jan2002        0   63    5     1 │
  6. │     3   21jan2002        0   66    1     1 │
  7. │     3   22jan2002        0   64    6     0 │
     ├────────────────────────────────────────────┤
  8. │     4   10jan2002        1   62    7     0 │
  9. │     4   11jan2002        1   58    1     0 │
 10. │     4   13jan2002        1   55    4     0 │
     └────────────────────────────────────────────┘

If we want the first observation in our tv vector we can call it with [1]

. display tv[1]
1

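We can look at the first observation’s kid id, date, sex, and tv hours. Note that display shows dt as its underlying numeric value (days since 01jan1960) because it does not apply the variable’s date format.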

. display "kid: " kidid[1] ", Date: " dt[1] ", Sex: " female[1] ", TV Hours: " tv[1]
kid: 1, Date: 15347, Sex: 1, TV Hours: 1

We can see the second observation

. display tv[2]
3

We can see the difference between the two observations

. display tv[2]-tv[1]
2

Note: we have two very useful system variables, _N and _n. _N is the total number of observations (or the number of observations in the current group under a by prefix), and when used as the subscript/index it returns the last observation.

_n is the current observation number, and when used as the subscript/index it returns the current observation (almost like i=i+1)

. help system variables
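A quick sketch of how _n and _N change meaning under a by prefix, using kidid and tv re-entered from the tv1 listing. The names n_all, obs_kid, and n_kid are made up for illustration, and the block clears the data in memory, so run it on its own.

```stata
* Standalone sketch: re-enter kidid and tv from the tv1 listing
clear
input kidid tv
1 1
1 3
2 8
3 2
3 5
3 1
3 6
4 7
4 1
4 4
end

* Without by, _N is the size of the whole data set
generate n_all = _N

* With by, _n restarts at 1 and _N is the group size for each kid
bysort kidid: generate obs_kid = _n
bysort kidid: generate n_kid = _N

list, sepby(kidid)
```

Because _N is the group size under by, tv[_N] picks the last observation within each kid, not the last row of the data set.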

Subscripting is very helpful when working with panel data. You can index within cross-sectional units over time with ease. Combined with bysort, the subscript (or index) returns the nth observation within each group.

. use tv1, clear

If we want the first observation to be compared to all observations

. bysort kidid: gen tv_1ob = tv[1]

. list kidid tv tv_1ob, sepby(kidid)

     ┌─────────────────────┐
     │ kidid   tv   tv_1ob │
     ├─────────────────────┤
  1. │     1    1        1 │
  2. │     1    3        1 │
     ├─────────────────────┤
  3. │     2    8        8 │
     ├─────────────────────┤
  4. │     3    2        2 │
  5. │     3    5        2 │
  6. │     3    1        2 │
  7. │     3    6        2 │
     ├─────────────────────┤
  8. │     4    7        7 │
  9. │     4    1        7 │
 10. │     4    4        7 │
     └─────────────────────┘

If we want to compare the last observation to all observations

. bysort kidid: gen tv_lastob = tv[_N]

. list kidid tv tv_lastob, sepby(kidid)

     ┌───────────────────────┐
     │ kidid   tv   tv_las~b │
     ├───────────────────────┤
  1. │     1    1          3 │
  2. │     1    3          3 │
     ├───────────────────────┤
  3. │     2    8          8 │
     ├───────────────────────┤
  4. │     3    2          6 │
  5. │     3    5          6 │
  6. │     3    1          6 │
  7. │     3    6          6 │
     ├───────────────────────┤
  8. │     4    7          4 │
  9. │     4    1          4 │
 10. │     4    4          4 │
     └───────────────────────┘

If we want the second to last observation

. bysort kidid: gen tv_2tolastob = tv[_N-1]
(1 missing value generated)

. list kidid tv tv_2tolastob, sepby(kidid)

     ┌───────────────────────┐
     │ kidid   tv   tv_2to~b │
     ├───────────────────────┤
  1. │     1    1          1 │
  2. │     1    3          1 │
     ├───────────────────────┤
  3. │     2    8          . │
     ├───────────────────────┤
  4. │     3    2          1 │
  5. │     3    5          1 │
  6. │     3    1          1 │
  7. │     3    6          1 │
     ├───────────────────────┤
  8. │     4    7          1 │
  9. │     4    1          1 │
 10. │     4    4          1 │
     └───────────────────────┘

If we want the prior observation (lag of 1)

. bysort kidid: gen tv_lagob = tv[_n-1]
(4 missing values generated)

. list kidid tv tv_lagob, sepby(kidid)

     ┌───────────────────────┐
     │ kidid   tv   tv_lagob │
     ├───────────────────────┤
  1. │     1    1          . │
  2. │     1    3          1 │
     ├───────────────────────┤
  3. │     2    8          . │
     ├───────────────────────┤
  4. │     3    2          . │
  5. │     3    5          2 │
  6. │     3    1          5 │
  7. │     3    6          1 │
     ├───────────────────────┤
  8. │     4    7          . │
  9. │     4    1          7 │
 10. │     4    4          1 │
     └───────────────────────┘

If we want the next observation (lead of 1)

. bysort kidid: gen tv_leadob = tv[_n+1]
(4 missing values generated)

. list kidid tv tv_leadob, sepby(kidid)

     ┌───────────────────────┐
     │ kidid   tv   tv_lea~b │
     ├───────────────────────┤
  1. │     1    1          3 │
  2. │     1    3          . │
     ├───────────────────────┤
  3. │     2    8          . │
     ├───────────────────────┤
  4. │     3    2          5 │
  5. │     3    5          1 │
  6. │     3    1          6 │
  7. │     3    6          . │
     ├───────────────────────┤
  8. │     4    7          1 │
  9. │     4    1          4 │
 10. │     4    4          . │
     └───────────────────────┘

You can use bysort kidid (dt) to tell Stata to sort by kid id and date but NOT include dt in the grouping. If we instead group on kidid AND dt with bysort kidid dt:, then Stata looks for observations within each kid id AND date combination. Since there is only 1 observation per kid per date, each group contains only 1 observation.
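The sort inside the parentheses matters most when the data are not already in date order. The sketch below uses a few rows entered out of order, with a made-up integer variable day standing in for dt. The names tv_first and tv_last are for illustration only, and the block clears the data in memory.

```stata
* Standalone sketch: rows deliberately entered out of date order
clear
input kidid day tv
1 2 3
1 1 1
3 4 6
3 1 2
3 3 1
3 2 5
end

* Group by kidid only; sort by day within each kid before subscripting
bysort kidid (day): generate tv_first = tv[1]
bysort kidid (day): generate tv_last = tv[_N]

list, sepby(kidid)
```

With the within-group sort, tv[1] is the earliest day’s value and tv[_N] is the latest day’s value for each kid, regardless of how the rows were entered.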

. use tv1, clear

If we want the first observation to be compared to all observations

. bysort kidid (dt): gen tv_1ob1 = tv[1]

. bysort kidid dt: gen tv_1ob2 = tv[1]

Compare

. list kidid tv tv_1ob*, sepby(kidid)

     ┌────────────────────────────────┐
     │ kidid   tv   tv_1ob1   tv_1ob2 │
     ├────────────────────────────────┤
  1. │     1    1         1         1 │
  2. │     1    3         1         3 │
     ├────────────────────────────────┤
  3. │     2    8         8         8 │
     ├────────────────────────────────┤
  4. │     3    2         2         2 │
  5. │     3    5         2         5 │
  6. │     3    1         2         1 │
  7. │     3    6         2         6 │
     ├────────────────────────────────┤
  8. │     4    7         7         7 │
  9. │     4    1         7         1 │
 10. │     4    4         7         4 │
     └────────────────────────────────┘

If we want to compare the last observation to all observations

. bysort kidid (dt): gen tv_lastob1 = tv[_N]

. bysort kidid dt: gen tv_lastob2 = tv[_N]

Compare

. list kidid tv tv_lastob*, sepby(kidid)

     ┌──────────────────────────────────┐
     │ kidid   tv   tv_las~1   tv_las~2 │
     ├──────────────────────────────────┤
  1. │     1    1          3          1 │
  2. │     1    3          3          3 │
     ├──────────────────────────────────┤
  3. │     2    8          8          8 │
     ├──────────────────────────────────┤
  4. │     3    2          6          2 │
  5. │     3    5          6          5 │
  6. │     3    1          6          1 │
  7. │     3    6          6          6 │
     ├──────────────────────────────────┤
  8. │     4    7          4          7 │
  9. │     4    1          4          1 │
 10. │     4    4          4          4 │
     └──────────────────────────────────┘

If we want the prior observation (lag of 1)

. bysort kidid (dt): gen tv_lagob1 = tv[_n-1]
(4 missing values generated)

. bysort kidid dt: gen tv_lagob2 = tv[_n-1]
(10 missing values generated)

Compare

. list kidid tv tv_lagob*, sepby(kidid)

     ┌──────────────────────────────────┐
     │ kidid   tv   tv_lag~1   tv_lag~2 │
     ├──────────────────────────────────┤
  1. │     1    1          .          . │
  2. │     1    3          1          . │
     ├──────────────────────────────────┤
  3. │     2    8          .          . │
     ├──────────────────────────────────┤
  4. │     3    2          .          . │
  5. │     3    5          2          . │
  6. │     3    1          5          . │
  7. │     3    6          1          . │
     ├──────────────────────────────────┤
  8. │     4    7          .          . │
  9. │     4    1          7          . │
 10. │     4    4          1          . │
     └──────────────────────────────────┘

8.5 Computing values within subgroups: Computations across observations

Another powerful combination with subscripting/indexing is that we can use the generate command to create new variables that perform mathematical operations on different observations WITHIN the vector

. use tv1, clear

Difference in tv time between current period and prior period

. bysort kidid (dt): generate tvdfp = tv - tv[_n-1]
(4 missing values generated)

Difference in tv time between current period and next period

. bysort kidid (dt): generate tvdfs = tv - tv[_n+1]
(4 missing values generated)

Difference in tv time between current period and first period

. bysort kidid (dt): generate tvdff = tv - tv[1]

Difference in tv time between current period and last period

. bysort kidid (dt): generate tvdfl = tv - tv[_N]

Three-observation moving average of tv (prior, current, and next period)

. bysort kidid (dt): generate tv3avg = (tv[_n-1] + tv[_n] + tv[_n+1])/3
(7 missing values generated)

. list kidid dt tvd* tv3avg

     ┌───────────────────────────────────────────────────────────────┐
     │ kidid          dt   tvdfp   tvdfs   tvdff   tvdfl      tv3avg │
     ├───────────────────────────────────────────────────────────────┤
  1. │     1   07jan2002       .      -2       0      -2           . │
  2. │     1   08jan2002       2       .       2       0           . │
  3. │     2   16jan2002       .       .       0       0           . │
  4. │     3   18jan2002       .      -3       0      -4           . │
  5. │     3   19jan2002       3       4       3      -1   2.6666667 │
     ├───────────────────────────────────────────────────────────────┤
  6. │     3   21jan2002      -4      -5      -1      -5           4 │
  7. │     3   22jan2002       5       .       4       0           . │
  8. │     4   10jan2002       .       6       0       3           . │
  9. │     4   11jan2002      -6      -3      -6      -3           4 │
 10. │     4   13jan2002       3       .      -3       0           . │
     └───────────────────────────────────────────────────────────────┘

We can also rebase a variable. For example, we can rebase a deflator so that it equals 100 in the period whose dollars we want.

. import excel using "cpi_1993_2023.xlsx", cellrange(A12:P42) firstrow clear

. keep Year Annual

Rebase in 1993 Dollars

. gen rebase93 = Annual/Annual[1]*100

Rebase in 2022 Dollars

. gen rebase22 = Annual/Annual[_N]*100

Rebase to 2012

. gen rebase12 = Annual/Annual[_N-10]*100

. list

     ┌────────────────────────────────────────────────────┐
     │ Year    Annual    rebase93    rebase22    rebase12 │
     ├────────────────────────────────────────────────────┤
  1. │ 1993     144.5         100   49.375545   62.937185 │
  2. │ 1994     148.2   102.56055   50.639832   64.548725 │
  3. │ 1995     152.4   105.46713   52.074969   66.378041 │
  4. │ 1996     156.9   108.58131   53.612616   68.338023 │
  5. │ 1997     160.5   111.07266   54.842733   69.906008 │
     ├────────────────────────────────────────────────────┤
  6. │ 1998       163   112.80277   55.696981   70.994887 │
  7. │ 1999     166.6   115.29412   56.927098   72.562872 │
  8. │ 2000     172.2   119.16955   58.840614    75.00196 │
  9. │ 2001     177.1   122.56055   60.514941   77.136162 │
 10. │ 2002     179.9   124.49827   61.471699   78.355706 │
     ├────────────────────────────────────────────────────┤
 11. │ 2003       184   127.33564   62.872666   80.141467 │
 12. │ 2004     188.9   130.72664   64.546992   82.275669 │
 13. │ 2005     195.3   135.15571   66.733868   85.063199 │
 14. │ 2006     201.6   139.51557   68.886573   87.807173 │
 15. │ 2007   207.342   143.48927    70.84861   90.308109 │
     ├────────────────────────────────────────────────────┤
 16. │ 2008   215.303   148.99862   73.568878   93.775534 │
 17. │ 2009   214.537   148.46851   73.307136   93.441902 │
 18. │ 2010   218.056   150.90381   74.509576   94.974607 │
 19. │ 2011   224.939   155.66713   76.861492   97.972508 │
 20. │ 2012   229.594   158.88858   78.452102         100 │
     ├────────────────────────────────────────────────────┤
 21. │ 2013   232.957   161.21592   79.601237   101.46476 │
 22. │ 2014   236.736   163.83114   80.892518   103.11071 │
 23. │ 2015   237.017   164.02561   80.988536    103.2331 │
 24. │ 2016   240.007   166.09481   82.010217    104.5354 │
 25. │ 2017    245.12   169.63322   83.757325   106.76237 │
     ├────────────────────────────────────────────────────┤
 26. │ 2018   251.107   173.77647   85.803079   109.37002 │
 27. │ 2019   255.657   176.92526    87.35781   111.35178 │
 28. │ 2020   258.811   179.10796    88.43553   112.72551 │
 29. │ 2021    270.97   187.52249   92.590251   118.02138 │
 30. │ 2022   292.655   202.52941         100   127.46631 │
     └────────────────────────────────────────────────────┘

. graph twoway line Annual Year || line rebase93 Year || ///
> line rebase22 Year, yline(100) ///
> legend(order(1 "100=1982" 2 "100=1993" 3 "100=2022"))

. graph export "/Users/Sam/Desktop/Econ 645/Stata/week8_bls.png", replace
(file /Users/Sam/Desktop/Econ 645/Stata/week8_bls.png written in PNG format)
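
Positional subscripts such as Annual[_N-10] depend on the row order and on how many years the file holds. A possible alternative, sketched here on four rows copied from the listing above, is to look up the base-year value by Year. The use of summarize with meanonly and r(mean) for the lookup is our choice for illustration, not part of the original notes.

```stata
* Four rows of the CPI series shown above (2010-2013)
clear
input Year Annual
2010 218.056
2011 224.939
2012 229.594
2013 232.957
end

* Look up the base-year value by Year instead of by row position
summarize Annual if Year == 2012, meanonly
generate rebase12 = Annual / r(mean) * 100

list
```

Because only one observation satisfies Year == 2012, r(mean) is simply that year's index value.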

8.6 Computing values within subgroups: Running sums

As we mentioned earlier, egen with sum() and generate with sum() give different results. egen’s sum() behaves like egen’s total(), which is why it can be confusing. When we use generate with sum(), we get a running sum, not a constant total.

. use tv1, clear

We can generate the tv running sum across all individuals over time

. generate tvrunsum = sum(tv)

We can generate the tv running sum within each individual over time

. bysort kidid (dt): generate bytvrunsum=sum(tv)

We can generate the total sum within each individual

. bysort kidid (dt): egen bytvsum=total(tv)

We can generate the total sum for all individuals over time

. egen tvsum = total(tv)

We can also calculate a running average

. bysort kidid (dt): generate bytvrunavg=sum(tv)/_n

We can compute each individual’s overall average

. bysort kidid (dt): egen bytvavg = mean(tv)

. list kidid tv tv* by*, sepby(kidid)

     ┌───────────────────────────────────────────────────────────────────────────────┐
     │ kidid   tv   tv   tvrunsum   tvsum   bytvru~m   bytvsum   bytvrun~g   bytvavg │
     ├───────────────────────────────────────────────────────────────────────────────┤
  1. │     1    1    1          1      38          1         4           1         2 │
  2. │     1    3    3          4      38          4         4           2         2 │
     ├───────────────────────────────────────────────────────────────────────────────┤
  3. │     2    8    8         12      38          8         8           8         8 │
     ├───────────────────────────────────────────────────────────────────────────────┤
  4. │     3    2    2         14      38          2        14           2       3.5 │
  5. │     3    5    5         19      38          7        14         3.5       3.5 │
  6. │     3    1    1         20      38          8        14   2.6666667       3.5 │
  7. │     3    6    6         26      38         14        14         3.5       3.5 │
     ├───────────────────────────────────────────────────────────────────────────────┤
  8. │     4    7    7         33      38          7        12           7         4 │
  9. │     4    1    1         34      38          8        12           4         4 │
 10. │     4    4    4         38      38         12        12           4         4 │
     └───────────────────────────────────────────────────────────────────────────────┘
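
One detail worth keeping in mind: generate’s sum() treats missing values as zero, so a running average computed as sum(tv)/_n is pulled down when tv has missing values. The sketch below, on hypothetical toy data, divides by the running count of nonmissing values instead.

```stata
* Toy data (hypothetical values) with one missing tv
clear
input kidid day tv
1 1 1
1 2 .
1 3 3
2 1 8
2 2 4
end

* Running sum within kid: sum() treats the missing value as zero
bysort kidid (day): generate runsum = sum(tv)

* Group total: the same constant on every row of the group
bysort kidid (day): egen total_tv = total(tv)

* Running average over nonmissing values only:
* divide by the running count of nonmissing tv rather than by _n
bysort kidid (day): generate runavg = sum(tv) / sum(!missing(tv))

list, sepby(kidid)
```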

8.7 Computing values within subgroups: More examples

There are other useful calculations we can do with subscripting/indexing. Some do overlap with egen, but it is helpful to know the differences.

Counting

Count the number of observations: this can be done with subscripting or egen depending upon what we want

. use tv1, clear

Generate the total observation count per individual, whether or not tv is missing:

. bysort kidid (dt): generate idcount=_N

Generate the count of observations per individual with nonmissing tv

. bysort kidid (dt): egen idcount_nomiss = count(tv)

Generate a running count of observations within each individual

. bysort kidid (dt): gen idruncount = _n

. list, sepby(kidid)

     ┌────────────────────────────────────────────────────────────────────────────┐
     │ kidid          dt   female   wt   tv   vac   idcount   idcoun~s   idrunc~t │
     ├────────────────────────────────────────────────────────────────────────────┤
  1. │     1   07jan2002        1   53    1     1         2          2          1 │
  2. │     1   08jan2002        1   55    3     1         2          2          2 │
     ├────────────────────────────────────────────────────────────────────────────┤
  3. │     2   16jan2002        1   58    8     1         1          1          1 │
     ├────────────────────────────────────────────────────────────────────────────┤
  4. │     3   18jan2002        0   60    2     0         4          4          1 │
  5. │     3   19jan2002        0   63    5     1         4          4          2 │
  6. │     3   21jan2002        0   66    1     1         4          4          3 │
  7. │     3   22jan2002        0   64    6     0         4          4          4 │
     ├────────────────────────────────────────────────────────────────────────────┤
  8. │     4   10jan2002        1   62    7     0         3          3          1 │
  9. │     4   11jan2002        1   58    1     0         3          3          2 │
 10. │     4   13jan2002        1   55    4     0         3          3          3 │
     └────────────────────────────────────────────────────────────────────────────┘
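
In tv1 the two counts agree because tv has no missing values. A minimal sketch on hypothetical toy data with one missing value shows where they diverge:

```stata
* Toy data (hypothetical values): kid 1 has one missing tv
clear
input kidid day tv
1 1 1
1 2 .
1 3 3
2 1 8
end

* _N counts every row in the group
bysort kidid (day): generate idcount = _N

* count() counts only rows where tv is nonmissing
bysort kidid (day): egen idcount_nomiss = count(tv)

list, sepby(kidid)
```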

Generate Binaries

We can generate binary variables to find the first, last, or nth observation. This differs from the id counts: here we generate a binary that equals 1 when the condition is true.

. use tv1, clear

Find individuals with only one observation

. bysort kidid (dt): generate singleob = (_N==1)

Find the first observation of an individual

. bysort kidid (dt): generate firstob = (_n==1)

Find the last observation of an individual

. bysort kidid (dt): generate lastob = (_n==_N)

. list, sepby(kidid)

     ┌──────────────────────────────────────────────────────────────────────────┐
     │ kidid          dt   female   wt   tv   vac   singleob   firstob   lastob │
     ├──────────────────────────────────────────────────────────────────────────┤
  1. │     1   07jan2002        1   53    1     1          0         1        0 │
  2. │     1   08jan2002        1   55    3     1          0         0        1 │
     ├──────────────────────────────────────────────────────────────────────────┤
  3. │     2   16jan2002        1   58    8     1          1         1        1 │
     ├──────────────────────────────────────────────────────────────────────────┤
  4. │     3   18jan2002        0   60    2     0          0         1        0 │
  5. │     3   19jan2002        0   63    5     1          0         0        0 │
  6. │     3   21jan2002        0   66    1     1          0         0        0 │
  7. │     3   22jan2002        0   64    6     0          0         0        1 │
     ├──────────────────────────────────────────────────────────────────────────┤
  8. │     4   10jan2002        1   62    7     0          0         1        0 │
  9. │     4   11jan2002        1   58    1     0          0         0        0 │
 10. │     4   13jan2002        1   55    4     0          0         0        1 │
     └──────────────────────────────────────────────────────────────────────────┘
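
The same pattern extends to the nth observation, and the condition can be used directly with keep. The following is a sketch on hypothetical toy data; day stands in for dt.

```stata
* Toy data (hypothetical values)
clear
input kidid day tv
1 1 1
1 2 3
2 1 8
3 1 2
3 2 5
3 3 1
end

* Flag the second observation of each kid (0 for kids with one observation)
bysort kidid (day): generate secondob = (_n == 2)

* Keep one row per kid: the last observation in day order
bysort kidid (day): keep if _n == _N

list
```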

. use tv1, clear

We can create binaries that depend on leads and lags too. Here we look for a change in vac.

. bysort kidid (dt): generate vacstart=(vac==1) & (vac[_n-1]==0)

. bysort kidid (dt): generate vacend=(vac==1) & (vac[_n+1]==0)

. list kidid dt vac*, sepby(kidid)

     ┌─────────────────────────────────────────────┐
     │ kidid          dt   vac   vacstart   vacend │
     ├─────────────────────────────────────────────┤
  1. │     1   07jan2002     1          0        0 │
  2. │     1   08jan2002     1          0        0 │
     ├─────────────────────────────────────────────┤
  3. │     2   16jan2002     1          0        0 │
     ├─────────────────────────────────────────────┤
  4. │     3   18jan2002     0          0        0 │
  5. │     3   19jan2002     1          1        0 │
  6. │     3   21jan2002     1          0        1 │
  7. │     3   22jan2002     0          0        0 │
     ├─────────────────────────────────────────────┤
  8. │     4   10jan2002     0          0        0 │
  9. │     4   11jan2002     0          0        0 │
 10. │     4   13jan2002     0          0        0 │
     └─────────────────────────────────────────────┘

Fill in Missing

Another useful tool, which we should use with caution, is filling in missing values. This should only really be applied to a variable that is constant and does not change over time.

. use tv2, clear

. sort kidid dt

. list, sepby(kidid)

     ┌────────────────────────────────────────────┐
     │ kidid          dt   female   wt   tv   vac │
     ├────────────────────────────────────────────┤
  1. │     1   07jan2002        1   53    1     1 │
  2. │     1   08jan2002        1   55    3     1 │
     ├────────────────────────────────────────────┤
  3. │     2   16jan2002        1   58    8     1 │
     ├────────────────────────────────────────────┤
  4. │     3   18jan2002        0   60    2     0 │
  5. │     3   19jan2002        0    .    .     . │
  6. │     3   21jan2002        0   66    .     1 │
  7. │     3   22jan2002        0   64    6     0 │
     ├────────────────────────────────────────────┤
  8. │     4   10jan2002        1   62    7     0 │
  9. │     4   11jan2002        1   58    .     . │
 10. │     4   13jan2002        1    .    4     0 │
     └────────────────────────────────────────────┘

We can fill in a missing observation with the last nonmissing observation, carrying the previous value forward. First, generate a copy of the variable that has missing values.

. generate tvimp1 = tv
(3 missing values generated)

. bysort kidid (dt): replace tvimp1 = tv[_n-1] if missing(tv)
(2 real changes made)

. list kidid dt tv tvimp1, sepby(kidid)

     ┌─────────────────────────────────┐
     │ kidid          dt   tv   tvimp1 │
     ├─────────────────────────────────┤
  1. │     1   07jan2002    1        1 │
  2. │     1   08jan2002    3        3 │
     ├─────────────────────────────────┤
  3. │     2   16jan2002    8        8 │
     ├─────────────────────────────────┤
  4. │     3   18jan2002    2        2 │
  5. │     3   19jan2002    .        2 │
  6. │     3   21jan2002    .        . │
  7. │     3   22jan2002    6        6 │
     ├─────────────────────────────────┤
  8. │     4   10jan2002    7        7 │
  9. │     4   11jan2002    .        7 │
 10. │     4   13jan2002    4        4 │
     └─────────────────────────────────┘

Notice that the 3rd observation for the 3rd kid is still missing. It cannot be filled from the 2nd observation, because tv is also missing there. There are a couple of strategies to use. We can generate a new variable, as Mitchell does.

. generate tvimp2 = tvimp1
(1 missing value generated)

. bysort kidid (dt): replace tvimp2 = tvimp2[_n-1] if missing(tvimp2)
(1 real change made)

. list kidid tv tvimp*, sepby(kidid)

     ┌──────────────────────────────┐
     │ kidid   tv   tvimp1   tvimp2 │
     ├──────────────────────────────┤
  1. │     1    1        1        1 │
  2. │     1    3        3        3 │
     ├──────────────────────────────┤
  3. │     2    8        8        8 │
     ├──────────────────────────────┤
  4. │     3    2        2        2 │
  5. │     3    .        2        2 │
  6. │     3    .        .        2 │
  7. │     3    6        6        6 │
     ├──────────────────────────────┤
  8. │     4    7        7        7 │
  9. │     4    .        7        7 │
 10. │     4    4        4        4 │
     └──────────────────────────────┘

You can just replace tvimp1 twice instead of generating a new variable, but that is up to the user. You would use tv[_n-1] for the first replace and tvimp1[_n-1] for the second replace.
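
A further option, sketched below on hypothetical toy data, is a single replace that refers to the variable being filled. Because replace works through the observations in order, a value filled in on one row is already available as the previous value for the next row, so consecutive missing values are filled in one pass.

```stata
* Toy data (hypothetical values): two consecutive missing tv values
clear
input kidid day tv
3 1 2
3 2 .
3 3 .
3 4 6
end

generate tvfill = tv

* The value filled in on row 2 is used as tvfill[_n-1] on row 3
bysort kidid (day): replace tvfill = tvfill[_n-1] if missing(tvfill)

list
```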

Interpolation

We may need to interpolate between 2 known values and assume a linear trend.

. generate tvimp3=tv
(3 missing values generated)

Interpolate for 1 missing value between two known values

. bysort kidid (dt): replace tvimp3 = (tv[_n-1]+tv[_n+1])/2 if missing(tv)
(1 real change made)

. list kidid tv tvimp3, sepby(kidid)

     ┌─────────────────────┐
     │ kidid   tv   tvimp3 │
     ├─────────────────────┤
  1. │     1    1        1 │
  2. │     1    3        3 │
     ├─────────────────────┤
  3. │     2    8        8 │
     ├─────────────────────┤
  4. │     3    2        2 │
  5. │     3    .        . │
  6. │     3    .        . │
  7. │     3    6        6 │
     ├─────────────────────┤
  8. │     4    7        7 │
  9. │     4    .      5.5 │
 10. │     4    4        4 │
     └─────────────────────┘

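This takes a bit of hard coding, but you can interpolate across more than one missing value. Here the step is one third of the difference between kid 3’s first and fourth observations, added to the previous (already filled) value.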

. bysort kidid (dt): replace tvimp3 = ((tvimp3[4]-tvimp3[1])/3)+tvimp3[_n-1] if missing(tvimp3) & kidid==3
(2 real changes made)

. list kidid tv tvimp3, sepby(kidid)

     ┌────────────────────────┐
     │ kidid   tv      tvimp3 │
     ├────────────────────────┤
  1. │     1    1           1 │
  2. │     1    3           3 │
     ├────────────────────────┤
  3. │     2    8           8 │
     ├────────────────────────┤
  4. │     3    2           2 │
  5. │     3    .   3.3333333 │
  6. │     3    .   4.6666667 │
  7. │     3    6           6 │
     ├────────────────────────┤
  8. │     4    7           7 │
  9. │     4    .         5.5 │
 10. │     4    4           4 │
     └────────────────────────┘
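
Stata also has a built-in ipolate command for linear interpolation. The sketch below rebuilds the tv values for kids 3 and 4 from the tv2 listing by hand; the variable day and the new variable name tvip are ours. Note that ipolate interpolates along the date, so where the dates are unevenly spaced its results differ from the by-observation arithmetic used above.

```stata
* tv values for kids 3 and 4, typed in from the tv2 listing; day = day of Jan 2002
clear
input kidid day tv
3 18 2
3 19 .
3 21 .
3 22 6
4 10 7
4 11 .
4 13 4
end
generate dt = mdy(1, day, 2002)
format dt %td

* Interpolate tv linearly on the date within each kid
bysort kidid: ipolate tv dt, generate(tvip)

list kidid dt tv tvip, sepby(kidid)
```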

Indicators

What if we want to find outliers in time-varying differences? We can generate indicator variables that flag when a variable changes by more than a set limit. For example, we may want to know whether tv viewing drops by 2 hours or more.

. use tv1, clear

. bysort kidid (dt): generate tvchange = tv[_n]-tv[_n-1]
(4 missing values generated)

. bysort kidid (dt): generate tvchangerate = ((tv[_n]-tv[_n-1])/tv[_n-1])
(4 missing values generated)

. list, sepby(kidid)

     ┌────────────────────────────────────────────────────────────────────┐
     │ kidid          dt   female   wt   tv   vac   tvchange   tvchange~e │
     ├────────────────────────────────────────────────────────────────────┤
  1. │     1   07jan2002        1   53    1     1          .            . │
  2. │     1   08jan2002        1   55    3     1          2            2 │
     ├────────────────────────────────────────────────────────────────────┤
  3. │     2   16jan2002        1   58    8     1          .            . │
     ├────────────────────────────────────────────────────────────────────┤
  4. │     3   18jan2002        0   60    2     0          .            . │
  5. │     3   19jan2002        0   63    5     1          3          1.5 │
  6. │     3   21jan2002        0   66    1     1         -4          -.8 │
  7. │     3   22jan2002        0   64    6     0          5            5 │
     ├────────────────────────────────────────────────────────────────────┤
  8. │     4   10jan2002        1   62    7     0          .            . │
  9. │     4   11jan2002        1   58    1     0         -6   -.85714286 │
 10. │     4   13jan2002        1   55    4     0          3            3 │
     └────────────────────────────────────────────────────────────────────┘

Generate an indicator variable for whether tvchange is less than or equal to -2. This is not very helpful with small datasets, but with larger datasets such as the CPS it is important.

. gen tvchangeid=(tvchange <= -2) if !missing(tvchange)
(4 missing values generated)

. list, sepby(kidid)

     ┌───────────────────────────────────────────────────────────────────────────────┐
     │ kidid          dt   female   wt   tv   vac   tvchange   tvchange~e   tvchan~d │
     ├───────────────────────────────────────────────────────────────────────────────┤
  1. │     1   07jan2002        1   53    1     1          .            .          . │
  2. │     1   08jan2002        1   55    3     1          2            2          0 │
     ├───────────────────────────────────────────────────────────────────────────────┤
  3. │     2   16jan2002        1   58    8     1          .            .          . │
     ├───────────────────────────────────────────────────────────────────────────────┤
  4. │     3   18jan2002        0   60    2     0          .            .          . │
  5. │     3   19jan2002        0   63    5     1          3          1.5          0 │
  6. │     3   21jan2002        0   66    1     1         -4          -.8          1 │
  7. │     3   22jan2002        0   64    6     0          5            5          0 │
     ├───────────────────────────────────────────────────────────────────────────────┤
  8. │     4   10jan2002        1   62    7     0          .            .          . │
  9. │     4   11jan2002        1   58    1     0         -6   -.85714286          1 │
 10. │     4   13jan2002        1   55    4     0          3            3          0 │
     └───────────────────────────────────────────────────────────────────────────────┘
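
The if !missing(tvchange) qualifier matters because a logical expression in Stata always evaluates to 0 or 1, even when its input is missing. A small sketch on hypothetical values shows the difference:

```stata
* Toy data (hypothetical values): one missing change
clear
input tvchange
.
2
-4
end

* Without the qualifier, a missing change is coded 0 ("no big drop")
generate drop_naive = (tvchange <= -2)

* With the qualifier, a missing change stays missing
generate drop_safe = (tvchange <= -2) if !missing(tvchange)

list
```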

8.8 Comparing the by, tsset, xtset commands

Another way to find differences within a variable is to use the tsset or xtset command to declare the data as time series (tsset) or panel data (xtset). First, recall that we can use bysort with subscripting/indexing.

. use tv1, clear

. bysort kidid (dt): generate ltv = tv[_n-1]
(4 missing values generated)

. list, sepby(kidid)

     ┌──────────────────────────────────────────────────┐
     │ kidid          dt   female   wt   tv   vac   ltv │
     ├──────────────────────────────────────────────────┤
  1. │     1   07jan2002        1   53    1     1     . │
  2. │     1   08jan2002        1   55    3     1     1 │
     ├──────────────────────────────────────────────────┤
  3. │     2   16jan2002        1   58    8     1     . │
     ├──────────────────────────────────────────────────┤
  4. │     3   18jan2002        0   60    2     0     . │
  5. │     3   19jan2002        0   63    5     1     2 │
  6. │     3   21jan2002        0   66    1     1     5 │
  7. │     3   22jan2002        0   64    6     0     1 │
     ├──────────────────────────────────────────────────┤
  8. │     4   10jan2002        1   62    7     0     . │
  9. │     4   11jan2002        1   58    1     0     7 │
 10. │     4   13jan2002        1   55    4     0     1 │
     └──────────────────────────────────────────────────┘

Or, we can establish a time series (tsset)

. use tv1, clear

. sort kidid dt

We’ll need to specify kidid as the cross-sectional group and dt as the date variable. The daily option tells Stata that the time unit is days, as opposed to weeks, months, or years. We can also specify delta(1) for a period of one day.

. tsset kidid dt, daily delta(1)
       panel variable:  kidid (unbalanced)
        time variable:  dt, 07jan2002 to 22jan2002, but with gaps
                delta:  1 day

We can use the operator L.var to specify that we want a lag of 1 day

. generate lagtv = L.tv
(6 missing values generated)

We can use the operator F.var to specify that we want a lead of 1 day

. generate leadtv = F.tv
(6 missing values generated)

. list, sepby(kidid)

     ┌─────────────────────────────────────────────────────────────┐
     │ kidid          dt   female   wt   tv   vac   lagtv   leadtv │
     ├─────────────────────────────────────────────────────────────┤
  1. │     1   07jan2002        1   53    1     1       .        3 │
  2. │     1   08jan2002        1   55    3     1       1        . │
     ├─────────────────────────────────────────────────────────────┤
  3. │     2   16jan2002        1   58    8     1       .        . │
     ├─────────────────────────────────────────────────────────────┤
  4. │     3   18jan2002        0   60    2     0       .        5 │
  5. │     3   19jan2002        0   63    5     1       2        . │
  6. │     3   21jan2002        0   66    1     1       .        6 │
  7. │     3   22jan2002        0   64    6     0       1        . │
     ├─────────────────────────────────────────────────────────────┤
  8. │     4   10jan2002        1   62    7     0       .        1 │
  9. │     4   11jan2002        1   58    1     0       7        . │
 10. │     4   13jan2002        1   55    4     0       .        . │
     └─────────────────────────────────────────────────────────────┘

Or, we can establish a panel series (xtset)

. use tv1, clear

. sort kidid dt

We’ll need to specify kidid as the cross-sectional group and dt as the time variable. We’ll use a delta of 1 to specify that the difference between periods is 1 day. We can use the daily option as well.

. xtset kidid dt, daily delta(1)
       panel variable:  kidid (unbalanced)
        time variable:  dt, 07jan2002 to 22jan2002, but with gaps
                delta:  1 day

Generate a lag with the l.var operator

. generate lagtv = l.tv
(6 missing values generated)

Generate a lead with the f.var operator

. generate leadtv = f.tv
(6 missing values generated)

What do you notice? Some leads and lags are missing. Why? The panel is unbalanced and the dates have gaps, so the one-day lag or lead cannot be computed when the adjacent day is missing for that kid. If we want the previous or next observation regardless of gaps, we can use bysort with subscripting/indexing.

. list, sepby(kidid)

     ┌─────────────────────────────────────────────────────────────┐
     │ kidid          dt   female   wt   tv   vac   lagtv   leadtv │
     ├─────────────────────────────────────────────────────────────┤
  1. │     1   07jan2002        1   53    1     1       .        3 │
  2. │     1   08jan2002        1   55    3     1       1        . │
     ├─────────────────────────────────────────────────────────────┤
  3. │     2   16jan2002        1   58    8     1       .        . │
     ├─────────────────────────────────────────────────────────────┤
  4. │     3   18jan2002        0   60    2     0       .        5 │
  5. │     3   19jan2002        0   63    5     1       2        . │
  6. │     3   21jan2002        0   66    1     1       .        6 │
  7. │     3   22jan2002        0   64    6     0       1        . │
     ├─────────────────────────────────────────────────────────────┤
  8. │     4   10jan2002        1   62    7     0       .        1 │
  9. │     4   11jan2002        1   58    1     0       7        . │
 10. │     4   13jan2002        1   55    4     0       .        . │
     └─────────────────────────────────────────────────────────────┘

. bysort kidid (dt): gen bylagtv=tv[_n-1]
(4 missing values generated)

. bysort kidid (dt): gen byleadtv=tv[_n+1]
(4 missing values generated)

. list, sepby(kidid)

     ┌──────────────────────────────────────────────────────────────────────────────────┐
     │ kidid          dt   female   wt   tv   vac   lagtv   leadtv   bylagtv   byleadtv │
     ├──────────────────────────────────────────────────────────────────────────────────┤
  1. │     1   07jan2002        1   53    1     1       .        3         .          3 │
  2. │     1   08jan2002        1   55    3     1       1        .         1          . │
     ├──────────────────────────────────────────────────────────────────────────────────┤
  3. │     2   16jan2002        1   58    8     1       .        .         .          . │
     ├──────────────────────────────────────────────────────────────────────────────────┤
  4. │     3   18jan2002        0   60    2     0       .        5         .          5 │
  5. │     3   19jan2002        0   63    5     1       2        .         2          1 │
  6. │     3   21jan2002        0   66    1     1       .        6         5          6 │
  7. │     3   22jan2002        0   64    6     0       1        .         1          . │
     ├──────────────────────────────────────────────────────────────────────────────────┤
  8. │     4   10jan2002        1   62    7     0       .        1         .          1 │
  9. │     4   11jan2002        1   58    1     0       7        .         7          4 │
 10. │     4   13jan2002        1   55    4     0       .        .         1          . │
     └──────────────────────────────────────────────────────────────────────────────────┘
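
The two approaches can be reconciled by measuring the gap between consecutive observations. The sketch below uses kid 3’s values from the listing above; the variable names day, gap, and lag1day are ours. Keeping the previous observation only when the gap is exactly one day reproduces what L.tv returns.

```stata
* Kid 3's tv values from tv1, typed in by hand; day = day of Jan 2002
clear
input kidid day tv
3 18 2
3 19 5
3 21 1
3 22 6
end
generate dt = mdy(1, day, 2002)
format dt %td

* Days elapsed since the kid's previous observation
bysort kidid (dt): generate gap = dt - dt[_n-1]

* Previous-observation lag, kept only when it is a true one-day lag
bysort kidid (dt): generate lag1day = tv[_n-1] if gap == 1

* Check against the time-series lag operator
xtset kidid dt, daily
assert lag1day == L.tv

list kidid dt tv gap lag1day, sepby(kidid)
```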

What is the difference?

From Nick Cox: xtset allows a panel identifier only. tsset allows a time identifier only. Where they overlap is when two variables are supplied in which case the first is treated as a panel identifier and the second as a time variable.

Exercises

Let’s grab the CPS and do some subgroup analysis. We will use unweighted data for simplicity.

What are the average wages by sex? What are the average wages by state? What are the median wages by sex? What are the median wages by state? What is the 75th percentile of wages by race? What is the 25th percentile of wages by marital status?

. use "/Users/Sam/Desktop/Econ 645/Data/CPS/jun23pub.dta", clear

Hint: use bysort with egen, for example: bysort state: egen totalvar=total(var1)
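
The same pattern carries over to means, medians, and percentiles. The sketch below runs on hypothetical toy data: sex and wage are placeholder names, not the actual CPS variable names, which you will need to look up in the file.

```stata
* Toy data; sex and wage are hypothetical names, not the CPS variable names
clear
input sex wage
1 10
1 20
1 30
1 40
2 15
2 25
end

* Group mean, median, and 75th percentile, repeated on every row of the group
bysort sex: egen meanwage = mean(wage)
bysort sex: egen medwage = median(wage)
bysort sex: egen p75wage = pctile(wage), p(75)

list, sepby(sex)
```

Changing p(75) to p(25) gives the 25th percentile, and swapping the by-variable gives the statistics by state, race, or marital status.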