Chapter 3 Negative Binomial Regression Model

Lesson: When overdispersion occurs, we could estimate a Negative Binomial Regression model. We’ll see that our standard errors might be too large if our main Poisson assumption mean=variance fails.

Poisson

First, we will estimate a poisson regression.

cd "/Users/Sam/Desktop/Econ 645/Data/Wooldridge"
use "crime1.dta", clear
est clear
eststo PRM: poisson narr86 pcnv avgsen tottime ptime86 qemp86 inc86 black hispan born60 

Negative Binomial

We use the nbreg command to estimate a negative binomial regression.

eststo NBRM: nbreg narr86 pcnv avgsen tottime ptime86 qemp86 inc86 black hispan born60 
Fitting Poisson model:

Iteration 0:   log likelihood = -2249.0104  
Iteration 1:   log likelihood = -2248.7614  
Iteration 2:   log likelihood = -2248.7611  
Iteration 3:   log likelihood = -2248.7611  

Fitting constant-only model:

Iteration 0:   log likelihood = -2297.3847  
Iteration 1:   log likelihood = -2291.4767  
Iteration 2:   log likelihood = -2290.6926  
Iteration 3:   log likelihood = -2290.6903  
Iteration 4:   log likelihood = -2290.6903  

Fitting full model:

Iteration 0:   log likelihood = -2181.2355  
Iteration 1:   log likelihood = -2158.2716  
Iteration 2:   log likelihood = -2157.6284  
Iteration 3:   log likelihood =  -2157.628  
Iteration 4:   log likelihood =  -2157.628  

Negative binomial regression                    Number of obs     =      2,725
                                                LR chi2(9)        =     266.12
Dispersion     = mean                           Prob > chi2       =     0.0000
Log likelihood =  -2157.628                     Pseudo R2         =     0.0581

------------------------------------------------------------------------------
      narr86 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pcnv |  -.4770963   .1033295    -4.62   0.000    -.6796183   -.2745743
      avgsen |  -.0173385   .0261171    -0.66   0.507    -.0685272    .0338501
     tottime |   .0197394   .0192325     1.03   0.305    -.0179557    .0574344
     ptime86 |  -.1073997    .025074    -4.28   0.000    -.1565439   -.0582555
      qemp86 |  -.0504884   .0351857    -1.43   0.151    -.1194511    .0184743
       inc86 |  -.0077126   .0011465    -6.73   0.000    -.0099596   -.0054656
       black |   .6560406   .0923594     7.10   0.000     .4750195    .8370617
      hispan |   .5048465   .0895663     5.64   0.000     .3292998    .6803932
      born60 |   -.046412   .0776384    -0.60   0.550    -.1985804    .1057564
       _cons |  -.5637368   .0827121    -6.82   0.000    -.7258495   -.4016242
-------------+----------------------------------------------------------------
    /lnalpha |  -.0738912   .1177617                     -.3046999    .1569175
-------------+----------------------------------------------------------------
       alpha |   .9287728   .1093739                      .7373446    1.169899
------------------------------------------------------------------------------
LR test of alpha=0: chibar2(01) = 182.27               Prob >= chibar2 = 0.000

First, our coefficients are the change in expected log counts, so we need some transformations to provide interpretations.

Second, our test for over dispersion can be seen at the bottom of our NBRM output with the LR test of \(\alpha = 0\)

Factor Change

We can manually calculate \(e^{\beta}\) to interpret the results with a factor change, or we can use the listcoef command:

listcoef, percent help
listcoef, help
nbreg (N=2725): Percentage Change in Expected Count 

 Observed SD: .85907678

------------------------------------------------------------------
  narr86 |      b         z     P>|z|      %      %StdX      SDofX
---------+--------------------------------------------------------
    pcnv |  -0.47710   -4.617   0.000    -37.9    -17.2     0.3952
  avgsen |  -0.01734   -0.664   0.507     -1.7     -5.9     3.5080
 tottime |   0.01974    1.026   0.305      2.0      9.5     4.6070
 ptime86 |  -0.10740   -4.283   0.000    -10.2    -18.9     1.9501
  qemp86 |  -0.05049   -1.435   0.151     -4.9     -7.8     1.6104
   inc86 |  -0.00771   -6.727   0.000     -0.8    -40.2    66.6272
   black |   0.65604    7.103   0.000     92.7     27.3     0.3677
  hispan |   0.50485    5.637   0.000     65.7     23.2     0.4127
  born60 |  -0.04641   -0.598   0.550     -4.5     -2.2     0.4808
---------+--------------------------------------------------------
ln alpha |  -0.07389   -0.627
------------------------------------------------------------------
       b = raw coefficient
       z = z-score for test of b=0
   P>|z| = p-value for z-test
       % = percent change in expected count for unit increase in X
   %StdX = percent change in expected count for SD increase in X
   SDofX = standard deviation of X


nbreg (N=2725): Factor Change in Expected Count 

 Observed SD: .85907678

------------------------------------------------------------------
  narr86 |      b         z     P>|z|    e^b    e^bStdX      SDofX
---------+--------------------------------------------------------
    pcnv |  -0.47710   -4.617   0.000   0.6206   0.8282     0.3952
  avgsen |  -0.01734   -0.664   0.507   0.9828   0.9410     3.5080
 tottime |   0.01974    1.026   0.305   1.0199   1.0952     4.6070
 ptime86 |  -0.10740   -4.283   0.000   0.8982   0.8110     1.9501
  qemp86 |  -0.05049   -1.435   0.151   0.9508   0.9219     1.6104
   inc86 |  -0.00771   -6.727   0.000   0.9923   0.5982    66.6272
   black |   0.65604    7.103   0.000   1.9271   1.2728     0.3677
  hispan |   0.50485    5.637   0.000   1.6567   1.2316     0.4127
  born60 |  -0.04641   -0.598   0.550   0.9546   0.9779     0.4808
---------+--------------------------------------------------------
ln alpha |  -0.07389   -0.627
------------------------------------------------------------------
       b = raw coefficient
       z = z-score for test of b=0
   P>|z| = p-value for z-test
     e^b = exp(b) = factor change in expected count for unit increase in X
 e^bStdX = exp(b*SD of X) = change in expected count for SD increase in X
   SDofX = standard deviation of X

Let’s look at pcnv:

display exp(-.4771*.1)-1
-.04658976

A 10 percentage point increase in proportion of prior arrests that lead to conviction decrease the expected number of arrests by 4.7% holding all others constant.

Let’s look at avgsen:

display (exp(0.01974)-1)
.01993612

a 1 month increase in total time served in prision from prior convictions before 1986 increases the expected number of arrests holding all others constant.

esttab, mtitle
                      (1)             (2)   
                      PRM            NBRM   
--------------------------------------------
narr86                                      
pcnv               -0.402***       -0.477***
                  (-4.73)         (-4.62)   

avgsen            -0.0238         -0.0173   
                  (-1.19)         (-0.66)   

tottime            0.0245          0.0197   
                   (1.66)          (1.03)   

ptime86           -0.0986***       -0.107***
                  (-4.76)         (-4.28)   

qemp86            -0.0380         -0.0505   
                  (-1.31)         (-1.43)   

inc86            -0.00808***     -0.00771***
                  (-7.76)         (-6.73)   

black               0.661***        0.656***
                   (8.95)          (7.10)   

hispan              0.500***        0.505***
                   (6.76)          (5.64)   

born60            -0.0510         -0.0464   
                  (-0.80)         (-0.60)   

_cons              -0.600***       -0.564***
                  (-8.92)         (-6.82)   
--------------------------------------------
lnalpha                                     
_cons                             -0.0739   
                                  (-0.63)   
--------------------------------------------
N                    2725            2725   
--------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001

Marginal Effects

Interpretation is similar to Poisson The marginal change in a continuous variable depends on the expected value of y given x, so we have to specify x with atmeans or average marginal effects

margins, dydx(*)
Average marginal effects                        Number of obs     =      2,725
Model VCE    : OIM

Expression   : Predicted number of events, predict()
dy/dx w.r.t. : pcnv avgsen tottime ptime86 qemp86 inc86 black hispan born60

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pcnv |  -.1933482   .0429858    -4.50   0.000    -.2775988   -.1090976
      avgsen |  -.0070266    .010587    -0.66   0.507    -.0277767    .0137235
     tottime |   .0079996   .0078006     1.03   0.305    -.0072894    .0232886
     ptime86 |  -.0435248   .0103794    -4.19   0.000    -.0638682   -.0231815
      qemp86 |  -.0204609   .0143136    -1.43   0.153    -.0485151    .0075932
       inc86 |  -.0031256   .0004821    -6.48   0.000    -.0040705   -.0021807
       black |   .2658672   .0395274     6.73   0.000     .1883949    .3433395
      hispan |   .2045942   .0375363     5.45   0.000     .1310244    .2781641
      born60 |  -.0188089   .0314753    -0.60   0.550    -.0804994    .0428815
------------------------------------------------------------------------------

Incident Rate Ratios

We can also use the option irr to calcualte incident-rate ratios.

nbreg narr86 pcnv avgsen tottime ptime86 qemp86 inc86 black hispan born60, irr
Fitting Poisson model:

Iteration 0:   log likelihood = -2249.0104  
Iteration 1:   log likelihood = -2248.7614  
Iteration 2:   log likelihood = -2248.7611  
Iteration 3:   log likelihood = -2248.7611  

Fitting constant-only model:

Iteration 0:   log likelihood = -2297.3847  
Iteration 1:   log likelihood = -2291.4767  
Iteration 2:   log likelihood = -2290.6926  
Iteration 3:   log likelihood = -2290.6903  
Iteration 4:   log likelihood = -2290.6903  

Fitting full model:

Iteration 0:   log likelihood = -2181.2355  
Iteration 1:   log likelihood = -2158.2716  
Iteration 2:   log likelihood = -2157.6284  
Iteration 3:   log likelihood =  -2157.628  
Iteration 4:   log likelihood =  -2157.628  

Negative binomial regression                    Number of obs     =      2,725
                                                LR chi2(9)        =     266.12
Dispersion     = mean                           Prob > chi2       =     0.0000
Log likelihood =  -2157.628                     Pseudo R2         =     0.0581

------------------------------------------------------------------------------
      narr86 |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        pcnv |   .6205828   .0641245    -4.62   0.000     .5068104    .7598956
      avgsen |   .9828109   .0256682    -0.66   0.507     .9337681     1.03443
     tottime |   1.019935   .0196159     1.03   0.305     .9822046    1.059116
     ptime86 |   .8981666   .0225207    -4.28   0.000     .8550939    .9434089
      qemp86 |    .950765   .0334533    -1.43   0.151     .8874074    1.018646
       inc86 |   .9923171   .0011377    -6.73   0.000     .9900898    .9945493
       black |   1.927147   .1779901     7.10   0.000     1.608046    2.309571
      hispan |   1.656731   .1483873     5.64   0.000     1.389994    1.974654
      born60 |   .9546486   .0741174    -0.60   0.550     .8198939    1.111551
       _cons |   .5690785   .0470697    -6.82   0.000     .4839133    .6692322
-------------+----------------------------------------------------------------
    /lnalpha |  -.0738912   .1177617                     -.3046999    .1569175
-------------+----------------------------------------------------------------
       alpha |   .9287728   .1093739                      .7373446    1.169899
------------------------------------------------------------------------------
LR test of alpha=0: chibar2(01) = 182.27               Prob >= chibar2 = 0.000