Chapter 2 Panel Data Part 1 - Pooled Data and First Difference

We will cover the implementation of two estimators today: 1. Pooled OLS Estimator 2. First Difference Estimator

2.1 Pooled OLS Estimator

We can account for \(a_t\) with our Pooled OLS estimator.

2.1.1 Fertility over time

Lesson: Time binaries can capture secular trends over time or trends that affect all people during the time period.

Sander (1992) uses the National Opinion Research Center’s General Social Survey for the veven users from 1972 to 1984. We’ll use these data to explain the total number of kids born to a women. Let’s get our data.

cd "/Users/Sam/Desktop/Econ 645/Data/Wooldridge"
use "fertil1.dta", clear

One question of interest: What happens to ferlity rates over time after controlling for observable factors? We’ll control for education, age, race, region at 16 years old, and living environment at 16.

We’ll use 1972 as our base year (which we will exclude and be captured in the intercept).

\[ \begin{aligned} kids_{i,t} = \beta_0 + \beta_1 educ_{i,t} + \beta_2 age_{i,t} + \beta_3 age^2_{i,t} + \beta_4 Black_{i} + \beta_5 East_{i,t} + \beta_6 NorthCentral_{i,t} \\ + \beta_7 West_{i,t} + \beta_8 Farm_{i,t} + \beta_9 OtherRural_{i,t} + \beta_{10} Town_{i,t} + \beta_{11} SmallCity_{i,t} + a_{t} \end{aligned} \]

First, we’ll estimate the regression without Time Trends

est clear
eststo OLS: reg kids educ c.age##c.age i.black i.east i.northcen i.west i.farm i.othrural i.town i.smcity 
      Source |       SS           df       MS      Number of obs   =     1,129
-------------+----------------------------------   F(11, 1117)     =     11.52
       Model |  314.471892        11  28.5883539   Prob > F        =    0.0000
    Residual |  2771.03741     1,117   2.4807855   R-squared       =    0.1019
-------------+----------------------------------   Adj R-squared   =    0.0931
       Total |   3085.5093     1,128  2.73538059   Root MSE        =    1.5751

------------------------------------------------------------------------------
        kids |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |  -.1428788    .018351    -7.79   0.000    -.1788851   -.1068725
         age |   .5624223   .1396257     4.03   0.000     .2884641    .8363804
             |
 c.age#c.age |  -.0060917   .0015793    -3.86   0.000    -.0091903    -.002993
             |
     1.black |    .977559    .173188     5.64   0.000     .6377485     1.31737
      1.east |   .2362931   .1340365     1.76   0.078    -.0266987    .4992849
  1.northcen |   .3847487   .1222117     3.15   0.002     .1449583    .6245391
      1.west |   .2447027   .1686052     1.45   0.147    -.0861158    .5755212
      1.farm |   -.054186   .1486156    -0.36   0.715    -.3457832    .2374112
  1.othrural |  -.1670751   .1773583    -0.94   0.346    -.5150681    .1809178
      1.town |   .0842369   .1257038     0.67   0.503    -.1624053    .3308792
    1.smcity |   .1830768   .1620166     1.13   0.259    -.1348143     .500968
       _cons |  -8.487543   3.068381    -2.77   0.006    -14.50798   -2.467104
------------------------------------------------------------------------------

Next, we account for time trends by including Time Fixed Effects

eststo Pooled_OLS: reg kids educ c.age##c.age i.black i.east i.northcen i.west i.farm i.othrural i.town i.smcity i.y74 i.y76 i.y78 i.y80 i.y82 i.y84
      Source |       SS           df       MS      Number of obs   =     1,129
-------------+----------------------------------   F(17, 1111)     =      9.72
       Model |  399.610888        17  23.5065228   Prob > F        =    0.0000
    Residual |  2685.89841     1,111  2.41755033   R-squared       =    0.1295
-------------+----------------------------------   Adj R-squared   =    0.1162
       Total |   3085.5093     1,128  2.73538059   Root MSE        =    1.5548

------------------------------------------------------------------------------
        kids |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |  -.1284268   .0183486    -7.00   0.000    -.1644286    -.092425
         age |   .5321346   .1383863     3.85   0.000     .2606065    .8036626
             |
 c.age#c.age |   -.005804   .0015643    -3.71   0.000    -.0088733   -.0027347
             |
     1.black |   1.075658   .1735356     6.20   0.000     .7351631    1.416152
      1.east |    .217324   .1327878     1.64   0.102    -.0432192    .4778672
  1.northcen |    .363114   .1208969     3.00   0.003      .125902    .6003261
      1.west |   .1976032   .1669134     1.18   0.237    -.1298978    .5251041
      1.farm |  -.0525575     .14719    -0.36   0.721    -.3413592    .2362443
  1.othrural |  -.1628537    .175442    -0.93   0.353    -.5070887    .1813814
      1.town |   .0843532    .124531     0.68   0.498    -.1599893    .3286957
    1.smcity |   .2118791    .160296     1.32   0.187    -.1026379    .5263961
       1.y74 |   .2681825    .172716     1.55   0.121    -.0707039    .6070689
       1.y76 |  -.0973795   .1790456    -0.54   0.587     -.448685    .2539261
       1.y78 |  -.0686665   .1816837    -0.38   0.706    -.4251483    .2878154
       1.y80 |  -.0713053   .1827707    -0.39   0.697      -.42992    .2873093
       1.y82 |  -.5224842   .1724361    -3.03   0.003    -.8608214    -.184147
       1.y84 |  -.5451661   .1745162    -3.12   0.002    -.8875846   -.2027477
       _cons |  -7.742457   3.051767    -2.54   0.011    -13.73033   -1.754579
------------------------------------------------------------------------------

Exclusion Restriction Test

test 1.y74 1.y76 1.y78 1.y80 1.y82 1.y84
 ( 1)  1.y74 = 0
 ( 2)  1.y76 = 0
 ( 3)  1.y78 = 0
 ( 4)  1.y80 = 0
 ( 5)  1.y82 = 0
 ( 6)  1.y84 = 0

       F(  6,  1111) =    5.87
            Prob > F =    0.0000

We reject the null hypothesis that our combined time binaries are 0 with an F-stat (1,111 degrees of freedom) equal to 5.87.

Display

esttab OLS Pooled_OLS, mtitles se scalars(F r2) drop(0.black 0.east 0.northcen 0.west 0.farm 0.othrural 0.town 0.smcity 0.y74 0.y76 0.y78 0.y80 0.y82 0.y84) 
                      (1)             (2)   
                      OLS      Pooled_OLS   
--------------------------------------------
educ               -0.143***       -0.128***
                 (0.0184)        (0.0183)   

age                 0.562***        0.532***
                  (0.140)         (0.138)   

c.age#c.age      -0.00609***     -0.00580***
                (0.00158)       (0.00156)   

1.black             0.978***        1.076***
                  (0.173)         (0.174)   

1.east              0.236           0.217   
                  (0.134)         (0.133)   

1.northcen          0.385**         0.363** 
                  (0.122)         (0.121)   

1.west              0.245           0.198   
                  (0.169)         (0.167)   

1.farm            -0.0542         -0.0526   
                  (0.149)         (0.147)   

1.othrural         -0.167          -0.163   
                  (0.177)         (0.175)   

1.town             0.0842          0.0844   
                  (0.126)         (0.125)   

1.smcity            0.183           0.212   
                  (0.162)         (0.160)   

1.y74                               0.268   
                                  (0.173)   

1.y76                             -0.0974   
                                  (0.179)   

1.y78                             -0.0687   
                                  (0.182)   

1.y80                             -0.0713   
                                  (0.183)   

1.y82                              -0.522** 
                                  (0.172)   

1.y84                              -0.545** 
                                  (0.175)   

_cons              -8.488**        -7.742*  
                  (3.068)         (3.052)   
--------------------------------------------
N                    1129            1129   
F                   11.52           9.723   
r2                  0.102           0.130   
--------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001

We can see most our time coefficient are not statistically significant but we do reject the null hypothesis that they are all equal to 0. If we compare 1982 and 1984 to 1972, women estimated to have about 0.52 fewer and 0.55 fewer children, respectively. This means that in 1982 100 women are expected to have 52 fewer children than 100 women in 1972.

Note: Since we are controlling for education, this decline in fertility cannot be attributed to increases in education. The difference is a secular trend or a general trend that transcends across all women.

Note: We have age and age-squared, which means that the number of children increases with age, but at a diminishing rate. At one point age will become 0.

estat hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity 
         Ho: Constant variance
         Variables: fitted values of kids

         chi2(1)      =    26.28
         Prob > chi2  =   0.0000

Note that we reject the null hypothesis of homoskedasticity with a Breusch-Pagan Test, so we should use our heteroskedastic-robust standard errors

est clear
eststo Without_Robust: quietly reg kids educ c.age##c.age i.black i.east i.northcen i.west i.farm i.othrural i.town i.smcity i.y74 i.y76 i.y78 i.y80 i.y82 i.y84
eststo With_Robust: quietly reg kids educ c.age##c.age i.black i.east i.northcen i.west i.farm i.othrural i.town i.smcity i.y74 i.y76 i.y78 i.y80 i.y82 i.y84, robust
esttab Without_Robust With_Robust, mtitles se scalars(F r2) drop(0.black 0.east 0.northcen 0.west 0.farm 0.othrural 0.town 0.smcity 0.y74 0.y76 0.y78 0.y80 0.y82 0.y84) 
                      (1)             (2)   
             Without_Ro~t     With_Robust   
--------------------------------------------
educ               -0.128***       -0.128***
                 (0.0183)        (0.0211)   

age                 0.532***        0.532***
                  (0.138)         (0.139)   

c.age#c.age      -0.00580***     -0.00580***
                (0.00156)       (0.00158)   

1.black             1.076***        1.076***
                  (0.174)         (0.201)   

1.east              0.217           0.217   
                  (0.133)         (0.127)   

1.northcen          0.363**         0.363** 
                  (0.121)         (0.117)   

1.west              0.198           0.198   
                  (0.167)         (0.163)   

1.farm            -0.0526         -0.0526   
                  (0.147)         (0.146)   

1.othrural         -0.163          -0.163   
                  (0.175)         (0.181)   

1.town             0.0844          0.0844   
                  (0.125)         (0.128)   

1.smcity            0.212           0.212   
                  (0.160)         (0.154)   

1.y74               0.268           0.268   
                  (0.173)         (0.188)   

1.y76             -0.0974         -0.0974   
                  (0.179)         (0.200)   

1.y78             -0.0687         -0.0687   
                  (0.182)         (0.198)   

1.y80             -0.0713         -0.0713   
                  (0.183)         (0.194)   

1.y82              -0.522**        -0.522** 
                  (0.172)         (0.188)   

1.y84              -0.545**        -0.545** 
                  (0.175)         (0.186)   

_cons              -7.742*         -7.742*  
                  (3.052)         (3.071)   
--------------------------------------------
N                    1129            1129   
F                   9.723           10.19   
r2                  0.130           0.130   
--------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001

Question: do you think the number of kids is distributed normally?

2.1.2 Changes in the Returns to Education and the Sex Wage Gap

Lesson: Interactions matter. We can manually create interactions, but that can be cumbersome. Using the # operator, we can generate interactions within our regression command.

Stata interactions: i.var1##i.var2, i.var1##c.var3, c.var3##c.var4

We can estimate to see how the wage gap and returns to schooling have compared for women. We pool data from 1978 and 1985 from the Current Population Survey.

cd "/Users/Sam/Desktop/Econ 645/Data/Wooldridge"
use "cps78_85.dta", clear

What we will do to estimate the wage gap and the returns to education between 1978 and 1985. We can do this by interacting our binary variable y85 with our continuous variable of education. Our coefficient on edu will be the returns to education in 1978 and the return to education in 1985 will be captured in the coefficient for edu#1.y85 plus the coefficient for edu in 1978. The coefficient for edu#1.y85 will be relative to the coefficient on edu, so we can add the two coefficients to find the returns to education in 1985.

We also interact our binary variable y85 with another binary variable female to see the wage gap in 1978 and 1985. 1.female will be the wage gap in 1978 and 1.female#1.y85 plus the 1.female will be our wage gap for females in 1985.

Wages will have changed between 1978 and 1985 due to inflation, since wages in the CPS are in nominal dollars. We can deflate our wages with a price deflator, or we can use natural log of wages with time binaries to capture inflation that is associated with the year binaries.

Natural Log and accounting for inflation with time variables. Let’s say our deflator for 1985 is 1.65, so we need to divide wages by the deflator to get real wages in 1978 dollars. But if we take the natural log transformed wages using real or nominal wages does not matter since the inflation factor will be absorbed in the time binary y1985:

\[ ln(wage_i/P85)=ln(wages_i)-ln(p85) \]

While wages may differ across individuals in 1985, all individuals in 1985 are affected by the secular trend of inflation relevant to 1985. (Note: if the price trends varied by region, then we would need to account for this since prices would not be a secular trend affecting all people).

We need both ln(wages) and time binaries, otherwise our estimates would be biased.

Note: that this remains true whether the nominal value is in the dependentor independent variables as long as it is natural log transformed and has time binaries.

\[ ln(wage_{i,t}) = \beta_0 + \beta_1 edu_{i,t} + \beta_2 exp_{i,t} + \beta_3 exp^2_{i,t} + \beta_4 union_{i,t} + \beta_5 female_{i} + \\ \beta_6 edu_{i,t}*Y85_t + \beta_7 female_i*Y85_{t} +\beta_8 Y85_{t} + \varepsilon_{i,t} \]

reg lwage i.y85##c.edu c.exper##c.exper i.union i.female##i.y85
      Source |       SS           df       MS      Number of obs   =     1,084
-------------+----------------------------------   F(8, 1075)      =     99.80
       Model |  135.992074         8  16.9990092   Prob > F        =    0.0000
    Residual |  183.099094     1,075  .170324738   R-squared       =    0.4262
-------------+----------------------------------   Adj R-squared   =    0.4219
       Total |  319.091167     1,083   .29463635   Root MSE        =     .4127

---------------------------------------------------------------------------------
          lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
          1.y85 |   .1178062   .1237817     0.95   0.341     -.125075    .3606874
           educ |   .0747209   .0066764    11.19   0.000     .0616206    .0878212
                |
     y85#c.educ |
             1  |   .0184605   .0093542     1.97   0.049      .000106     .036815
                |
          exper |   .0295843   .0035673     8.29   0.000     .0225846     .036584
                |
c.exper#c.exper |  -.0003994   .0000775    -5.15   0.000    -.0005516   -.0002473
                |
        1.union |   .2021319   .0302945     6.67   0.000     .1426888    .2615749
       1.female |  -.3167086   .0366215    -8.65   0.000    -.3885663    -.244851
                |
     female#y85 |
           1 1  |    .085052    .051309     1.66   0.098    -.0156251     .185729
                |
          _cons |   .4589329   .0934485     4.91   0.000     .2755707     .642295
---------------------------------------------------------------------------------

Our results:

Wage gap in 1978 is \((e^{(-.317)}-1)*100 = -27.2 \%\)

Wage gap in 1985 is \((e^{(-.317+.085)}-1)*100 = -20.7 \%\)

display (exp(-.317)-1)*100
display (exp(-.317+0.085)-1)*100
-27.166924

-20.705388

Difference in returns to education

display (exp(0.0747)-1)*100
display (exp(0.0747+0.0185)-1)*100
7.7560834

9.768125

Declines in wage gap since 1978

display (exp(0.085)-1)*100
8.8717067

Increases in returns to education since 1978

display (exp(0.0185)-1)*100
1.8672185

Chow Test/F-Test for structural changes across time (nested-model). We will test whether or not to include time binaries.

test 1.y85 0.y85 1.y85#edu 1.y85#1.female
 ( 1)  1.y85 = 0
 ( 2)  0b.y85 = 0
 ( 3)  1.y85#c.educ = 0
 ( 4)  1.female#1.y85 = 0
       Constraint 2 dropped

       F(  3,  1075) =   80.26
            Prob > F =    0.0000

Test slopes

test 1.y85#edu 1.y85#1.female
 ( 1)  1.y85#c.educ = 0
 ( 2)  1.female#1.y85 = 0

       F(  2,  1075) =    3.43
            Prob > F =    0.0326

Test intercepts

test 1.y85 0.y85
 ( 1)  1.y85 = 0
 ( 2)  0b.y85 = 0
       Constraint 2 dropped

       F(  1,  1075) =    0.91
            Prob > F =    0.3415

Should we keep our time binaries? Our F-tests show that we should and we’ll need them for to account for inflation.

2.2 First Difference Estimator

We can account for \(a_t\) and \(a_i\) with our First Difference estimator.

2.2.1 Sleep vs Working

Lesson: The way you set up panel data matters. We need to use the long-format.

I think this is a good example of a poorly set up panel data. We’ll use the file called slp75_81.dta to took at a two year panel data from Biddle and Hamermesh (1990) to estimate the tradeoff between sleeping and working. We have data for 239 people between 1975 and 1981 for the same person in both periods.

We’ll get to reshape in Mitchell soon, but here is a preview. We will reshape the data so that each cross-sectional unit of observation has 2 rows: one for 1975 and one for 1985.

Let’s take a look at a wide-form dataset that needs to be reshaped

Data in wide form
Data in wide form

Reshape

cd "/Users/Sam/Desktop/Econ 645/Data/Wooldridge"
use "slp75_81.dta", clear
gen id=_n
gen age81=age75+6
reshape long age educ gdhlth marr slpnap totwrk yngkid, i(id) j(year)
replace year=year+1900
/Users/Sam/Desktop/Econ 645/Data/Wooldridge




(note: j = 75 81)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                      239   ->     478
Number of variables                  22   ->      16
j variable (2 values)                     ->   year
xij variables:
                            age75 age81   ->   age
                          educ75 educ81   ->   educ
                      gdhlth75 gdhlth81   ->   gdhlth
                          marr75 marr81   ->   marr
                      slpnap75 slpnap81   ->   slpnap
                      totwrk75 totwrk81   ->   totwrk
                      yngkid75 yngkid81   ->   yngkid
-----------------------------------------------------------------------------

variable year was byte now int
(478 real changes made)

We’ll generate our time binaries

gen d75 = .
replace d75=0 if year==1981
replace d75=1 if year==1975

gen d81=.
replace d81=0 if year==1975
replace d81=1 if year==1981
(478 missing values generated)

(239 real changes made)

(239 real changes made)

(478 missing values generated)

(239 real changes made)

(239 real changes made)

And we’ll want to check that we don’t have any overlapping periods Check

tab year d75
tab year d81
tab d81 d75
           |          d75
      year |         0          1 |     Total
-----------+----------------------+----------
      1975 |         0        239 |       239 
      1981 |       239          0 |       239 
-----------+----------------------+----------
     Total |       239        239 |       478 

           |          d81
      year |         0          1 |     Total
-----------+----------------------+----------
      1975 |       239          0 |       239 
      1981 |         0        239 |       239 
-----------+----------------------+----------
     Total |       239        239 |       478 

           |          d75
       d81 |         0          1 |     Total
-----------+----------------------+----------
         0 |         0        239 |       239 
         1 |       239          0 |       239 
-----------+----------------------+----------
     Total |       239        239 |       478 

Now we can work with the panel. Let xtset know that there is a 6 year gap in the data. If not, it will try to take a 1 year change and 1976 does not exist.

xtset id year, delta(6)
       panel variable:  id (strongly balanced)
        time variable:  year, 1975 to 1981
                delta:  6 units

\[ slpnap_{i,t}=\beta_0+\beta_1 totwrk_{i,t} + \beta_3 educ_{i,t} + \beta_4 marr_{i,t} + \beta_5 yngkid_{i,t} + \beta_6 gdhlth_{i,t} + a_i + a_t \]

  1. slpnap - total minutes of sleep per week
  2. totwrk - total hours of work per year
  3. educ - total years of education
  4. marr - married dummy variable
  5. yngkid - presence of a young child dummy variable
  6. gdhlth - “good health” dummy variable

After we set up our panel, we can control for the unobserved cross-sectional unit effect \(a_i\), or unobserved individual effect that do not vary over time. This is all characteristics of the cross-sectional unit of observation that is constant over time or time-invariant.

We are interested in the tradeoff between work and sleep. Our dependent variable of interest is total minutes sleeping per week. Our explanatory variable of interest is the total working hours during the year.

histogram totwrk
graph export "/Users/Sam/Desktop/Econ 645/Stata/totwrk.png", replace
Histogram of Annual Hours Worked
Histogram of Annual Hours Worked
histogram slpnap
graph export "/Users/Sam/Desktop/Econ 645/Stata/slpnap.png", replace
Histogram of Minutes Sleeping Per Week
Histogram of Minutes Sleeping Per Week

When we run OLS, we do not control for unobserved individual time-invariant effects.

Pooled OLS

est clear
eststo OLS: reg slpnap i.d81 totwrk educ marr yngkid gdhlth 
      Source |       SS           df       MS      Number of obs   =       478
-------------+----------------------------------   F(6, 471)       =      8.50
       Model |  12812772.5         6  2135462.09   Prob > F        =    0.0000
    Residual |   118370944       471  251318.353   R-squared       =    0.0977
-------------+----------------------------------   Adj R-squared   =    0.0862
       Total |   131183717       477  275018.274   Root MSE        =    501.32

------------------------------------------------------------------------------
      slpnap |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       1.d81 |  -77.02965   47.96519    -1.61   0.109    -171.2819    17.22259
      totwrk |  -.1500954   .0244002    -6.15   0.000    -.1980421   -.1021487
        educ |  -20.25655   8.000179    -2.53   0.012    -35.97701   -4.536091
        marr |  -42.19389   53.07869    -0.79   0.427    -146.4942    62.10645
      yngkid |    74.2579   77.32582     0.96   0.337    -77.68838    226.2042
      gdhlth |   29.03733   59.57889     0.49   0.626    -88.03599    146.1107
       _cons |   3958.756   121.7663    32.51   0.000     3719.483    4198.028
------------------------------------------------------------------------------

FD Model

eststo FD: reg d.slpnap d.totwrk d.educ d.marr d.yngkid d.gdhlth
      Source |       SS           df       MS      Number of obs   =       239
-------------+----------------------------------   F(5, 233)       =      8.19
       Model |  14674698.2         5  2934939.64   Prob > F        =    0.0000
    Residual |  83482611.7       233  358294.471   R-squared       =    0.1495
-------------+----------------------------------   Adj R-squared   =    0.1313
       Total |  98157309.9       238  412425.672   Root MSE        =    598.58

------------------------------------------------------------------------------
    D.slpnap |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      totwrk |
         D1. |  -.2266694    .036054    -6.29   0.000    -.2977029   -.1556359
             |
        educ |
         D1. |  -.0244717   48.75938    -0.00   1.000    -96.09008    96.04113
             |
        marr |
         D1. |   104.2139   92.85536     1.12   0.263    -78.72946    287.1574
             |
      yngkid |
         D1. |    94.6654   87.65252     1.08   0.281    -78.02739    267.3582
             |
      gdhlth |
         D1. |   87.57785   76.59913     1.14   0.254    -63.33758    238.4933
             |
       _cons |  -92.63404    45.8659    -2.02   0.045    -182.9989   -2.269152
------------------------------------------------------------------------------

Our Pooled OLS model shows one additional hour worked per year is associated with a 0.15 fewer minutes per week of sleep, while our FD model one additional hour worked per year is associated with a 0.23 fewer minutes of sleep per week.

Similar except time dummy and base intercepts look at the difference between educ in Pooled and FE models. There is likely a confounder between ability sleep and education.

esttab OLS FD, mtitles se scalars(F r2) drop(0.d81)
                      (1)             (2)   
                      OLS              FD   
--------------------------------------------
1.d81              -77.03                   
                  (47.97)                   

totwrk             -0.150***                
                 (0.0244)                   

educ               -20.26*                  
                  (8.000)                   

marr               -42.19                   
                  (53.08)                   

yngkid              74.26                   
                  (77.33)                   

gdhlth              29.04                   
                  (59.58)                   

D.totwrk                           -0.227***
                                 (0.0361)   

D.educ                            -0.0245   
                                  (48.76)   

D.marr                              104.2   
                                  (92.86)   

D.yngkid                            94.67   
                                  (87.65)   

D.gdhlth                            87.58   
                                  (76.60)   

_cons              3958.8***       -92.63*  
                  (121.8)         (45.87)   
--------------------------------------------
N                     478             239   
F                   8.497           8.191   
r2                 0.0977           0.150   
--------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001

Let’s look at elasticities

gen lnslpnap=ln(slpnap)
gen lntotwrk=ln(totwrk)
gen lneduc=ln(educ)

FD Model

reg d.lnslpnap d.lntotwrk d.lneduc d.marr d.yngkid d.gdhlth
      Source |       SS           df       MS      Number of obs   =       214
-------------+----------------------------------   F(5, 208)       =      4.17
       Model |  .805249181         5  .161049836   Prob > F        =    0.0012
    Residual |  8.03583239       208   .03863381   R-squared       =    0.0911
-------------+----------------------------------   Adj R-squared   =    0.0692
       Total |  8.84108157       213  .041507425   Root MSE        =    .19655

------------------------------------------------------------------------------
  D.lnslpnap |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lntotwrk |
         D1. |  -.0848256   .0206499    -4.11   0.000    -.1255354   -.0441158
             |
      lneduc |
         D1. |  -.0289507   .2628724    -0.11   0.912    -.5471864     .489285
             |
        marr |
         D1. |  -.0038529   .0323002    -0.12   0.905    -.0675306    .0598247
             |
      yngkid |
         D1. |   .0056761   .0290466     0.20   0.845    -.0515874    .0629397
             |
      gdhlth |
         D1. |    .048008   .0266772     1.80   0.073    -.0045843    .1006004
             |
       _cons |  -.0204204   .0155593    -1.31   0.191    -.0510947    .0102538
------------------------------------------------------------------------------

A 1% increase in hours worked per year is associated with a 0.08% decrease in sleep per week.

2.2.2 Distributed Lag of Crime Rates on Clear-up Rate

Lesson: Lagged variables can be important explanatory variables Note: We’ll get into this a bit more in Wooldridge Chapter 10-12.

We will get into time series later in the course, but we will have a preview. With panel data, we are able to see the same observation over time. Given this, we can see if lagged values affect current values for the cross-sectional unit of observation.

Eide(1994) wants to assess if prior clear-up rates for crime have a relationship with crime rates in the current time period. The dependent variable of interest is the current period crime rate. The variable of interest is the clear-up percentage which is the rate of crimes that have lead to conviction.

We will use a log-log model and use a first-difference.

\[ \Delta ln(crime_{i,t})= \delta_0 + \beta_1 \Delta ln(clearup_{i,t-1}) + \beta_2 \Delta ln(clearup_{i,t-2}) + \Delta \varepsilon_{i,t} \]

cd "/Users/Sam/Desktop/Econ 645/Data/Wooldridge"
use crime3.dta, clear
reg clcrime cclrprc1 cclrprc2
/Users/Sam/Desktop/Econ 645/Data/Wooldridge

      Source |       SS           df       MS      Number of obs   =        53
-------------+----------------------------------   F(2, 50)        =      5.99
       Model |  1.42294697         2  .711473484   Prob > F        =    0.0046
    Residual |  5.93723904        50  .118744781   R-squared       =    0.1933
-------------+----------------------------------   Adj R-squared   =    0.1611
       Total |  7.36018601        52  .141542039   Root MSE        =    .34459

------------------------------------------------------------------------------
     clcrime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    cclrprc1 |  -.0040475   .0047199    -0.86   0.395    -.0135276    .0054326
    cclrprc2 |  -.0131966   .0051946    -2.54   0.014    -.0236302   -.0027629
       _cons |   .0856556   .0637825     1.34   0.185    -.0424553    .2137665
------------------------------------------------------------------------------

Our model tells us that 1 percent increase in getting crimes to conviction is associated with a reduction in the current period crime rate of about 0.013 percent. Assuming this model is specified well, crime rates are sensitive to clear-up rates from the prior 2 years.

2.2.3 Enterprise Zones and Unemployment

Lesson: We need to be aware of heteroskedasticity and serial correlation

We should test for heteroskedasticity and serial correlation when running our First-Difference Estimator.

Serial correlation means that the differences in the idiosyncractic errors between time periods are correlated, which is a violation of our assumption that there is no serial correlation for our first-difference estimator.

We’ll use data from Papke (1994) who studied the effect of Indiana’s enterprise zone program on unemployment claims. Do enterprise zones increase employment and reduce unemployment claims?

She analyzes 22 cities in Indiana from 1980 to 1988, since 6 enterprise zones were established in 1984. Twelve cities did not receive the enterprise zones, while 10 cities did get enterprise zones.

We’ll use a simple analysis that the change in natural log of unemployment claims is a function of time period binaries, change in enterprise zones, and the idiosyncractic error.

One way to do it is to use the data already created.

\[ ln(UnempClaims_{i,t}) = \beta_0 + \beta_1 EnterpriseZone_{i,t} + a_t + a_i + \varepsilon_{i,t} \]

Pooled OLS

cd "/Users/Sam/Desktop/Econ 645/Data/Wooldridge"
use "ezunem.dta", clear
reg luclms i.d82 i.d83 i.d84 i.d85 i.d86 i.d87 i.d88 cez
/Users/Sam/Desktop/Econ 645/Data/Wooldridge

      Source |       SS           df       MS      Number of obs   =       176
-------------+----------------------------------   F(8, 167)       =     10.39
       Model |  29.2551235         8  3.65689043   Prob > F        =    0.0000
    Residual |  58.7753327       167    .3519481   R-squared       =    0.3323
-------------+----------------------------------   Adj R-squared   =    0.3003
       Total |  88.0304561       175  .503031178   Root MSE        =    .59325

------------------------------------------------------------------------------
      luclms |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       1.d82 |   .4571276   .1788722     2.56   0.011     .1039853    .8102699
       1.d83 |   .1023765   .1788722     0.57   0.568    -.2507658    .4555188
       1.d84 |  -.2821805   .1882109    -1.50   0.136    -.6537598    .0893988
       1.d85 |  -.3150721   .1830816    -1.72   0.087    -.6765248    .0463805
       1.d86 |  -.3470941   .1788722    -1.94   0.054    -.7002364    .0060481
       1.d87 |   -.614778   .1788722    -3.44   0.001    -.9679202   -.2616357
       1.d88 |  -.9534624   .1788722    -5.33   0.000    -1.306605   -.6003201
         cez |  -.0139924   .2146822    -0.07   0.948    -.4378332    .4098484
       _cons |   11.37276   .1264818    89.92   0.000     11.12305    11.62247
------------------------------------------------------------------------------

The Pooled OLS shows that an enterprise zone with unemployment insurance claims. Let’s say we didn’t have the data already created. What we’ll need to do is to set the panel data with xtset unit timeperiod Set the Panel Data

xtset city year
       panel variable:  city (strongly balanced)
        time variable:  year, 1980 to 1988
                delta:  1 unit

We can use the d. operator to use a first difference First Difference

reg d.luclms i.d82 i.d83 i.d84 i.d85 i.d86 i.d87 i.d88 d.ez
      Source |       SS           df       MS      Number of obs   =       176
-------------+----------------------------------   F(8, 167)       =     34.50
       Model |  12.8826331         8  1.61032914   Prob > F        =    0.0000
    Residual |  7.79583815       167  .046681666   R-squared       =    0.6230
-------------+----------------------------------   Adj R-squared   =    0.6049
       Total |  20.6784713       175  .118162693   Root MSE        =    .21606

------------------------------------------------------------------------------
    D.luclms |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       1.d82 |   .7787595   .0651444    11.95   0.000     .6501469    .9073721
       1.d83 |  -.0331192   .0651444    -0.51   0.612    -.1617318    .0954934
       1.d84 |  -.0171382   .0685455    -0.25   0.803    -.1524655    .1181891
       1.d85 |    .323081   .0666774     4.85   0.000     .1914417    .4547202
       1.d86 |    .292154   .0651444     4.48   0.000     .1635413    .4207666
       1.d87 |   .0539481   .0651444     0.83   0.409    -.0746645    .1825607
       1.d88 |  -.0170526   .0651444    -0.26   0.794    -.1456652    .1115601
             |
          ez |
         D1. |  -.1818775   .0781862    -2.33   0.021    -.3362382   -.0275169
             |
       _cons |  -.3216319    .046064    -6.98   0.000    -.4125748   -.2306891
------------------------------------------------------------------------------

When we use a FD estimator, a city that adopts an enterprise is associated with a reduction in unemployment insurance claims by \((e^(-0.1819)-1)*100\% = 16.6\%\)

Test for Heteroskedasticity:

Bruesch-Pagan/Cameron-Trivedi Test using estat post-estimation command

estat imtest
Cameron & Trivedi's decomposition of IM-test

---------------------------------------------------
              Source |       chi2     df      p
---------------------+-----------------------------
  Heteroskedasticity |      12.85      9    0.1697
            Skewness |      13.04      8    0.1104
            Kurtosis |       0.11      1    0.7414
---------------------+-----------------------------
               Total |      26.00     18    0.0998
---------------------------------------------------

White Test using estat postestimation command

estat imtest, white
White's test for Ho: homoskedasticity
         against Ha: unrestricted heteroskedasticity

         chi2(9)      =     12.85
         Prob > chi2  =    0.1697

Cameron & Trivedi's decomposition of IM-test

---------------------------------------------------
              Source |       chi2     df      p
---------------------+-----------------------------
  Heteroskedasticity |      12.85      9    0.1697
            Skewness |      13.04      8    0.1104
            Kurtosis |       0.11      1    0.7414
---------------------+-----------------------------
               Total |      26.00     18    0.0998
---------------------------------------------------

We fail to reject the null hypothesis of homoskedastic standard errors.

Test for autocorrelation We can use the command xtserial to test for serial correlation, since we already set up our panel above.

    help xtserial
xtserial luclms d83 d84 d85 d86 d87 d88 cez, output
Linear regression                               Number of obs     =        154
                                                F(7, 21)          =     102.85
                                                Prob > F          =     0.0000
                                                R-squared         =     0.4711
                                                Root MSE          =     .28063

                                  (Std. Err. adjusted for 22 clusters in city)
------------------------------------------------------------------------------
             |               Robust
    D.luclms |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d83 |
         D1. |  -.3547511   .0410541    -8.64   0.000    -.4401277   -.2693745
             |
         d84 |
         D1. |  -.7240107   .0816269    -8.87   0.000     -.893763   -.5542583
             |
         d85 |
         D1. |  -.7620015   .0540757   -14.09   0.000    -.8744581   -.6495448
             |
         d86 |
         D1. |  -.8042218   .0490039   -16.41   0.000     -.906131   -.7023126
             |
         d87 |
         D1. |  -1.071906   .0496977   -21.57   0.000    -1.175258   -.9685535
             |
         d88 |
         D1. |   -1.41059    .065833   -21.43   0.000    -1.547497   -1.273683
             |
         cez |
         D1. |   -.070083   .0650804    -1.08   0.294     -.205425     .065259
------------------------------------------------------------------------------

Wooldridge test for autocorrelation in panel data
H0: no first-order autocorrelation
    F(  1,      21) =    120.981
           Prob > F =      0.0000

We reject the null hypothesis that the differences in idiosyncratic errors is 0.

What can we do for this serial correlation? We can cluster our standard errors to account for the serial correlation within units.

Robust for serial correlation within unit of analysis clusters

reg d.luclms i.d82 i.d83 i.d84 i.d85 i.d86 i.d87 i.d88 d.cez, robust cluster(city)
note: 1.d88 omitted because of collinearity

Linear regression                               Number of obs     =        154
                                                F(7, 21)          =      41.20
                                                Prob > F          =     0.0000
                                                R-squared         =     0.6333
                                                Root MSE          =     .21865

                                  (Std. Err. adjusted for 22 clusters in city)
------------------------------------------------------------------------------
             |               Robust
    D.luclms |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       1.d82 |   .7958121   .0577076    13.79   0.000     .6758025    .9158217
       1.d83 |  -.0160666   .0565946    -0.28   0.779    -.1337615    .1016282
       1.d84 |  -.0305751   .0742049    -0.41   0.684    -.1848926    .1237424
       1.d85 |   .3006937   .0568959     5.28   0.000     .1823723    .4190151
       1.d86 |   .2964642   .0786548     3.77   0.001     .1328926    .4600357
       1.d87 |   .0710007   .0462206     1.54   0.139    -.0251204    .1671217
       1.d88 |          0  (omitted)
             |
         cez |
         D1. |   -.070083   .0653029    -1.07   0.295    -.2058877    .0657217
             |
       _cons |  -.3386845   .0359416    -9.42   0.000    -.4134292   -.2639397
------------------------------------------------------------------------------

Once we account for the serial correlation in the idiosyncratic error, our coefficient on the change in enterprise zone becomes statistically insignificant.

Testing for serial correlation, and Clustering your standard error at the treatment level is an important test and step with FD estimators.

When we get to Fixed Effects (Within) estimator, the assumption is slightly different. For First-Difference Estimator, the DIFFERENCE in idiosyncratic errors cannot be correlated, but for Fixed-Effects (Within) Estimator the assumption states that only the idiosyncratic errors cannot be correlated.

2.2.4 County Crimes in North Carolina

Lesson: First-Difference Estimator (and Fixed Effects Estimator) cannot fix simultaneity bias. We need an valid instrument to deal with simultaneity bias.

Cornwell and Trumbull (1994) analyzed data on 90 counties in North Carolina from 1981 to 1987 to using panel data to account for time-invariant effects. Their cross-sectional unit of observation is the county (not individuals within a county). They want to know what the is the effect of police per capita on crime rates.

For our model the dependent variable is the change in the natural log of crimes per person (lcrimrte). They are interested in estimating the sensitivity of police per capita (polpc) on crime rate, so they use the difference in natural log of police per capita. This will get you elasticities, which provide useful interpretations of percentage increases.

They also control for the difference in natural log of probability of arrest (prbarr), the probability of conviction (prbconv), the probability of serving time in prison given a conviction (prbpris), and the average sentence length.

\[ ln(crimerate_{i,t}) = \beta_0 + \beta_1 ln(prbcon_{i,t}) + \beta_2 ln(prbpris_{i,t}) + \beta_3 ln(avgsen_{i,t}) + \beta_4 ln(polpc_{i,t}) + a_t + a_i + \varepsilon_{i,t} \] Pooled OLS

cd "/Users/Sam/Desktop/Econ 645/Data/Wooldridge"
use "crime4.dta", clear
reg lcrmrte i.d82 i.d83 i.d84 i.d85 i.d86 i.d87 lprbarr lprbconv lprbpris lavgsen lpolpc
/Users/Sam/Desktop/Econ 645/Data/Wooldridge

      Source |       SS           df       MS      Number of obs   =       630
-------------+----------------------------------   F(11, 618)      =     74.49
       Model |  117.644669        11  10.6949699   Prob > F        =    0.0000
    Residual |   88.735673       618  .143585231   R-squared       =    0.5700
-------------+----------------------------------   Adj R-squared   =    0.5624
       Total |  206.380342       629  .328108652   Root MSE        =    .37893

------------------------------------------------------------------------------
     lcrmrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       1.d82 |   .0051371    .057931     0.09   0.929    -.1086284    .1189026
       1.d83 |   -.043503   .0576243    -0.75   0.451    -.1566662    .0696601
       1.d84 |  -.1087542    .057923    -1.88   0.061     -.222504    .0049957
       1.d85 |  -.0780454   .0583244    -1.34   0.181    -.1925835    .0364928
       1.d86 |  -.0420791   .0578218    -0.73   0.467      -.15563    .0714719
       1.d87 |  -.0270426    .056899    -0.48   0.635    -.1387815    .0846963
     lprbarr |  -.7195033   .0367657   -19.57   0.000    -.7917042   -.6473024
    lprbconv |  -.5456589   .0263683   -20.69   0.000    -.5974413   -.4938765
    lprbpris |   .2475521   .0672268     3.68   0.000     .1155314    .3795728
     lavgsen |  -.0867575   .0579205    -1.50   0.135    -.2005023    .0269872
      lpolpc |   .3659886   .0300252    12.19   0.000     .3070248    .4249525
       _cons |  -2.082293   .2516253    -8.28   0.000    -2.576438   -1.588149
------------------------------------------------------------------------------

First Difference

xtset county year
       panel variable:  county (strongly balanced)
        time variable:  year, 81 to 87
                delta:  1 unit

With xtset, we can use the d. for our explanatory variables.

First-Difference

reg d.lcrmrte i.d83 i.d84 i.d85 i.d86 i.d87 d.lprbarr d.lprbconv d.lprbpris d.lavgsen d.lpolpc
      Source |       SS           df       MS      Number of obs   =       540
-------------+----------------------------------   F(10, 529)      =     40.32
       Model |  9.60042828        10  .960042828   Prob > F        =    0.0000
    Residual |  12.5963755       529  .023811674   R-squared       =    0.4325
-------------+----------------------------------   Adj R-squared   =    0.4218
       Total |  22.1968038       539  .041181454   Root MSE        =    .15431

------------------------------------------------------------------------------
   D.lcrmrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       1.d83 |  -.0998658   .0238953    -4.18   0.000    -.1468071   -.0529246
       1.d84 |  -.0479374   .0235021    -2.04   0.042    -.0941063   -.0017686
       1.d85 |  -.0046111   .0234998    -0.20   0.845    -.0507756    .0415533
       1.d86 |   .0275143   .0241494     1.14   0.255    -.0199261    .0749548
       1.d87 |   .0408267   .0244153     1.67   0.095    -.0071361    .0887895
             |
     lprbarr |
         D1. |  -.3274942   .0299801   -10.92   0.000    -.3863889   -.2685994
             |
    lprbconv |
         D1. |  -.2381066   .0182341   -13.06   0.000    -.2739268   -.2022864
             |
    lprbpris |
         D1. |  -.1650462    .025969    -6.36   0.000    -.2160613   -.1140312
             |
     lavgsen |
         D1. |  -.0217607   .0220909    -0.99   0.325    -.0651574     .021636
             |
      lpolpc |
         D1. |   .3984264    .026882    14.82   0.000     .3456177     .451235
             |
       _cons |   .0077134   .0170579     0.45   0.651    -.0257961    .0412229
------------------------------------------------------------------------------

So our model is estimating that a 1% increase in police per capita increases the crime rate by about 0.4%. We would expect that increases in police per capita would reduce crime rates. This is likely shows that there are problems in our model.

What is happening? Likely simultaneity bias is occurring. Areas with high crime rates have more police per capita, which more police per capita is associated with higher crime rates. It’s a circular/endogenous reference, so we will need an valid instrument to deal with this simultaneity bias/endogeneity.

We can test for heteroskedasticity and serial correlation, but this only affects our standard errors, not our coefficients.

Test for autocorrelation with one lag AR(1). We can see that serial correlation is a problem, which means we should cluster our standard errors at the county level.

predict r, residual
gen lag_r =l.r
reg r lag_r i.d83 i.d84 i.d85 i.d86 i.d87 d.lprbarr d.lprbconv d.lprbpris d.lavgsen d.lpolpc
(90 missing values generated)

(180 missing values generated)

note: 1.d87 omitted because of collinearity

      Source |       SS           df       MS      Number of obs   =       450
-------------+----------------------------------   F(10, 439)      =      2.35
       Model |  .564663971        10  .056466397   Prob > F        =    0.0102
    Residual |   10.528838       439  .023983686   R-squared       =    0.0509
-------------+----------------------------------   Adj R-squared   =    0.0293
       Total |   11.093502       449  .024707131   Root MSE        =    .15487

------------------------------------------------------------------------------
           r |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lag_r |  -.2332117   .0488802    -4.77   0.000      -.32928   -.1371435
       1.d83 |  -.0012379   .0234019    -0.05   0.958    -.0472317    .0447558
       1.d84 |  -.0016543   .0236303    -0.07   0.944    -.0480968    .0447883
       1.d85 |  -.0006044   .0236114    -0.03   0.980    -.0470098    .0458009
       1.d86 |   .0008841   .0233216     0.04   0.970    -.0449517    .0467199
       1.d87 |          0  (omitted)
             |
     lprbarr |
         D1. |   .0082875   .0330698     0.25   0.802    -.0567074    .0732823
             |
    lprbconv |
         D1. |  -.0036161   .0200155    -0.18   0.857    -.0429542    .0357221
             |
    lprbpris |
         D1. |   .0017131    .027855     0.06   0.951    -.0530326    .0564588
             |
     lavgsen |
         D1. |  -.0125831   .0245006    -0.51   0.608    -.0607362      .03557
             |
      lpolpc |
         D1. |   .0214883   .0282688     0.76   0.448    -.0340706    .0770473
             |
       _cons |   .0005347   .0167411     0.03   0.975     -.032368    .0334373
------------------------------------------------------------------------------

We can always use our xtserial command to test for AR(1) serial correlation.

reg d.lcrmrte i.d83 i.d84 i.d85 i.d86 i.d87 d.lprbarr d.lprbconv d.lprbpris d.lavgsen d.lpolpc
xtserial lcrmrte d83 d84 d85 d86 d87 lprbarr lprbconv lprbpris lavgsen lpolpc, output
      Source |       SS           df       MS      Number of obs   =       540
-------------+----------------------------------   F(10, 529)      =     40.32
       Model |  9.60042828        10  .960042828   Prob > F        =    0.0000
    Residual |  12.5963755       529  .023811674   R-squared       =    0.4325
-------------+----------------------------------   Adj R-squared   =    0.4218
       Total |  22.1968038       539  .041181454   Root MSE        =    .15431

------------------------------------------------------------------------------
   D.lcrmrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       1.d83 |  -.0998658   .0238953    -4.18   0.000    -.1468071   -.0529246
       1.d84 |  -.0479374   .0235021    -2.04   0.042    -.0941063   -.0017686
       1.d85 |  -.0046111   .0234998    -0.20   0.845    -.0507756    .0415533
       1.d86 |   .0275143   .0241494     1.14   0.255    -.0199261    .0749548
       1.d87 |   .0408267   .0244153     1.67   0.095    -.0071361    .0887895
             |
     lprbarr |
         D1. |  -.3274942   .0299801   -10.92   0.000    -.3863889   -.2685994
             |
    lprbconv |
         D1. |  -.2381066   .0182341   -13.06   0.000    -.2739268   -.2022864
             |
    lprbpris |
         D1. |  -.1650462    .025969    -6.36   0.000    -.2160613   -.1140312
             |
     lavgsen |
         D1. |  -.0217607   .0220909    -0.99   0.325    -.0651574     .021636
             |
      lpolpc |
         D1. |   .3984264    .026882    14.82   0.000     .3456177     .451235
             |
       _cons |   .0077134   .0170579     0.45   0.651    -.0257961    .0412229
------------------------------------------------------------------------------


Linear regression                               Number of obs     =        540
                                                F(10, 89)         =      13.68
                                                Prob > F          =     0.0000
                                                R-squared         =     0.4323
                                                Root MSE          =     .15419

                                (Std. Err. adjusted for 90 clusters in county)
------------------------------------------------------------------------------
             |               Robust
   D.lcrmrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         d83 |
         D1. |  -.0920147   .0146354    -6.29   0.000     -.121095   -.0629345
             |
         d84 |
         D1. |  -.1323557   .0179759    -7.36   0.000    -.1680734    -.096638
             |
         d85 |
         D1. |  -.1293278   .0232716    -5.56   0.000    -.1755679   -.0830876
             |
         d86 |
         D1. |  -.0938692   .0209561    -4.48   0.000    -.1355086   -.0522298
             |
         d87 |
         D1. |  -.0449434   .0236533    -1.90   0.061     -.091942    .0020552
             |
     lprbarr |
         D1. |  -.3263268   .0561808    -5.81   0.000     -.437957   -.2146967
             |
    lprbconv |
         D1. |  -.2375214   .0392585    -6.05   0.000    -.3155272   -.1595155
             |
    lprbpris |
         D1. |  -.1645109   .0457092    -3.60   0.001     -.255334   -.0736878
             |
     lavgsen |
         D1. |  -.0246909   .0252756    -0.98   0.331    -.0749129    .0255312
             |
      lpolpc |
         D1. |   .3976861   .1024878     3.88   0.000     .1940451    .6013271
------------------------------------------------------------------------------

Wooldridge test for autocorrelation in panel data
H0: no first-order autocorrelation
    F(  1,      89) =     22.065
           Prob > F =      0.0000

We can also test for heteroskedasticity

quietly reg d.lcrmrte i.d83 i.d84 i.d85 i.d86 i.d87 d.lprbarr d.lprbconv d.lprbpris d.lavgsen d.lpolpc
rvfplot,yline(0)
graph export "/Users/Sam/Desktop/Econ 645/Stata/week3_htsk.png", replace
Plotting Errors
Plotting Errors

We can use the White Test and Breusch-Pagan, which yield different results

estat imtest, white
estat hettest
White's test for Ho: homoskedasticity
         against Ha: unrestricted heteroskedasticity

         chi2(50)     =    257.57
         Prob > chi2  =    0.0000

Cameron & Trivedi's decomposition of IM-test

---------------------------------------------------
              Source |       chi2     df      p
---------------------+-----------------------------
  Heteroskedasticity |     257.57     50    0.0000
            Skewness |      62.96     10    0.0000
            Kurtosis |      12.82      1    0.0003
---------------------+-----------------------------
               Total |     333.35     61    0.0000
---------------------------------------------------


Breusch-Pagan / Cook-Weisberg test for heteroskedasticity 
         Ho: Constant variance
         Variables: fitted values of D.lcrmrte

         chi2(1)      =     1.16
         Prob > chi2  =   0.2807

Re-estimate with Robust Standard Errors clustered at the county level and cluster by county (deal with error correlations within counties).

reg d.lcrmrte i.d83 i.d84 i.d85 i.d86 i.d87 d.lprbarr d.lprbconv d.lprbpris d.lavgsen d.lpolpc, robust cluster(county)
Linear regression                               Number of obs     =        540
                                                F(10, 89)         =      13.56
                                                Prob > F          =     0.0000
                                                R-squared         =     0.4325
                                                Root MSE          =     .15431

                                (Std. Err. adjusted for 90 clusters in county)
------------------------------------------------------------------------------
             |               Robust
   D.lcrmrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       1.d83 |  -.0998658   .0222563    -4.49   0.000    -.1440887    -.055643
       1.d84 |  -.0479374   .0200531    -2.39   0.019    -.0877825   -.0080923
       1.d85 |  -.0046111     .02503    -0.18   0.854    -.0543453     .045123
       1.d86 |   .0275143   .0211829     1.30   0.197    -.0145756    .0696043
       1.d87 |   .0408267   .0241102     1.69   0.094    -.0070797    .0887331
             |
     lprbarr |
         D1. |  -.3274942   .0564281    -5.80   0.000    -.4396157   -.2153727
             |
    lprbconv |
         D1. |  -.2381066   .0395843    -6.02   0.000    -.3167598   -.1594534
             |
    lprbpris |
         D1. |  -.1650462   .0457923    -3.60   0.001    -.2560345    -.074058
             |
     lavgsen |
         D1. |  -.0217607     .02582    -0.84   0.402    -.0730644     .029543
             |
      lpolpc |
         D1. |   .3984264   .1029342     3.87   0.000     .1938983    .6029545
             |
       _cons |   .0077134   .0137846     0.56   0.577    -.0196763     .035103
------------------------------------------------------------------------------

Even accounting for heteroskedasticity and serial correlation, our model is likely biased by simultaneity bias.