  • Difference: Country/year FE incorporated in one categorical variable using areg/reghdfe VS. country/year FE separately using reghdfe?

    Hey Stata-community

    I am currently analyzing the effect of moisture change (ADsm0_2moistu) on the rate of urbanization change (ADurbfrac) at the district (afruid) level in India, following Henderson, Storeygard and Deichmann (2017), "Has climate change driven urbanization in Africa?" --> https://sites.google.com/site/adamstoreygard/ (the website provides the data set they used and their .do file for the regressions). I basically replicated the data set and code for India; the difference is that the FE in my case is at the state level instead of the country level.

    The regression specification is as follows:
    $u_{ijt} = \beta_1 w_{ijt} + \beta_2 X^{\prime}_{ij} + \beta_3 X^{\prime}_{ij} w_{ijt} + \alpha_{jt} +\epsilon_{ijt}$

    i=district, j=country, t=time
    u=annualized urbanization growth
    w=annualized moisture growth
    X=vector of time-invariant characteristics (e.g. lndiscst --> distance to coast)

    --> In the paper, they use the categorical variable countryyear, which combines the country and year fixed effects in one variable, and they estimate the regression with the areg command.


    Code:
    areg ADurbfrac ADsm0_2moistu firsturbfrac lndiscst if abspctileADsm0_2moistu>6 & abspctileADurbfrac>6
    > , absorb(countryyear) vce(cluster afruid)
    
    Linear regression, absorbing indicators         Number of obs     =        717
                                                    F(   3,    358)   =      48.46
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.3872
                                                    Adj R-squared     =     0.3302
                                                    Root MSE          =     0.0342
    
                                    (Std. Err. adjusted for 359 clusters in afruid)
    -------------------------------------------------------------------------------
                  |               Robust
        ADurbfrac |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    --------------+----------------------------------------------------------------
    ADsm0_2moistu |    -.07611   .1801611    -0.42   0.673    -.4304171    .2781971
     firsturbfrac |  -.0488972   .0055254    -8.85   0.000    -.0597635   -.0380309
         lndiscst |   .0014311   .0018852     0.76   0.448    -.0022764    .0051386
            _cons |    .028879   .0120392     2.40   0.017     .0052024    .0525555
    --------------+----------------------------------------------------------------
      countryyear |   absorbed                                      (59 categories)
    Using reghdfe instead gives the same results (it just suppresses _cons):
    Code:
    reghdfe ADurbfrac ADsm0_2moistu firsturbfrac lndiscst if abspctileADsm0_2moistu>6 & abspctileADurbfrac>6, absorb(countryyear) vce(cluster afruid)
    However, the results differ when I specify a country FE and a year FE separately in reghdfe.


    Code:
    egen countryvar = group(iso3v10)
    reghdfe ADurbfrac ADsm0_2moistu firsturbfrac lndiscst if abspctileADsm0_2moistu>6 & abspctileADurbfrac>6, absorb(countryvar year) vce(cluster afruid)
    
    . reghdfe ADurbfrac ADsm0_2moistu firsturbfrac lndiscst if abspctileADsm0_2moistu>6 & abspctileADurbfra
    > c>6, absorb(countryvar year ) vce(cluster afruid)
    (converged in 18 iterations)
    
    HDFE Linear regression                            Number of obs   =        717
    Absorbing 2 HDFE groups                           F(   3,    358) =      47.51
    Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                      R-squared       =     0.3315
                                                      Adj R-squared   =     0.2748
                                                      Within R-sq.    =     0.0795
    Number of clusters (afruid)  =        359         Root MSE        =     0.0356
    
                                    (Std. Err. adjusted for 359 clusters in afruid)
    -------------------------------------------------------------------------------
                  |               Robust
        ADurbfrac |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    --------------+----------------------------------------------------------------
    ADsm0_2moistu |   .2010312   .1738053     1.16   0.248    -.1407765    .5428389
     firsturbfrac |  -.0497331   .0055398    -8.98   0.000    -.0606277   -.0388384
         lndiscst |   .0008949   .0018817     0.48   0.635    -.0028057    .0045955
    -------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    ----------------------------------------------------------------+
      Absorbed FE |  Num. Coefs.  =   Categories  -   Redundant     |
    --------------+-------------------------------------------------|
       countryvar |           30              30              0     |
             year |           24              33              9     |
    ----------------------------------------------------------------+

    So the difference is that here the fixed effects enter as separate categorical variables (countryvar year), whereas countryyear (as above) combines both.


    My questions would be:

    a) why are there different results although countryyear basically absorbs both country and year effects in one value?
    b) which approach is more appropriate, when I want to incorporate fixed effects on state level and for the years?

    c) and a bit off the topic: How can I interpret a coefficient of ADsm0_2moistu (which is the annualized change of moisture) ? One unit more of ADsm0_2moistu means what exactly in terms of the annualized change rate of urbanization?


    I hope my issue is illustrated clearly. I'll be super happy to get some answers!

    Best, Carolin
    Last edited by Caro Gunesch; 15 Jun 2018, 07:45.

  • #2
    a) why are there different results although countryyear basically absorbs both country and year effects in one value?

    $$u_{ijt} = \beta_1 w_{ijt} + \beta_2 X^{\prime}_{ij} + \beta_3 X^{\prime}_{ij} w_{ijt} + \alpha_{jt} +\epsilon_{ijt}$$


    So in your specification above, \(\alpha_{jt}\) is a country-year fixed effect: a separate intercept for every country-year cell, which absorbs anything that varies at the country-by-year level. It corresponds to interacting the country indicators with the year indicators, not to multiplying a country effect by a year effect:

    $$\alpha_{jt} = \sum_{k}\sum_{s} \alpha_{ks}\, \mathbf{1}[j = k]\, \mathbf{1}[t = s]$$

    In your second specification, you are specifying the model

    $$u_{ijt} = \beta_1 w_{ijt} + \beta_2 X^{\prime}_{ij} + \beta_3 X^{\prime}_{ij} w_{ijt} + \eta_{j} + \mu_{t} +\epsilon_{ijt}$$

    Here, you are controlling only for time-invariant country effects \(\eta_{j}\) and country-invariant time effects \(\mu_{t}\). So country and year is not the same as country-year: the additive model is the special case \(\alpha_{jt} = \eta_{j} + \mu_{t}\) of the interacted one, and it does not absorb shocks that are specific to one country in one year. That is why the two sets of estimates differ. In areg, which absorbs only one categorical variable, the way to include two sets of fixed effects is to add the second set as dummy variables, e.g.,

    Code:
    areg ADurbfrac ADsm0_2moistu firsturbfrac lndiscst i.year if abspctileADsm0_2moistu>6 ///
    & abspctileADurbfrac>6, absorb(countryvar) vce(cluster afruid)
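
    To see the difference between the two specifications on data where the truth is known, here is a minimal simulation sketch. Everything in it is hypothetical (the variables country, year, shock, x and y, the seed, and the parameter values are made up for illustration and are not taken from the original data set). It uses only official Stata commands, so it does not require reghdfe.

```stata
* Simulated panel: 5 countries x 6 years, about 30 districts per cell.
* All names and parameter values below are hypothetical.
clear
set seed 20180615
set obs 900
gen country = mod(_n, 5) + 1
gen year    = mod(floor(_n/5), 6) + 1
egen countryyear = group(country year)   // one category per country-year cell

* A shock specific to each country-year cell (not additive in country and year)
bysort countryyear: gen shock = rnormal() if _n == 1
by countryyear: replace shock = shock[1]

gen x = 0.5*shock + rnormal()            // regressor correlated with the shock
gen y = 1.0*x + 2.0*shock + rnormal()    // true slope on x is 1

* (1) one intercept per country-year cell
areg y x, absorb(countryyear)
scalar b_cell = _b[x]

* (2) the same model written with interacted indicators
regress y x i.country#i.year
scalar b_inter = _b[x]

* (3) additive country and year effects only
areg y x i.year, absorb(country)
scalar b_add = _b[x]

display "cell FE: " b_cell "  interacted dummies: " b_inter "  additive FE: " b_add
```

    Specifications (1) and (2) should give the same slope on x, because absorbing countryyear is the same as including the full set of country-by-year indicators. Specification (3) will generally give a different slope, because additive country and year effects cannot absorb a shock that is specific to one country in one year.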
    b) which approach is more appropriate, when I want to incorporate fixed effects on state level and for the years?
    This depends on your research question and identification strategy. I am sure the authors have justified their use of a country-year effect and if you are exactly replicating their research, probably their specification is correct unless you have reasons to doubt it.


    c) and a bit off the topic: How can I interpret a coefficient of ADsm0_2moistu (which is the annualized change of moisture) ? One unit more of ADsm0_2moistu means what exactly in terms of the annualized change rate of urbanization?
    You interpret fixed effects coefficients exactly as OLS coefficients. A one-unit increase in the independent variable is associated with a change of XX units in the dependent variable, holding all other variables constant and after controlling for country-year fixed effects (note: country-year, not country and year). You need to know the units in which your independent and dependent variables are measured. For example, if both are measured in percentage points, a coefficient of 0.2 would mean that a 1 percentage point increase in annualized moisture growth is associated with a 0.2 percentage point increase in the annualized urbanization growth rate.
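
    As a small worked illustration of this reading, using the notation of the model in the question and the point estimate from the countryyear regression in the first post: the change of 0.01 in \(w\) is an arbitrary value chosen for the arithmetic, and the units depend on how the variables were constructed, which the thread does not state.

```latex
% Marginal effect of moisture growth w on urbanization growth u
% in the model with the interaction term:
\frac{\partial u_{ijt}}{\partial w_{ijt}} = \beta_1 + \beta_3 X^{\prime}_{ij}

% The regressions posted above omit the interaction, so the effect is \beta_1.
% With \hat{\beta}_1 = -0.07611 and a hypothetical change \Delta w = 0.01:
\Delta u = \hat{\beta}_1 \, \Delta w = -0.07611 \times 0.01 = -0.0007611
```

    Note that this estimate is not statistically distinguishable from zero in the posted output (p = 0.673), so the arithmetic shows only how to read the coefficient, not evidence of an effect.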
