  • Problems with controls and fixed effects in DID

    Hi,

    I'm currently using triple difference to analyse the effect of minimum marriage age legislation on the prevalence of child marriages and infant mortality, and I'm getting confused by a few issues with the controls I'm including.

    1) So far I have included controls for age, whether they live in a rural area, whether they have any education, whether they are classed as poor according to their wealth index, and also fixed effects for country, year, ethnicity and possibly year of marriage. Many of my controls are time-invariant, but they do not get omitted when I run my regressions with time FE so is there something wrong with my results - I've seen on other posts that time-invariant variables should end up omitted with FE? Should I just forget about including them anyway if they are time-invariant?

    2) I am using reghdfe so am absorbing the fixed effects but should I be including the other controls in the absorb bracket as well - it gave me the same treatment effect estimator whichever way I did but I just wanted to know which way makes my code more accurate.

    3) I'm unsure whether I need the marriage year fixed effects. Including them increases the magnitude and significance of my coefficient, but I'm not sure if it's actually necessary since I already have year FE. My reason for including them was that a similar paper did, but they didn't have a variable for every year like I do (they only had year of survey).

    4) I have also included my results table and wanted to check I am interpreting my coefficient correctly. My dependent variable is a dummy for whether an individual is married under 18, so can I interpret it as: raising the minimum marriage age to 18 causes a 12 percentage point decrease in the prevalence of underage marriage?

    Thank you

    Code:
    reghdfe underagemar dchildmar##postreform2 age rural everschool poor, vce(cluster country ethnicityall) absorb(country ethnicityall currentyear marriageyear)
    Code:
    HDFE Linear regression                            Number of obs   =  4,331,313
    Absorbing 4 HDFE groups                           F(   6,     10) =     195.82
    Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                      R-squared       =     0.5829
                                                      Adj R-squared   =     0.5828
    Number of clusters (country) =         11         Within R-sq.    =     0.4114
    Number of clusters (ethnicityall) =        136    Root MSE        =     0.3229
    
                               (Std. err. adjusted for 11 clusters in country ethnicityall)
    ---------------------------------------------------------------------------------------
                          |               Robust
              underagemar | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    ----------------------+----------------------------------------------------------------
              1.dchildmar |          0  (omitted)
            1.postreform2 |   .1263256    .040137     3.15   0.010     .0368949    .2157563
                          |
    dchildmar#postreform2 |
                     1 1  |   -.119798   .0419748    -2.85   0.017    -.2133236   -.0262723
                          |
                      age |  -.0790853   .0075066   -10.54   0.000    -.0958111   -.0623595
                    rural |   .0110953   .0047582     2.33   0.042     .0004935    .0216971
               everschool |   -.019142   .0082793    -2.31   0.043    -.0375895   -.0006945
                     poor |    .011446   .0049752     2.30   0.044     .0003605    .0225314
                    _cons |   2.070794   .1527688    13.56   0.000     1.730404    2.411184
    ---------------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    ------------------------------------------------------+
      Absorbed FE | Categories  - Redundant  = Num. Coefs |
    --------------+---------------------------------------|
          country |        11          11           0    *|
     ethnicityall |       136         136           0    *|
      currentyear |        63           0          63     |
     marriageyear |        61           1          60     |
    ------------------------------------------------------+
    * = FE nested within cluster; treated as redundant for DoF computation
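
    For question 4, a minimal sketch of the arithmetic, assuming the interaction coefficient and its confidence limits are typed in by hand from the dchildmar#postreform2 row of the table above. In a linear probability model with a 0/1 outcome, a coefficient multiplied by 100 reads as a change in percentage points (not percent), conditional on the controls and fixed effects in the model.

    ```stata
    * point estimate of the interaction, in percentage points (about -12.0)
    display 100 * -.119798
    * lower and upper 95% limits, in percentage points (about -21.3 and -2.6)
    display 100 * -.2133236
    display 100 * -.0262723
    ```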

  • #2
    Anjali:
    1) you should also include the squared term for age:
    Code:
    c.age##c,age
    If time-invariant variables are not wiped out by the -fe- estimator, double-check your data;
    2) and 3) -abs(country year)- is usually enough;
    4) are you referring to the interaction coefficient or to something else?
    Kind regards,
    Carlo
    (StataNow 18.5)


    • #3
      Thank you Carlo!
      I have added the squared term but was wondering what actually is the reason for including it? I don't think any of the papers similar to mine included it.
      I have checked my data and the variables do not change over time so I'm not sure what's happening - should I leave them all out and use age and age squared as my only additional controls (except the variables in the absorb bracket)?

      For (4), yes I was referring to the coefficient of interaction

      Here are my new results when I include the age squared term (and for now I took out the other controls I was unsure about). The p value for the squared term is very high and the coefficient is small - is it still fine to include it?
      Also, the coefficient on the interaction is a lot different than when I included the marriage year FE (I know I also took out some controls but this doesn't really change the coefficient as drastically)

      Code:
      HDFE Linear regression                            Number of obs   =  4,331,313
      Absorbing 3 HDFE groups                           F(   4,     10) =     254.08
      Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                        R-squared       =     0.2030
                                                        Adj R-squared   =     0.2030
      Number of clusters (country) =         11         Within R-sq.    =     0.0768
      Number of clusters (ethnicityall) =        136    Root MSE        =     0.4464
      
                                 (Std. err. adjusted for 11 clusters in country ethnicityall)
      ---------------------------------------------------------------------------------------
                            |               Robust
                underagemar | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
      ----------------------+----------------------------------------------------------------
                1.dchildmar |          0  (omitted)
              1.postreform2 |  -.3789593   .0379854    -9.98   0.000     -.463596   -.2943225
                            |
      dchildmar#postreform2 |
                       1 1  |  -.0364888   .0327137    -1.12   0.291    -.1093794    .0364019
                            |
                        age |      -.015   .0044074    -3.40   0.007    -.0248204   -.0051796
                            |
                c.age#c.age |   .0000161   .0000786     0.20   0.842     -.000159    .0001911
                            |
                      _cons |    .904372   .0626923    14.43   0.000     .7646848    1.044059
      ---------------------------------------------------------------------------------------
      
      Absorbed degrees of freedom:
      ------------------------------------------------------+
        Absorbed FE | Categories  - Redundant  = Num. Coefs |
      --------------+---------------------------------------|
            country |        11          11           0    *|
       ethnicityall |       136         136           0    *|
        currentyear |        63           0          63     |
      ------------------------------------------------------+
      * = FE nested within cluster; treated as redundant for DoF computation


      • #4
        I think I've actually misunderstood my data slightly - the other controls I included are time invariant across individuals but not countries - so would these actually be ok to include since I have country fixed effects?


        • #5
          Anjali:
          1)
          Code:
          c,age##c.age
          was aimed at investigating within-panel turning points, which are not present in the results of your linear probability model (as the sq_age coefficient does not reach statistical significance). Therefore you can safely omit sq_age;
          2) interaction coefficients cannot be interpreted separately from those of the conditional main effects (calculate -predict, xb- to have an idea of how to sum the coefficients for the observations you're interested in);
          3) as far as your question #3 is concerned, it depends on how you've -xtset- your dataset.
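
          On the turning point mentioned in 1): for a quadratic b1*age + b2*age^2, the turning point sits at age = -b1/(2*b2). A rough sketch, assuming the two age coefficients are typed in by hand from the table in #3:

          ```stata
          * b1 = -.015 (age) and b2 = .0000161 (c.age#c.age), both copied from #3
          display -(-.015) / (2 * .0000161)
          ```

          This comes out at roughly 466 years, far outside any observed age, which is in line with the squared term adding little in this model.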
          Kind regards,
          Carlo
          (StataNow 18.5)


          • #6
            Thank you for your help so far, I'm still a bit confused.
            2) interaction coefficients cannot be interpreted separately from those of the conditional main effects (calculate -predict, xb- to have an idea of how to sum the coefficients for the observations you're interested in);
            I had thought from previous posts that because I'm doing DDD, the treatment effect can just be interpreted straight from the table - what do I do once I have used predict, xb

            3) as far as your question #3 is concerned, it depends on how you've -xtset- your dataset.
            I'm not completely sure which question you are talking about here, I'm assuming my question about including marriage year. Either way, I haven't actually xtset my data as I was told I didn't actually need to for reghdfe. Since I am absorbing year FE anyway and that includes the marriage years within it, does it make more sense to leave marriage year FE out?

            I'm still confused about whether I should be including my other individual level controls - i.e. rural, education, age and poor.


            • #7
              Anjali:
              1) -predict- gives you the sum of all coefficients (+constant, if any) multiplied by the value of the variables for each observation: I find it very useful to invoke -predict-, do the calculation by hand for the variables I'm interested in and then compare my results with those provided by -predict-
              2) Yes, I meant that. The goal of any regression model is to give a fair and true view of the data generation process under investigation. Controls are additional stuff to be included when necessary. I would take a look at the literature in your research field and see what others did in the past when dealing with the very same research topic.
              Kind regards,
              Carlo
              (StataNow 18.5)


              • #8
                Sorry, I'm quite new to Stata so I'm still not completely sure what to do with predict. Am I supposed to use the tab command to look at the values? I have a lot of observations so am not sure if this is possible.

                With regards to (2), I have been following a similar paper but they did not have a variable for each year like I do, they only had variables for survey year and marriage year so just wanted to check whether I actually needed some of their controls in my specification.

                Thanks again for all your help so far


                • #9
                  Anjali:

                  0) first of all, my previous interaction code should have been written as:
                  Code:
                  c.age##c.age
                  1) the following toy-example may shed some light on how to understand -predict- calculations (which are not error prone):
                  Code:
                  . use "https://www.stata-press.com/data/r17/nlswork.dta"
                  (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
                  
                  . reghdfe ln_wage c.age##c.age, abs(idcode year) vce(cluster idcode)
                  (dropped 551 singleton observations)
                  (MWFE estimator converged in 8 iterations)
                  
                  HDFE Linear regression                            Number of obs   =     27,959
                  Absorbing 2 HDFE groups                           F(   2,   4158) =      44.91
                  Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                                    R-squared       =     0.6593
                                                                    Adj R-squared   =     0.5995
                                                                    Within R-sq.    =     0.0115
                  Number of clusters (idcode)  =      4,159         Root MSE        =     0.3013
                  
                                               (Std. err. adjusted for 4,159 clusters in idcode)
                  ------------------------------------------------------------------------------
                               |               Robust
                       ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                  -------------+----------------------------------------------------------------
                           age |   .0728746   .0136873     5.32   0.000     .0460402    .0997089
                               |
                   c.age#c.age |  -.0010113   .0001076    -9.39   0.000    -.0012224   -.0008003
                               |
                         _cons |   .4586164   .3651743     1.26   0.209    -.2573205    1.174553
                  ------------------------------------------------------------------------------
                  
                  Absorbed degrees of freedom:
                  -----------------------------------------------------+
                   Absorbed FE | Categories  - Redundant  = Num. Coefs |
                  -------------+---------------------------------------|
                        idcode |      4159        4159           0    *|
                          year |        15           0          15     |
                  -----------------------------------------------------+
                  * = FE nested within cluster; treated as redundant for DoF computation
                  
                  
                  . predict fitted, xb
                  (24 missing values generated)
                  
                  . list age fitted in 1
                  
                       +----------------+
                       | age     fitted |
                       |----------------|
                    1. |  18   1.442685 |
                       +----------------+
                  
                  . di .4586164 + (.0728746*18)+(-.0010113*18^2)
                  1.442698
                  
                  .
                  2) if you cannot find a similar example in the literature, you can plug in the controls you mentioned.
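
                  The small gap between 1.442685 and 1.442698 in the toy example above most likely comes from typing in rounded coefficients. A sketch that uses the stored estimates instead, assuming it is run right after the -reghdfe- and -predict- calls above (the variable names byhand and gap are made up for illustration):

                  ```stata
                  * same calculation as above, with the full-precision stored coefficients
                  display _b[_cons] + _b[age]*18 + _b[c.age#c.age]*18^2

                  * the same check for every observation at once
                  generate double byhand = _b[_cons] + _b[age]*age + _b[c.age#c.age]*age^2
                  generate double gap = fitted - byhand
                  summarize gap
                  ```

                  If the hand formula is right, the gap should be close to zero for every observation, up to the storage precision of fitted.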
                  Kind regards,
                  Carlo
                  (StataNow 18.5)


                  • #10
                    Thank you, I have followed the example as you said and this is the output I got:
                    Code:
                    . list dchildmar##postreform2 age rural everschool poor pred in 1 
                    
                         +-----------------------------------------------------------------------------+
                         |        1.         0.   1.dchi~r#                                            |
                         | dchild~r   postre~2    0.post~2   age   rural   eversc~l   poor        pred |
                         |-----------------------------------------------------------------------------|
                      1. |        1          1           1    11       1          0      1   1.2233969 |
                         +-----------------------------------------------------------------------------+
                    
                    . di 2.070794 + (-.119798) + (.1263256) + (11*-.0790853) + (.0110953) + (.011446) 
                    1.2299246
                    The results are very similar - what do I do with this now? Do I interpret 1.223 as the treatment effect of the law change in my DDD model - such as: raising the minimum marriage age to 18 causes a 1.22% decline in child marriage?
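
                    One possible reason for the small gap between 1.2299246 and 1.2233969: the column headers in the listing read 0.postreform2 and 1.dchildmar#0.postreform2, which suggests that observation 1 has postreform2 = 0. If so, neither the 1.postreform2 coefficient nor the interaction coefficient applies to it. A sketch of the hand calculation under that assumption, with coefficients typed in from the table in #1:

                    ```stata
                    * constant + 11*age + rural + poor (everschool is 0 for this observation)
                    display 2.070794 + (11*-.0790853) + (.0110953) + (.011446)
                    ```

                    This gives about 1.2234, in line with the pred value shown in the listing.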


                    • #11
                      Anjali:
                      1) what you got is the predicted value for the first unit of your dataset. Your calculation matches the one Stata made, which is a good habit to check for a sample of observations, especially when you have (quite complex) interactions.
                      2) your statement "raising the minimum marriage age to 18 causes a 1.22% decline in child marriage" (as per -dchildmar#postreform2-) is true only if you add the clause "when adjusted for the remaining predictors".
                      Kind regards,
                      Carlo
                      (StataNow 18.5)


                      • #12
                        Ok thank you Carlo, I think I'm starting to understand it a bit more now.
                        Can I still take the DDD coefficient from the table and say that overall there's a 12 percentage point decrease in the probability of child marriage - this is what I'd seen in similar papers. Or is the 1.22% decline the only result I should be interpreting?
                        For both of these (or just the 1.22% decline if only this is correct) I am unsure about the wording of my statement - should it be the probability of child marriage or prevalence of child marriage or just child marriage?

                        Thanks again for all your help


                        • #13
                          Anjali:
                          I would say that, when adjusted for the other predictors, for each 1 percentage point increase in that variable the probability of child marriage decreases by 0.12x100=12%.
                          Kind regards,
                          Carlo
                          (StataNow 18.5)


                          • #14
                            My analysis is using difference-in-difference-in-differences so would that work in this case? I'm not sure what would be increasing by 1 percentage point


                            • #15
                              Anjali:
                              isn't it the LPM metric?
                              Kind regards,
                              Carlo
                              (StataNow 18.5)
