  • Coefficients when removing fixed effects by de-meaning are different to when I include fixed effects dummies

    Dear all

    I'm working with three-level nested data:

    Individuals (childnum6), nested within...
    Nuclear families (linked by nfamid), nested within...
    Extended families (linked by efamid)

    I'm trying to remove extended family fixed effects manually by de-meaning my variables with respect to extended family averages. Using a sample of my data to test my code, I notice that the coefficient I get when including extended family dummies as fixed effects is different to when I de-mean the variables. If anyone could tell me where I'm going wrong I would be very grateful (code and sample data below).

    P.S. not sure if it's at all relevant but chmarried6, chage6, childnum6 are at the individual level, whereas chtotal6 and feduc are at the nuclear family level.
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long efamid float(nfamid chmarried6 chtotal6) byte childnum6 float(chage6 feduc)
    1200002  4 0 3 1 45 13
    1200002  4 . 3 2 42 13
    1200002  4 1 3 3 38 13
    1200004  7 1 2 1 42 20
    1200004  7 1 2 2 37 20
    1200005  8 1 2 1 49 14
    1200005  8 0 2 2 48 14
    1200007 12 1 3 1 38 16
    1200007 12 . 3 2 35 16
    1200007 12 1 3 3  . 16
    1200007 13 1 3 1 46 17
    1200007 13 1 3 2 43 17
    1200007 13 1 3 3 42 17
    1200009 15 1 9 1 50 13
    1200009 15 1 9 2 48 13
    1200009 15 1 9 3 47 13
    1200009 15 1 9 4 46 13
    1200009 15 1 9 5 45 13
    1200009 15 1 9 6 44 13
    1200009 15 1 9 7 44 13
    1200009 15 1 9 8 41 13
    1200009 15 1 9 9 33 13
    1200009 16 1 3 1 60 14
    1200009 16 1 3 2 53 14
    1200009 16 1 3 3 46 14
    1200010 18 0 2 1 34 17
    1200010 18 1 2 2 30 17
    1200011 19 1 2 1 45 12
    1200011 19 1 2 2 43 12
    1200012 21 1 4 1 47 12
    1200012 21 1 4 2 45 12
    1200012 21 0 4 3 43 12
    1200012 21 1 4 4 38 12
    1200013 23 0 3 1 44 17
    1200013 23 1 3 2 42 17
    1200013 23 1 3 3 40 17
    1200015 28 0 2 1 42 12
    1200015 28 0 2 2 39 12
    1200016 29 1 3 1 52 13
    1200016 29 0 3 2 50 13
    1200016 29 0 3 3 48 13
    1200017 31 0 2 1 30 14
    1200017 31 0 2 2 30 14
    1200017 32 1 3 1 48 12
    1200017 32 1 3 2 47 12
    1200017 32 1 3 3 36 12
    1200018 33 0 3 1 49 12
    1200018 33 0 3 2  . 12
    1200018 33 1 3 3  . 12
    1200019 35 1 4 1 44 18
    1200019 35 1 4 2 42 18
    1200019 35 0 4 3 41 18
    1200019 35 1 4 4 38 18
    1200019 36 1 2 1 31 16
    1200019 36 0 2 2 27 16
    1200020 37 . 2 1  . 22
    1200020 37 . 2 2  . 22
    1200021 39 0 2 1 48 15
    1200021 39 0 2 2 46 15
    1200021 40 1 4 1 52 15
    1200021 40 1 4 2 50 15
    1200021 40 1 4 3 46 15
    1200021 40 1 4 4 42 15
    1200024 45 . 2 1 41 16
    1200024 45 1 2 2 38 16
    1200028 52 1 2 1 44 12
    1200028 52 1 2 2 42 12
    1200028 53 1 2 1 36 12
    1200028 53 1 2 2 32 12
    1200029 54 0 4 1 49 16
    1200029 54 1 4 2 46 16
    1200029 54 1 4 3 43 16
    1200029 54 1 4 4 39 16
    1200032 60 0 7 1 56 16
    1200032 60 1 7 2 54 16
    1200032 60 1 7 3 52 16
    1200032 60 1 7 4 50 16
    1200032 60 1 7 5 48 16
    1200032 60 . 7 6 45 16
    1200032 60 1 7 7 36 16
    1200033 61 0 3 1 49 14
    1200033 61 1 3 2 45 14
    1200033 61 . 3 3 21 14
    1200034 64 0 6 1 49 16
    1200034 64 . 6 2 49 16
    1200034 64 . 6 3 48 16
    1200034 64 1 6 4 47 16
    1200034 64 1 6 5 46 16
    1200034 64 1 6 6 43 16
    1200036 67 1 2 1 50 12
    1200036 67 1 2 2 48 12
    1200036 68 0 7 1 56 12
    1200036 68 1 7 2 53 12
    1200036 68 . 7 3 51 12
    1200036 68 . 7 4 47 12
    1200036 68 1 7 5 43 12
    1200036 68 0 7 6 41 12
    1200036 68 1 7 7 39 12
    1200037 69 1 2 1 39 17
    1200037 69 0 2 2 37 17
    end
    Code:
    *Generating de-meaned variables 
    bysort efamid: egen mean_chmarried6 = mean(chmarried6)
    gen d_chmarried6 = chmarried6 - mean_chmarried6
    
    bysort efamid: egen mean_chtotal6 = mean(chtotal6)
    gen d_chtotal6 = chtotal6 - mean_chtotal6
    
    bysort efamid: egen mean_childnum6 = mean(childnum6)
    gen d_childnum6 = childnum6 - mean_childnum6
    
    bysort efamid: egen mean_chage6 = mean(chage6)
    gen d_chage6 = chage6 - mean_chage6
    
    bysort efamid: egen mean_feduc = mean(feduc)
    gen d_feduc = feduc - mean_feduc
    
    *Running regression with extended family dummies
    reg chmarried6 chtotal6 childnum6 chage6 feduc i.efamid, cluster(nfamid)
    
    *Running regression with de-meaned variables 
    reg d_chmarried6 d_chtotal6 d_childnum6 d_chage6 d_feduc, cluster(nfamid)

  • #2
    The problem is that you are calculating the group means over the entire data set. Because some of the variables have missing values, the means must be calculated only over the observations that will be part of the estimation sample, that is, those with no missing values on any of the regression variables (including the cluster variable, nfamid). When you do that, the two approaches give the same coefficients, except, of course, for the constant term:

    Code:
    . *Generating de-meaned variables
    . mark complete_data
    
    . markout  complete_data chmarried6 chtotal6 childnum6 chage6 feduc efamid nfamid
    
    . bysort efamid: egen mean_chmarried6 = mean(chmarried6) if complete_data
    (14 missing values generated)
    
    . gen d_chmarried6 = chmarried6 - mean_chmarried6
    (14 missing values generated)
    
    .
    . bysort efamid: egen mean_chtotal6 = mean(chtotal6) if complete_data
    (14 missing values generated)
    
    . gen d_chtotal6 = chtotal6 - mean_chtotal6
    (14 missing values generated)
    
    .
    . bysort efamid: egen mean_childnum6 = mean(childnum6) if complete_data
    (14 missing values generated)
    
    . gen d_childnum6 = childnum6 - mean_childnum6
    (14 missing values generated)
    
    .
    . bysort efamid: egen mean_chage6 = mean(chage6) if complete_data
    (14 missing values generated)
    
    . gen d_chage6 = chage6 - mean_chage6
    (14 missing values generated)
    
    .
    . bysort efamid: egen mean_feduc = mean(feduc) if complete_data
    (14 missing values generated)
    
    . gen d_feduc = feduc - mean_feduc
    (14 missing values generated)
    
    .
    . *Running regression with extended family dummies
    . reg chmarried6 chtotal6 childnum6 chage6 feduc i.efamid, cluster(nfamid)
    
    Linear regression                               Number of obs     =         86
                                                    F(10, 29)         =          .
                                                    Prob > F          =          .
                                                    R-squared         =     0.3461
                                                    Root MSE          =     .43215
    
                                    (Std. err. adjusted for 30 clusters in nfamid)
    ------------------------------------------------------------------------------
                 |               Robust
      chmarried6 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
        chtotal6 |   -.039093   .0402273    -0.97   0.339    -.1213671    .0431812
       childnum6 |   .1072145   .0549444     1.95   0.061    -.0051593    .2195884
          chage6 |   .0248873   .0101971     2.44   0.021     .0040319    .0457428
           feduc |  -.1716513   .0813078    -2.11   0.043    -.3379444   -.0053582
                 |
          efamid |
        1200004  |   1.765848    .561417     3.15   0.004     .6176214    2.914075
        1200005  |   .0119544   .0778803     0.15   0.879    -.1473287    .1712374
        1200007  |    1.15183   .3086465     3.73   0.001     .5205776    1.783083
        1200009  |   .3552358   .1867345     1.90   0.067    -.0266791    .7371506
        1200010  |    .937549   .3518482     2.66   0.012     .2179386    1.657159
        1200011  |   .2806447   .1046682     2.68   0.012     .0665742    .4947152
        1200012  |   .0202816     .08333     0.24   0.809    -.1501474    .1907106
        1200013  |   .8408282   .3241372     2.59   0.015     .1778932    1.503763
        1200015  |  -.6322497   .1004706    -6.29   0.000     -.837735   -.4267643
        1200016  |  -.3782089   .0866755    -4.36   0.000    -.5554802   -.2009375
        1200017  |   .1536036   .1291472     1.19   0.244     -.110532    .4177392
        1200018  |  -.7510916    .099508    -7.55   0.000    -.9546084   -.5475749
        1200019  |   1.013496   .3936686     2.57   0.015     .2083532    1.818639
        1200021  |   .3599551   .4159035     0.87   0.394     -.490663    1.210573
        1200024  |   1.062967   .2352579     4.52   0.000       .58181    1.544123
        1200028  |    .417525   .1417601     2.95   0.006      .127593    .7074569
        1200029  |   .6819994   .2570365     2.65   0.013     .1563008    1.207698
        1200032  |   .6310175   .3348647     1.88   0.070    -.0538578    1.315893
        1200033  |   .0883783   .0867282     1.02   0.317    -.0890007    .2657573
        1200034  |   .5495889   .3028747     1.81   0.080    -.0698594    1.169037
        1200036  |    -.15044   .2156619    -0.70   0.491    -.5915182    .2906382
        1200037  |   .7882251    .324317     2.43   0.021     .1249224    1.451528
                 |
           _cons |   1.601493   1.111182     1.44   0.160    -.6711284    3.874114
    ------------------------------------------------------------------------------
    
    .
    . *Running regression with de-meaned variables
    . reg d_chmarried6 d_chtotal6 d_childnum6 d_chage6 d_feduc, cluster(nfamid)
    
    Linear regression                               Number of obs     =         86
                                                    F(4, 29)          =       2.98
                                                    Prob > F          =     0.0354
                                                    R-squared         =     0.0977
                                                    Root MSE          =     .36882
    
                                    (Std. err. adjusted for 30 clusters in nfamid)
    ------------------------------------------------------------------------------
                 |               Robust
    d_chmarried6 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
      d_chtotal6 |   -.039093   .0343325    -1.14   0.264    -.1093107    .0311248
     d_childnum6 |   .1072145   .0468929     2.29   0.030     .0113079    .2031212
        d_chage6 |   .0248873   .0087028     2.86   0.008      .007088    .0426866
         d_feduc |  -.1716513    .069393    -2.47   0.019    -.3135759   -.0297267
           _cons |  -1.11e-08   .0261093    -0.00   1.000    -.0533996    .0533995
    ------------------------------------------------------------------------------
    
    .
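
    The same de-meaning can be written more compactly with a loop. What follows is a minimal sketch, not the code used for the log above: it assumes the example data from #1 are in memory, it takes the estimation sample from e(sample) after the dummy-variable regression rather than from -mark- and -markout-, and the names insample, m_* and dm_* are made up for illustration. It also cross-checks one coefficient against -areg-, which absorbs the efamid effects directly.

    ```stata
    * Fit the dummy-variable model first; e(sample) then flags the
    * observations that were actually used (complete cases, nfamid nonmissing)
    regress chmarried6 chtotal6 childnum6 chage6 feduc i.efamid, vce(cluster nfamid)
    generate byte insample = e(sample)
    scalar b_dummy = _b[chage6]

    * De-mean every variable within efamid, using estimation-sample rows only
    foreach v of varlist chmarried6 chtotal6 childnum6 chage6 feduc {
        bysort efamid: egen double m_`v' = mean(`v') if insample
        generate double dm_`v' = `v' - m_`v'
    }

    * The slope coefficients should now agree with the dummy-variable model
    regress dm_chmarried6 dm_chtotal6 dm_childnum6 dm_chage6 dm_feduc, vce(cluster nfamid)
    assert reldif(_b[dm_chage6], b_dummy) < 1e-6

    * -areg- absorbs the efamid effects and should give the same slope again
    areg chmarried6 chtotal6 childnum6 chage6 feduc, absorb(efamid) vce(cluster nfamid)
    assert reldif(_b[chage6], b_dummy) < 1e-6
    ```

    Storing the means as double keeps rounding error small enough for the -assert- checks to pass at this tolerance.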

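    Note that the standard errors in the two outputs above still differ (for example, .0402273 versus .0343325 for chtotal6) even though the coefficients agree. The ratio is the same for every slope and equals sqrt(81/59), which is what the small-sample factor (N-1)/(N-k) would produce with N = 86, k = 27 parameters in the dummy-variable model, and k = 5 in the de-meaned model. In other words, the de-meaned regression does not count the 22 absorbed dummies. The sketch below illustrates the rescaling under those assumptions. It assumes the example data from #1 are in memory, and the names touse, wm_*, w_* and the scalars are hypothetical.

    ```stata
    * Dummy-variable model: keep its sample, N, and one standard error
    regress chmarried6 chtotal6 childnum6 chage6 feduc i.efamid, vce(cluster nfamid)
    generate byte touse = e(sample)
    scalar se_dummy = _se[chage6]
    scalar n_obs = e(N)

    * Count the extended families in the estimation sample
    * k_dummy  = 4 slopes + (groups - 1) dummies + constant = 4 + groups
    * k_within = 4 slopes + constant
    quietly tabulate efamid if touse
    scalar k_dummy = 4 + r(r)
    scalar k_within = 5

    * De-mean within efamid over the estimation sample only
    foreach v of varlist chmarried6 chtotal6 childnum6 chage6 feduc {
        bysort efamid: egen double wm_`v' = mean(`v') if touse
        generate double w_`v' = `v' - wm_`v'
    }
    regress w_chmarried6 w_chtotal6 w_childnum6 w_chage6 w_feduc, vce(cluster nfamid)

    * Rescale the de-meaned SE so that it counts the absorbed dummies
    scalar adj = sqrt((n_obs - k_within) / (n_obs - k_dummy))
    display "rescaled SE: " _se[w_chage6]*adj "   dummy-model SE: " se_dummy
    assert reldif(_se[w_chage6]*adj, se_dummy) < 1e-6
    ```

    Whether the absorbed dummies should count toward the degrees of freedom with clustered standard errors is a separate modelling question. The sketch only shows where the numerical difference between the two outputs comes from.
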


    • #3
      Thank you, Clyde! Works perfectly.

      Best Wishes
      Owen
