  • Workaround for "svy: xtreg , fe"

    Dear Stata users,

    I am using a survey dataset, which I declare as survey data with:

    Code:
     svyset clusterID [pweight=pw], strata(strataID) singleunit(centered)
    I would like to run a fixed-effects model, but xtreg does not support the svy prefix. In addition, my weights change over time, so I cannot simply add pweights to xtreg (which requires weights to be constant within panel). I also have a large number of panels, so an ordinary regression with individual-level dummies is not feasible.

    For now, this is the best approximation I've found:

    Code:
     areg y x [pweight=pw], absorb(panelID) vce(cluster clusterID)
    However, my standard errors are unexpectedly large, which may be due to the fact that I am not accounting for strata. This is an aspect of svyset that I don't really understand, even after reading the help files.

    Is there a way I can build on the areg specification to take strata into account, and so best approximate a hypothetical "svy: xtreg , fe" command?

    Any information would be greatly appreciated.

    Thank you in advance,
    Jack
    Last edited by Jack Jameson; 27 May 2022, 10:19.

  • #2
    In the presence of stratification, -svy- estimation will differ from simply including weights. An alternative to -areg- is -reghdfe- (available from SSC):

    Code:
    ssc install reghdfe
    Code:
    webuse nhanes2f, clear
    svyset psuid [pweight=finalwgt], strata(stratid)
    svy: regress zinc age c.age#c.age weight female black orace rural i.region
    reghdfe zinc age c.age#c.age weight female black orace rural [pweight=finalwgt], absorb(region)
    You may get closer to the -svy- standard errors by clustering at the strata level.

    Code:
    reghdfe zinc age c.age#c.age weight female black orace rural [pweight=finalwgt], absorb(region) cluster(stratid)


    Res.:

    Code:
    . svyset psuid [pweight=finalwgt], strata(stratid)
    
          pweight: finalwgt
              VCE: linearized
      Single unit: missing
         Strata 1: stratid
             SU 1: psuid
            FPC 1: <zero>
    
    .
    . svy: regress zinc age c.age#c.age weight female black orace rural i.region
    (running regress on estimation sample)
    
    Survey: Linear regression
    
    Number of strata   =        31                Number of obs     =        9,189
    Number of PSUs     =        62                Population size   =  104,176,071
                                                  Design df         =           31
                                                  F(  10,     22)   =        39.57
                                                  Prob > F          =       0.0000
                                                  R-squared         =       0.0708
    
    ------------------------------------------------------------------------------
                 |             Linearized
            zinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |  -.1656289   .0837319    -1.98   0.057    -.3364013    .0051436
                 |
     c.age#c.age |   .0008381   .0008655     0.97   0.340    -.0009271    .0026033
                 |
          weight |   .0530054   .0139419     3.80   0.001     .0245708      .08144
          female |  -6.126299   .4503075   -13.60   0.000    -7.044707   -5.207891
           black |  -2.639637   .9309377    -2.84   0.008    -4.538297   -.7409767
           orace |  -4.621847   1.866237    -2.48   0.019    -8.428063   -.8156321
           rural |   -.430827   .6720359    -0.64   0.526    -1.801453    .9397992
                 |
          region |
             MW  |   .1015875   .8366954     0.12   0.904    -1.604864    1.808039
              S  |   -.443779    .873074    -0.51   0.615    -2.224425    1.336867
              W  |   .8625159   1.560477     0.55   0.584    -2.320099    4.045131
                 |
           _cons |   92.21343   1.999375    46.12   0.000     88.13568    96.29119
    ------------------------------------------------------------------------------
    
    .
    . reghdfe zinc age c.age#c.age weight female black orace rural [pweight=finalwgt], absorb(region) cluster(stratid)
    (MWFE estimator converged in 1 iterations)
    
    HDFE Linear regression                            Number of obs   =      9,189
    Absorbing 1 HDFE group                            F(   7,     30) =     123.83
    Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                      R-squared       =     0.0708
                                                      Adj R-squared   =     0.0698
                                                      Within R-sq.    =     0.0692
    Number of clusters (stratid) =         31         Root MSE        =    14.2125
    
                                   (Std. Err. adjusted for 31 clusters in stratid)
    ------------------------------------------------------------------------------
                 |               Robust
            zinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |  -.1656289     .08583    -1.93   0.063    -.3409172    .0096595
                 |
     c.age#c.age |   .0008381   .0009528     0.88   0.386    -.0011077    .0027839
                 |
          weight |   .0530054   .0108136     4.90   0.000     .0309211    .0750897
          female |  -6.126299   .4577537   -13.38   0.000    -7.061157   -5.191441
           black |  -2.639637   .6833685    -3.86   0.001    -4.035261   -1.244012
           orace |  -4.621847   .8036171    -5.75   0.000    -6.263052   -2.980642
           rural |   -.430827   .6572918    -0.66   0.517    -1.773196    .9115419
           _cons |   92.34905   1.908838    48.38   0.000     88.45068    96.24742
    ------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    -----------------------------------------------------+
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -------------+---------------------------------------|
          region |         4           0           4     |
    -----------------------------------------------------+
    
    .
    Last edited by Andrew Musau; 27 May 2022, 10:38.
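A minimal side-by-side sketch on the same nhanes2f example: it fits the design-based model and two weighted-clustering approximations (clusters = strata, clusters = PSUs) and tabulates their standard errors. Two assumptions to flag: `psu` is a helper variable created here so that PSU numbers are unique across strata, and `regress ... i.region` stands in for `reghdfe ..., absorb(region)` so that no SSC package is needed. The slope coefficients are the same either way, although the standard errors may differ slightly through the degrees-of-freedom adjustment.

```stata
* Side-by-side sketch: design-based SEs vs. two weighted-clustering approximations.
webuse nhanes2f, clear

* psuid takes only 2 values within each stratum; build an id unique across strata.
egen long psu = group(stratid psuid)

svyset psuid [pweight=finalwgt], strata(stratid)
local rhs age c.age#c.age weight female black orace rural i.region

* 1) Design-based (linearized) standard errors, accounting for strata.
svy: regress zinc `rhs'
estimates store design

* 2) pweights, clustering on strata (31 clusters).
regress zinc `rhs' [pweight=finalwgt], vce(cluster stratid)
estimates store by_strata

* 3) pweights, clustering on unique PSUs (62 clusters), ignoring strata.
regress zinc `rhs' [pweight=finalwgt], vce(cluster psu)
estimates store by_psu

* Point estimates agree across all three; compare the standard errors.
estimates table design by_strata by_psu, ///
    keep(age weight female black orace rural) b se
```

Running the same comparison on your own data (with your own PSU and strata identifiers) shows which approximation tracks the -svy- standard errors more closely in your design.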

    • #3
      Hi Andrew,

      Thanks for your suggestion. Why should I cluster at the strata level instead of PSUs? Don't svy commands cluster at the PSU level? When using my data, I find that clustering at the strata level inflates standard errors even more than clustering at the PSU level.

      • #4
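        In the absence of stratification, using -pweights- and clustering on the PSU variable is equivalent to -svy- estimation (the coefficients are identical, and the standard errors differ only by the small-sample adjustment that -regress- applies). With stratification, there is no equivalent strategy. In my example, the PSU variable psuid has only 2 levels because PSUs are numbered within strata, so clustering on it directly would give just 2 clusters and significantly biased standard errors.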
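The unstratified equivalence can be checked with a short sketch on nhanes2f. It is an illustration under stated assumptions: the design is deliberately declared without strata, and `psu` is a helper variable created here so that the 62 PSUs have unique identifiers. Under these assumptions the coefficients should be identical, and the -regress- variance should exceed the -svy- one only by the factor (N-1)/(N-k) that -regress- applies with vce(cluster), where N is the number of observations and k the number of estimated coefficients.

```stata
* Sketch: without strata, svy linearized SEs match pweights + PSU clustering
* up to the (N-1)/(N-k) small-sample factor used by regress.
webuse nhanes2f, clear

* psuid is numbered within strata, so build a PSU id unique across strata.
egen long psu = group(stratid psuid)

* Design declared WITHOUT strata: PSUs and weights only.
svyset psu [pweight=finalwgt]
svy: regress zinc age c.age#c.age weight female black orace rural i.region
matrix b_svy = e(b)
matrix V_svy = e(V)

* Same model with pweights and cluster-robust SEs on the same PSUs.
regress zinc age c.age#c.age weight female black orace rural i.region ///
    [pweight=finalwgt], vce(cluster psu)
matrix b_cl = e(b)
matrix V_cl = e(V)

* Coefficients should coincide; the SE ratio should be sqrt((N-1)/(N-k)).
display "max rel. diff. in coefficients: " mreldif(b_svy, b_cl)
display "SE ratio for age (regress/svy): " sqrt(V_cl[1,1] / V_svy[1,1])
```

Once strata are declared, the linearized variance instead centers the PSU score totals on their stratum means before summing, which is why no single clustering variable reproduces it.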

        • #5
          It appears that -regress- with the -svy- prefix still accepts the -absorb()- option, so this may solve your problem.

          Code:
          webuse nhanes2f, clear
          svyset psuid [pweight=finalwgt], strata(stratid)
          svy: regress zinc age c.age#c.age weight female black orace rural i.region
          svy: regress zinc age c.age#c.age weight female black orace rural, absorb(region)
          Res.:

          Code:
          .
          . svyset psuid [pweight=finalwgt], strata(stratid)
          
                pweight: finalwgt
                    VCE: linearized
            Single unit: missing
               Strata 1: stratid
                   SU 1: psuid
                  FPC 1: <zero>
          
          .
          . svy: regress zinc age c.age#c.age weight female black orace rural i.region
          (running regress on estimation sample)
          
          Survey: Linear regression
          
          Number of strata   =        31                Number of obs     =        9,189
          Number of PSUs     =        62                Population size   =  104,176,071
                                                        Design df         =           31
                                                        F(  10,     22)   =        39.57
                                                        Prob > F          =       0.0000
                                                        R-squared         =       0.0708
          
          ------------------------------------------------------------------------------
                       |             Linearized
                  zinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   age |  -.1656289   .0837319    -1.98   0.057    -.3364013    .0051436
                       |
           c.age#c.age |   .0008381   .0008655     0.97   0.340    -.0009271    .0026033
                       |
                weight |   .0530054   .0139419     3.80   0.001     .0245708      .08144
                female |  -6.126299   .4503075   -13.60   0.000    -7.044707   -5.207891
                 black |  -2.639637   .9309377    -2.84   0.008    -4.538297   -.7409767
                 orace |  -4.621847   1.866237    -2.48   0.019    -8.428063   -.8156321
                 rural |   -.430827   .6720359    -0.64   0.526    -1.801453    .9397992
                       |
                region |
                   MW  |   .1015875   .8366954     0.12   0.904    -1.604864    1.808039
                    S  |   -.443779    .873074    -0.51   0.615    -2.224425    1.336867
                    W  |   .8625159   1.560477     0.55   0.584    -2.320099    4.045131
                       |
                 _cons |   92.21343   1.999375    46.12   0.000     88.13568    96.29119
          ------------------------------------------------------------------------------
          
          .
          . svy: regress zinc age c.age#c.age weight female black orace rural, absorb(region)
          (running regress on estimation sample)
          
          Survey: Linear regression
          
          Number of strata   =        31                Number of obs     =        9,189
          Number of PSUs     =        62                Population size   =  104,176,071
                                                        Design df         =           31
                                                        F(   7,     25)   =        63.23
                                                        Prob > F          =       0.0000
                                                        R-squared         =       0.0708
          
          ------------------------------------------------------------------------------
                       |             Linearized
                  zinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   age |  -.1656289   .0845041    -1.96   0.059    -.3379761    .0067184
                       |
           c.age#c.age |   .0008381   .0008667     0.97   0.341    -.0009295    .0026057
                       |
                weight |   .0530054   .0139078     3.81   0.001     .0246402    .0813706
                female |  -6.126299   .4403543   -13.91   0.000    -7.024408   -5.228191
                 black |  -2.639637   1.121315    -2.35   0.025    -4.926574   -.3526995
                 orace |  -4.621847   1.557779    -2.97   0.006    -7.798959   -1.444736
                 rural |   -.430827   .6325575    -0.68   0.501    -1.720937    .8592825
                 _cons |   92.34905   2.235011    41.32   0.000     87.79071    96.90738
          ------------------------------------------------------------------------------
          Last edited by Andrew Musau; 30 May 2022, 06:31.

          • #6
            Thanks again Andrew, this is really helpful. I had no idea that this option existed for -reg-. Is it documented anywhere in the help files? I can't seem to find it...

            However, this seems to massively inflate my SEs, and I am not sure why this could be happening.
            Last edited by Jack Jameson; 30 May 2022, 08:02.

            • #7
              The -absorb()- option is undocumented, but it essentially turns -regress- into -areg-. I do not think your standard errors should change much compared with including the fixed effects as indicators; there is only a small adjustment because explicitly including the indicators and absorbing them imply different degrees of freedom. Try comparing the two approaches on a small subset of your data.
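Jack's data are not available here, so this sketch runs the suggested comparison on nhanes2f, with the 31 strata (stratid) standing in for a many-level panel identifier such as panelID. With real panel data you would first keep a random subset of panels so that the indicator version stays feasible. The sketch relies on the undocumented absorb() option exactly as used in #5, so treat it as an illustration rather than a documented interface. The regressor list is shortened (no c.age#c.age, rural or i.region) to keep the comparison simple; region is constant within strata and would be collinear with the absorbed effects.

```stata
* Sketch: fixed effects as explicit indicators vs. the undocumented absorb().
webuse nhanes2f, clear
svyset psuid [pweight=finalwgt], strata(stratid)

* stratid stands in for a panel identifier here. With real panel data, one
* would first keep a random subset of panels before fitting the indicator model.

* (a) Fixed effects entered as indicator variables.
svy: regress zinc age weight female black orace i.stratid
estimates store dummies

* (b) Same fixed effects absorbed.
svy: regress zinc age weight female black orace, absorb(stratid)
estimates store absorbed

* Slopes should coincide; compare the linearized standard errors.
estimates table dummies absorbed, keep(age weight female black orace) b se
```

If the standard errors diverge much more than a degrees-of-freedom adjustment would explain, as they do for some coefficients in the region example in #5, that is worth knowing before relying on the absorbed version for the full data.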
