
  • Using Stata's svy subpopulation option when running regressions


    My subpopulation is Asian females. When I add covariates to the model, the number of observations in the subpopulation changes. Is there a way to create an indicator for the analytic sample, i.e., the observations with complete data on all the variables in the full specification, when running regressions with the svy subpop() option?

    gen asiansubpop1 = (nh_asian == 1 & female == 1)

    svy, subpop(asiansubpop1): regress depressvar gen2 gen3 age

    svy, subpop(asiansubpop1): regress depressvar gen2 gen3 age educ bio1_step1 bio1 oth_fam

    The number of observations for the subpopulation changes when I add additional covariates to the model. Is there a way to keep the analytical sample consistent across models?
    Last edited by Radhika Prasad; 08 Mar 2024, 21:01.

  • #2
    The simplest way to do it is to start with the regression that has all of the variables in it. Then for each subsequent regression, add -if e(sample)- to the regression command. Like this example:

    Code:
    . sysuse auto, clear
    (1978 automobile data)
    
    . count
      74
    
    .
    . regress price mpg i.foreign i.rep78
    
          Source |       SS           df       MS      Number of obs   =        69
    -------------+----------------------------------   F(6, 62)        =      3.94
           Model |   159087839         6  26514639.9   Prob > F        =    0.0021
        Residual |   417709119        62  6737243.86   R-squared       =    0.2758
    -------------+----------------------------------   Adj R-squared   =    0.2057
           Total |   576796959        68  8482308.22   Root MSE        =    2595.6
    
    ------------------------------------------------------------------------------
           price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             mpg |  -299.6068   63.34525    -4.73   0.000    -426.2322   -172.9815
                 |
         foreign |
        Foreign  |   1102.334   901.7772     1.22   0.226    -700.2928    2904.961
                 |
           rep78 |
              2  |   841.3622   2055.452     0.41   0.684    -3267.428    4950.153
              3  |   1285.116   1901.486     0.68   0.502    -2515.901    5086.132
              4  |   1155.571   1984.561     0.58   0.562     -2811.51    5122.652
              5  |   2353.179   2130.577     1.10   0.274    -1905.784    6612.142
                 |
           _cons |   10856.24   2266.757     4.79   0.000      6325.06    15387.43
    ------------------------------------------------------------------------------
    
    .
    . regress price mpg if e(sample)
    
          Source |       SS           df       MS      Number of obs   =        69
    -------------+----------------------------------   F(1, 67)        =     17.58
           Model |   119910002         1   119910002   Prob > F        =    0.0001
        Residual |   456886957        67  6819208.31   R-squared       =    0.2079
    -------------+----------------------------------   Adj R-squared   =    0.1961
           Total |   576796959        68  8482308.22   Root MSE        =    2611.4
    
    ------------------------------------------------------------------------------
           price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             mpg |  -226.3607   53.98091    -4.19   0.000     -334.107   -118.6143
           _cons |   10965.23   1191.468     9.20   0.000      8587.05    13343.41
    ------------------------------------------------------------------------------
    
    .
    . //      AS OPPOSED TO
    . regress price mpg
    
          Source |       SS           df       MS      Number of obs   =        74
    -------------+----------------------------------   F(1, 72)        =     20.26
           Model |   139449474         1   139449474   Prob > F        =    0.0000
        Residual |   495615923        72  6883554.48   R-squared       =    0.2196
    -------------+----------------------------------   Adj R-squared   =    0.2087
           Total |   635065396        73  8699525.97   Root MSE        =    2623.7
    
    ------------------------------------------------------------------------------
           price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             mpg |  -238.8943   53.07669    -4.50   0.000    -344.7008   -133.0879
           _cons |   11253.06   1170.813     9.61   0.000     8919.088    13587.03
    ------------------------------------------------------------------------------
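One caveat when carrying this over to the svy setting in the question: Stata's survey documentation advises against restricting the sample with if or in after svyset, because dropping observations can give incorrect standard errors; the restriction belongs inside subpop(). A minimal sketch of that, assuming the auto data from above with a stand-in design (svyset _n, each car its own PSU, no weights) and foreign playing the role of the Asian-female subpopulation, flags complete cases up front with -mark-/-markout- instead of relying on e(sample). The variable name fullcase and the choice of weight and rep78 as the extra covariates are purely illustrative.

```stata
* Sketch: auto data with a stand-in survey design (not a real survey)
sysuse auto, clear
svyset _n                        // each car treated as its own PSU, no weights

* fullcase = 1 only where every variable of the largest model is nonmissing
mark fullcase
markout fullcase price mpg weight rep78

* Narrow the subpopulation, not the sample: foreign cars with complete data
svy, subpop(foreign if fullcase): regress price mpg
svy, subpop(foreign if fullcase): regress price mpg weight rep78
```

Both models should then report the same number of subpopulation observations (e(N_sub)). With the question's variables, the analogous lines would be markout fullcase depressvar gen2 gen3 age educ bio1_step1 bio1 oth_fam followed by subpop(asiansubpop1 if fullcase).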



    • #3
      A slight change to Clyde's answer: if you will be estimating more than one model, or if anything else comes between the estimation of the various models, then what is in e(sample) will change. So estimate the model with all the variables first and create a new variable that equals e(sample) but will not change, e.g.,
      Code:
      gen byte insample=e(sample)
      and then use "if insample" in your subsequent commands.
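A sketch of the frozen-indicator idea inside svy, assuming the auto data with a stand-in svyset _n design, foreign standing in for the Asian-female subpopulation, and weight and rep78 as illustrative covariates. Here insample is taken from an ordinary, unweighted regress of the largest model, so it simply flags complete cases; whether e(sample) after svy with subpop() marks exactly the same observations is worth checking in the [SVY] manual before relying on it. Because svy estimation should not drop observations via if, the sketch folds insample into subpop() rather than using it as -if insample-.

```stata
* Sketch: auto data with a stand-in survey design (not a real survey)
sysuse auto, clear
svyset _n

* Largest model first; its e(sample) = complete cases on all four variables
quietly regress price mpg weight rep78
gen byte insample = e(sample)       // frozen copy, unaffected by later models

* Every model uses the same subpopulation: foreign cars with complete data
svy, subpop(foreign if insample): regress price mpg weight rep78
svy, subpop(foreign if insample): regress price mpg
```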



      • #4
        Cross-posted at https://stackoverflow.com/questions/...ing-regression
        Please tell us about cross-posting, as requested in the FAQ.



        • #5
          Thank you. This is very helpful.

