  • Different survey means in local

    I am trying to get survey means for some demographic variables based on three groups: incar == 0, incar == 1, and both together (aka non-missings). In my actual code, it looks something like this after -svyset-

    Code:
    local demog age gender white
    svy: mean `demog' if incar != .
    estat sd
    estadd mat sd = r(sd)
    eststo
    
    svy: mean `demog' if incar == 0
    estat sd
    estadd mat sd = r(sd)
    eststo
    
    svy: mean `demog' if incar == 1
    estat sd
    estadd mat sd = r(sd)
    eststo
    This lets me produce a nice little table with esttab (from SSC). However, the problem I'm running into is that it seems like the means are slightly different when I run them all together as part of a local, compared to if I ran them individually. The sample sizes are slightly different as well, with lower sample sizes when I run the local.

    I assume this is because of something happening with the local? Does it only run for all non-missing observations in the local, or something like that? And I guess the obvious main question is if there's a way to avoid this?

    I tried to recreate the problem with a publicly available dataset but wasn't able to (maybe the missings in loglead aren't distributed in a way that throws it off?)...here's the start of my code to do that if it's useful:

    Code:
    clear
    webuse nhanes2f
    
    svyset psuid [pweight=finalwgt], strata(stratid)
    
    svy: mean age if loglead != .
    
    local example sex age height weight
    svy: mean `example' if loglead != .
    Last edited by Garrett Todd; 12 Aug 2022, 12:54.

  • #2
    Does it only run for all non-missing observations in the local, or something like that?
    That's exactly what's happening. And it has nothing to do with the variable names being specified in a local. If you just did
    Code:
    svy: mean age gender white // ...
    that would also produce different results from doing them separately if there are missing values getting in the way. -mean- is an estimation command, and in Stata all estimation commands restrict the estimation sample to those observations with no missing values in any variable mentioned in the command.

    There is no option to have -mean- override this and do pairwise calculations. You just have to do each variable separately if that is what you need.
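
    A minimal sketch of the casewise deletion described above, using the auto dataset that ships with Stata rather than the survey data from the question. The choice of dataset and the scalar names n_joint and n_price are assumptions for illustration only. In auto, rep78 is missing for 5 of the 74 cars, so listing it alongside price shrinks the estimation sample for both variables:

    ```stata
    sysuse auto, clear

    * Both variables in one command: any car missing rep78 is dropped for price too
    mean price rep78
    scalar n_joint = e(N)

    * price on its own: every observation with a non-missing price is used
    mean price
    scalar n_price = e(N)

    display "joint N = " n_joint "   price-only N = " n_price
    ```

    The same logic applies with the -svy:- prefix, and no local macro is involved, which is why the mean of price differs between the two runs.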


    • #3
      I see! That makes a lot of sense. Thanks Clyde Schechter. If you have any thoughts on ways to try to efficiently do that (maybe in a loop?) and be able to put all the means and SDs into a table at the end (I actually have about 10 variables instead of the 3 I showed above), I'm all ears! If I do something like this:

      Code:
      svy: mean age if incar != .
      eststo
      svy: mean gender if incar != .
      eststo
      svy: mean white if incar != .
      eststo
      esttab using example.csv
      It's not too bad--the resulting csv file basically has a big diagonal of means, which I could always copy and paste or put back into Stata to clean up a bit. But something without that extra step would be nice.
      Last edited by Garrett Todd; 12 Aug 2022, 13:08.


      • #4
        So something like this:
        Code:
        clear*
        webuse nhanes2f
        
        svyset psuid [pweight=finalwgt], strata(stratid)
        
        foreach v of varlist sex age height weight {
            svy, subpop(if !missing(loglead)): mean `v'
            estat sd
            estadd mat sd = r(sd)
            eststo
        }
        Notes:

        1. The use of -if- clauses in -svy:- estimations will generally lead to incorrect results. The point estimates are usually unaffected, but an ordinary -if- clause drops the excluded observations before the survey design is applied, so the standard errors can be wrong. Conditioning on a subset should be done using the -subpop()- option to -svy:- itself, as illustrated above. Also, if you wish to stratify over values of a variable like incar, that should be done by using the -over()- option to -mean-:
        Code:
        svy, subpop(if !missing(loglead)): mean `v', over(incar)
        (There is no incar variable in nhanes2f, so this line will not run as is. It is just to illustrate the technique.)

        2. I am not a user of -estadd-, -estout-, etc., so I do not know if these changes to the basic code are compatible with your use of those commands.
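
        One way to avoid the diagonal layout mentioned in #3, without relying on -estadd- or -esttab-, might be to stack each variable's mean, SD, and subpopulation size into a matrix as the loop runs. This is only a sketch: the matrix names results, row, m, and s are made up for illustration, and it assumes that -estat sd- leaves r(mean) and r(sd) behind as 1 x 1 matrices after a single-variable -svy: mean-.

        ```stata
        clear all
        webuse nhanes2f

        svyset psuid [pweight=finalwgt], strata(stratid)

        foreach v of varlist sex age height weight {
            * One variable per call, so missing values in the others cannot shrink the sample
            quietly svy, subpop(if !missing(loglead)): mean `v'
            quietly estat sd
            matrix m = r(mean)
            matrix s = r(sd)

            * One row per variable: mean, SD, subpopulation observations
            matrix row = (m[1,1], s[1,1], e(N_sub))
            matrix rownames row = `v'
            matrix results = nullmat(results) \ row
        }

        matrix colnames results = mean sd N_sub
        matrix list results
        ```

        The resulting matrix has one row per variable rather than one column per model, so it can be listed or exported directly.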


        • #5
          Thanks Clyde Schechter, very helpful!
