Assistance with weighted survey data

Jake Mooney

Join Date: Aug 2023

Posts: 2
#1

15 Aug 2023, 13:34

First, a mea culpa - I'm a mechanical engineer turned physician with a research niche in medical data sciences. I've picked up C, Java, MATLAB and Python along the way, but never had to use Stata until today. I promise I'm trying to learn, but I have a very specific need and would really appreciate a springboard to build off of, and I imagine this would be very easy for someone familiar with Stata.

I have recently been working with the NHAMCS dataset. I've done all my data extraction and analysis in MATLAB thus far (familiar, convenient, more advanced tools for higher math and machine learning). As a final step before reporting results, I need to generate the mean and CI for the data. Because of the NHAMCS study design, I'm forced into using a statistical package (like Stata) to make use of the ultimate cluster design option. Per the NHAMCS documentation:

The pweight (PATWT), strata (CSTRATM), and PSU (CPSUM) are set with the svyset command as follows:
Stata 8:
svyset [pweight=patwt], psu(cpsum) strata(cstratm)
Stata 9 and later:
svyset cpsum [pweight=patwt], strata(cstratm)

I can create any data array to pass into Stata (the variable array of interest, e.g. age, plus PATWT, CSTRATM, CPSUM, and whatever else is needed). So what is the bare minimum of code I would need to output the mean and 95% confidence interval for a given variable array?

Thank you all so much in advance
Rich Goldstein

Join Date: Mar 2014

Posts: 4439
#2

15 Aug 2023, 13:42

What you want is not completely clear to me, but start with

Code:

help svy_estimation

My guess is that you want the first command listed there, "mean" (I am assuming here that by "variable array" you mean what Stata calls a variable, or possibly more than one).

Note that many commands allow the use of pweights as part of the command, and that gives you another route.
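
A minimal sketch of that second route, assuming simulated stand-in data: the variable age and every value below are made up for illustration, and only the names patwt, cstratm, and cpsum come from the documentation quoted in #1. It passes the pweight straight to -mean- and clusters on the PSU.

```stata
* Sketch only: simulated stand-in data, all values hypothetical
clear
set seed 2023
set obs 400
generate cstratm = ceil(_n/100)          // 4 strata of 100 observations
generate cpsum   = ceil(_n/25)           // 16 PSUs, 4 nested in each stratum
generate patwt   = 50 + 100*runiform()   // made-up sampling weights
generate age     = 20 + 60*runiform()    // made-up analysis variable

* pweights given directly to the command, clustering on the PSU
mean age [pweight = patwt], vce(cluster cpsum)
```

The point estimate should agree with what -svy: mean- reports, but this route does not account for the strata, so the standard error and confidence interval can differ from the fully svyset analysis.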
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#3

15 Aug 2023, 13:59

Code:

svyset cpsum [pweight = patwt], strata(cstratm)
svy: mean variable_list

Replace variable_list by the name(s) of the variable(s) whose means (with CIs) you are interested in.
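
For a self-contained trial run, here is a minimal sketch on simulated data. The variable age, the scalar names, and all values are hypothetical; only patwt, cstratm, and cpsum come from the documentation quoted in #1. It also pulls the mean and 95% CI out of the stored results, assuming the usual t-based interval on the design degrees of freedom, which may be handy if the numbers need to go back to another program.

```stata
* Sketch only: simulated stand-in data, all values hypothetical
clear
set seed 2023
set obs 400
generate cstratm = ceil(_n/100)          // 4 strata of 100 observations
generate cpsum   = ceil(_n/25)           // 16 PSUs, 4 nested in each stratum
generate patwt   = 50 + 100*runiform()   // made-up sampling weights
generate age     = 20 + 60*runiform()    // made-up analysis variable

* Declare the design once, then estimate
svyset cpsum [pweight = patwt], strata(cstratm)
svy: mean age

* Rebuild the 95% CI from the stored estimate, SE, and design df
scalar est_mean = _b[age]
scalar est_se   = _se[age]
scalar est_crit = invttail(e(df_r), 0.025)
scalar est_lb   = est_mean - est_crit*est_se
scalar est_ub   = est_mean + est_crit*est_se
display "mean = " est_mean "   95% CI: " est_lb " to " est_ub
```

The displayed limits should match the interval printed in the -svy: mean- output table.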

Note that if you specify more than one variable in your variable list, the means will be calculated on the subset of the data for which none of those specified variables has a missing value. If you wish to avoid that, then you must do the variables one at a time:

Code:

svyset cpsum [pweight = patwt], strata(cstratm)
foreach v of varlist variable_list {
    svy: mean `v'
}

Done this way, each variable's mean will be calculated on all non-missing values of that variable, regardless of the missingness of any other variable.
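
The difference between the two approaches can be seen in a small sketch on simulated data. The variables age and waittime and all values are hypothetical; the missing values are planted deliberately so that the joint and one-at-a-time estimation samples differ.

```stata
* Sketch only: simulated stand-in data, all values hypothetical
clear
set seed 2023
set obs 400
generate cstratm  = ceil(_n/100)         // 4 strata
generate cpsum    = ceil(_n/25)          // 16 PSUs nested in strata
generate patwt    = 50 + 100*runiform()
generate age      = 20 + 60*runiform()
generate waittime = 120*runiform()
replace age      = . in 1/10             // 10 observations missing age
replace waittime = . in 391/400          // 10 different ones missing waittime
svyset cpsum [pweight = patwt], strata(cstratm)

* Joint estimation: only observations complete on both variables are used
svy: mean age waittime
display "joint N = " e(N)                // 380 here

* One at a time: each variable uses all of its own non-missing values
foreach v of varlist age waittime {
    svy: mean `v'
    display "`v': N = " e(N)             // 390 for each here
}
```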

In the future, please avoid using abbreviations like NHAMCS. Either spell it out on first use, or, in this case you could have just not mentioned the name of the survey as it has no real bearing on your question. As a US-based epidemiologist, I know what the National Hospital Ambulatory Medical Care Survey is. But this is an international multi-disciplinary forum, and I'm sure that the majority of our members will not be familiar with it. We ask that specialized abbreviations and technical jargon be avoided. The only common knowledge that should be assumed here is some basic statistics and some, possibly minimal, familiarity with Stata. Other than that, please use only language and abbreviations that would be understood by any college-educated English-speaking adult anywhere in the world.

Also, to improve your chances of getting a timely and helpful response when asking for code, it is best to show example data in your post. In your case, the description of the data set you gave was sufficient for answering your question. But, in general, even the best, most careful description proves insufficient. Omitting example data will impair the value you get from using Statalist because it will a) discourage some people from responding at all, b) cause somebody to give a response based on guesswork about the aspects of your data that are important but not discernible in your post (and that response is likely to fail in your actual data), or c) cause somebody to post back saying that you need to show example data in order to get help. All of these things delay your getting on with your project, though they vary in how much of your time and that of others will be wasted. Posting example data with the -dataex- command is quick and easy. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
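
As a quick illustration of what that looks like, here is a sketch using the auto dataset that ships with Stata rather than any survey data. The choice of variables and the five-row range are arbitrary; in a real post you would list the variables relevant to your question.

```stata
* Sketch only: auto.dta stands in for your own data
sysuse auto, clear

* Print a copy-and-paste-ready excerpt of three variables, first 5 rows
dataex make price mpg in 1/5
```

The block that -dataex- prints can be pasted directly into a post, and anyone reading it can run it to recreate the excerpt.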

Added: Crossed with #2.

Last edited by Clyde Schechter; 15 Aug 2023, 14:02.
Jake Mooney

Join Date: Aug 2023

Posts: 2
#4

15 Aug 2023, 16:14

Rich, apologies for the lack of clarity.

Clyde, I really appreciate the quick and thorough response, as well as the feedback on how to improve posting on the forum in the future.