  • Combining years of BRFSS data to increase sample size

    I have a question about the correct method of combining years of BRFSS data when the purpose is to increase sample size. One approach I've found is to multiply the final survey weight for each year by the proportion of that year's sample size to the total sample size for all years of data. For example, combining 2011 to 2014 data, I would calculate the new weights as follows. Of note, these are lines from the do file and not the entire file:
    1. Read-in 2011 data do file: new_weight = weight_2011*(sample_size_2011/total_sample_size),
    2. Read-in 2012 data do file: new_weight = weight_2012*(sample_size_2012/total_sample_size),
    3. Read-in 2013 data do file: new_weight = weight_2013*(sample_size_2013/total_sample_size),
    4. Read-in 2014 data do file: new_weight = weight_2014*(sample_size_2014/total_sample_size)
    5. Append 2011-2014 data do file: svyset _psu [pweight= new_weight], strata(_ststr) singleunit(certainty)
    The other approach I've found is to divide the final survey weight for each year of data by the total number of years being combined. For example, in each read-in do file:

    1. Read-in 2011 data do file: new_weight = weight_2011/4
    2. Read-in 2012 data do file: new_weight = weight_2012/4
    3. Read-in 2013 data do file: new_weight = weight_2013/4
    4. Read-in 2014 data do file: new_weight = weight_2014/4
    5. Append 2011-2014 data do file: svyset _psu [pweight= new_weight], strata(_ststr) singleunit(certainty)
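To make the two schemes concrete, here is a minimal sketch in Python (not Stata) of the adjustment factor each one applies to a year's weights. The yearly sample sizes are made-up illustrative numbers, not real BRFSS counts, and the names `share_factor` and `equal_factor` are hypothetical.

```python
# Hypothetical yearly sample sizes (illustrative only, not real BRFSS counts).
sample_sizes = {2011: 1500, 2012: 2000, 2013: 2500, 2014: 2000}

total_sample_size = sum(sample_sizes.values())
n_years = len(sample_sizes)

# Approach 1: multiply each year's weight by that year's share of the pooled sample.
share_factor = {year: n / total_sample_size for year, n in sample_sizes.items()}

# Approach 2: divide each year's weight by the number of years combined.
equal_factor = {year: 1 / n_years for year in sample_sizes}

for year in sample_sizes:
    print(year, share_factor[year], equal_factor[year])
```

In both schemes the factors sum to 1 across years, and the two coincide whenever the yearly sample sizes are equal. They differ only in how much relative influence a larger or smaller year receives.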

  • #2
    The short answer is that you don't have to do anything. You want to pool several years of the survey to "increase sample size," and the pooled data set does exactly that. Everything you suggest doing militates against that. Suppose there were exactly 2000 unweighted cases in each yearly file. Then when you pool you will, obviously, have 8000 cases. What you propose appears to get you back to the average number of weighted cases in a single year. I don't think that is what you intend.

    The weighted total depends on how the weights are constructed. Suppose the weights were defined so as to give you estimates of population totals, amounting to millions of cases. When you declare the weight variable, Stata will norm the weights back to the observed sample size of 8000, leaving the relative contribution of each case as before. However the weights are defined, Stata will norm back to the observed sample, correcting estimates and standard errors for weighting and sample design.
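The point about relative contributions can be checked with a small arithmetic sketch in Python (not Stata). It only illustrates that multiplying every weight by the same constant leaves a weighted mean or proportion unchanged; it does not reproduce Stata's survey variance estimation, and the values and weights are invented for illustration.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean: sum of value * weight divided by sum of weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical 0/1 outcome and population-scale weights for six respondents.
values = [1, 0, 1, 1, 0, 0]
weights = [1200.0, 800.0, 1500.0, 500.0, 2000.0, 1000.0]

# Divide every weight by the same constant, as in the "divide by 4" scheme.
scaled_weights = [w / 4 for w in weights]

original = weighted_mean(values, weights)
rescaled = weighted_mean(values, scaled_weights)

print(original, rescaled)
print(math.isclose(original, rescaled))
```

A constant factor cancels between numerator and denominator, so means and proportions are unaffected. Estimated totals, by contrast, would shrink by that same factor.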

    Your suggested corrections to the weights seem to be motivated by the fact that the yearly sample sizes differ and you don't want any one year to be overly influential. I don't know much about the BRFSS surveys, but I doubt that the sample sizes differ greatly from year to year. If you do worry about that and you are doing some sort of regression analysis, you might include year as a variable in your model.
    Richard T. Campbell
    Emeritus Professor of Biostatistics and Sociology
    University of Illinois at Chicago
