
  • Raking Weights on Nested Data: How to Rake by Group

    Introduction
    I have multilevel survey data of teachers nested in schools. I have manually calculated design weights and non-response adjustment weights based on the selection probabilities and response rates. Now I want to create post-stratification weights to compensate for non-coverage, mainly by raking on two margins: the sex (male or female) and the employment status (full-time or not full-time) of the teacher. I have tried doing this in Stata using the user-written module survwgt; however, I can't get it to work on nested data.

    Sample Data
    **Variables**
    school : unique school id
    male : 1 = male teacher
    fulltime : 1 = full-time teacher
    Nall : true population total of teachers, per school
    nall : number of teachers in the sample, per school
    Nmale : true population total of male teachers, per school
    nmale : number of male teachers in the sample, per school
    Nfull : true population total of full-time teachers, per school
    nfull : number of full-time teachers in the sample, per school
    rr : response rate of teachers, per school (used to calculate oldwt)
    oldwt : the product of the design weight and the non-response adjustment
    newwt : the new weight, to be produced via raking
    school  male  fulltime  Nall  nall  Nmale  nmale  Nfull  nfull   rr  oldwt
         1     1         1     9     5      4      3      7      3  .56    1.8
         1     1         1     9     5      4      3      7      3  .56    1.8
         1     1         0     9     5      4      3      7      3  .56    1.8
         1     0         1     9     5      4      3      7      3  .56    1.8
         1     0         0     9     5      4      3      7      3  .56    1.8
         2     1         1     8     6      6      4      5      4  .75    1.3
         2     1         1     8     6      6      4      5      4  .75    1.3
         2     1         0     8     6      6      4      5      4  .75    1.3
         2     1         1     8     6      6      4      5      4  .75    1.3
         2     0         1     8     6      6      4      5      4  .75    1.3
         2     0         0     8     6      6      4      5      4  .75    1.3

    Failed Attempts
    1. I first tried
    Code:
      survwgt rake oldwt, by(nmale nfull) totvars(Nmale Nfull) generate(newwt)
    This produced the following error, presumably because the data are nested in schools:
    Control total Nmale not constant within categories of dimension nmale
    2. So I next tried specifying the data as grouped:
    Code:
      bysort school: survwgt rake oldwt, by(nmale nfull) totvars(Nmale Nfull) generate(newwt)
    This produced a different error, because survwgt does not support the by prefix:
    survwgt may not be combined with by
    How can I rake margins for observations that are nested in groups?

  • #2
    In the future, please, as asked in the FAQ, give the source of contributed commands. The survwgt package was written by Nick Winter and is available from SSC. A nice guide to raking is Battaglia et al. (2013). Note that the by() option should specify the categories by which you want to reweight, in this case school-gender.

    Create a small data set with one line per school and gender with variables school, gender, ngender (totals). Then add a single variable scgender to identify the school-gender combinations:

    Code:
    egen scgender = group(school gender)
    survwgt rake oldwt, by(scgender) totvar(ngender) generate(newwt)
    merge 1:m school gender using teacherfile
    This is actually a post-stratification technique. It won't remove response bias except that related to gender. To do a better job, you'd need information on the characteristics of responders and non-responders. The survwgt nonresponse module can do this. For other approaches, see Groves et al. (2009, p. 350) or Lohr (2009, Chapter 8). I personally would use logistic regression to get an estimated probability of response for each person, then weight by the inverse.
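
    The logistic-regression idea in the last sentence could be sketched roughly as follows. This is only an illustration on made-up data: the variables responded and designwt, the simulated response model, and the choice of male and fulltime as predictors are all assumptions, not part of the original poster's file. In practice the data set would hold every sampled teacher, responders and non-responders alike, with predictors known for both.

```stata
* Toy data (entirely simulated) standing in for the full sample,
* including non-responders. All variable names here are hypothetical.
clear
set seed 12345
set obs 200
generate byte male = runiform() < 0.4
generate byte fulltime = runiform() < 0.7
generate double designwt = 1.5
generate byte responded = runiform() < invlogit(0.5 + 0.4*fulltime - 0.3*male)

* Model the probability of response from characteristics known for everyone
logit responded i.male i.fulltime

* Predicted response probability for each sampled teacher
predict double phat, pr

* Non-response-adjusted weight: design weight times inverse response
* probability, defined for responders only
generate double nrwt = designwt / phat if responded == 1
```

    The resulting nrwt would then play the role of oldwt as the starting weight for raking.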

    References:

    Battaglia, M. P., Hoaglin, D. C., & Frankel, M. R. (2013). Practical considerations in raking survey data. Survey Practice, 2(5).
    available at: http://www.surveypractice.org/index.php/SurveyPractice/article/view/176/0

    Groves, Robert M., Floyd J. Fowler, Mick P. Couper, James M. Lepkowski, Eleanor Singer, and Roger Tourangeau. 2009. Survey methodology, Second Edition. Hoboken, N.J.: Wiley.

    Lohr, Sharon L. 2009. Sampling: Design and Analysis. Boston, MA: Cengage Brooks/Cole.
    Last edited by Steve Samuels; 29 Jun 2015, 17:07.
    Steve Samuels
    Statistical Consulting

    Stata 14.2



    • #3
      Correction: I stated that one should run survwgt on a small external data set. That was wrong, as that data set would not include individual weights. Your main file already contains the information needed to create the totals. Below I include the full-time/part-time category which I originally omitted.


      Code:
      gen gender = male
      label define gender 1 "Male" 0 "Female"
      label values gender gender
      
      gen ngender = Nmale if male
      replace ngender = Nall-Nmale if !male
      gen scgender = group(school scgender)
      
      gen timecat = fulltime
      label define timecat 1 "Full-time" 0 "Part-time"
      label values timecat timecat
      
      gen ntime = Nfull if fulltime
      replace ntime = Nall -Nfull if !fulltime
      gen sctime = group(school timecat)
      
      survwgt rake oldwt, by(scgender sctime) totvar(ngender ntime) generate(newwt)
      Last edited by Steve Samuels; 30 Jun 2015, 00:15.


      • #4
        Thank you so much for the response. I'm trying to work the code through my data but I am having a problem with these parts:
        Code:
        gen scgender = group(school scgender)
        gen sctime = group(school timecat)
        It returns the following error, suggesting it thinks the two separate variables (e.g. school and timecat) are supposed to be one variable. Is there an extra user-written module I need to have installed?
        . gen sctime = group(school timecat)
        schooltimecat not found
        r(111);
        I'm using Stata 13.1.
        Last edited by Michael West; 30 Jun 2015, 10:29. Reason: Added Stata version



        • #5
          In the last command line you are using generate when you should be using egen.

          There is an undocumented group() function that will work with generate given a single variable, but it chokes on your input. Nevertheless it is not what you want at all.

          (This was Steve Samuels' typo, I surmise.)

          In addition, use the label option too:

          Code:
          egen scgender = group(school scgender), label
          egen sctime = group(school timecat), label



          • #6
            Thanks for catching the typo, Nick. I had no data set handy and didn't try the code.


            • #7
              Thanks Steve and Nick for the assistance. I have tested the code on the sample data on my machine and everything looks like it has worked out! I believe it has worked because I calculated the sum of weights and they align with the population totals. There was one more typo that I caught--the second scgender in this line should be just gender:
              *egen scgender = group(school scgender), label --> egen scgender = group(school gender), label
              So the final code, for reference, is as follows (note that I had to change the Nx variable names to n_x to get the code to run in Stata--e.g. Nmale in the original post becomes n_male below):
              Code:
              gen gender = male
              label define gender 1 "Male" 0 "Female"
              label values gender gender
              
              gen ngender = n_male if male
              replace ngender = n_all - n_male if !male
              egen scgender = group(school gender), label
              
              gen timecat = fulltime
              label define timecat 1 "Full-time" 0 "Part-time"
              label values timecat timecat
              
              gen ntime = n_full if fulltime
              replace ntime = n_all - n_full if !fulltime
              egen sctime = group(school timecat), label
              
              survwgt rake oldwt, by(scgender sctime) totvar(ngender ntime) generate(newwt)



              • #8
                I'm happy to hear it's working now. Checking by hand is always a good idea.
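
                One possible way to check by hand, independent of survwgt, is to rake the sample data directly with egen and then assert that the weights hit both margins within each school. This is a minimal sketch, not the survwgt algorithm itself: the variable handwt, the fixed 200 iterations, and the 1e-6 tolerance are arbitrary choices made here, and the n_all/n_male/n_full names follow the renaming described in #7.

```stata
* Sample data from the original post (population totals renamed n_*)
clear
input school male fulltime n_all n_male n_full oldwt
1 1 1 9 4 7 1.8
1 1 1 9 4 7 1.8
1 1 0 9 4 7 1.8
1 0 1 9 4 7 1.8
1 0 0 9 4 7 1.8
2 1 1 8 6 5 1.3
2 1 1 8 6 5 1.3
2 1 0 8 6 5 1.3
2 1 1 8 6 5 1.3
2 0 1 8 6 5 1.3
2 0 0 8 6 5 1.3
end

* Hand-rolled raking: alternately rescale to each margin within school
generate double handwt = oldwt
forvalues iter = 1/200 {
    * Match the school-by-gender population totals
    bysort school male: egen double cursum = total(handwt)
    quietly replace handwt = handwt * cond(male, n_male, n_all - n_male) / cursum
    drop cursum
    * Match the school-by-employment-status population totals
    bysort school fulltime: egen double cursum = total(handwt)
    quietly replace handwt = handwt * cond(fulltime, n_full, n_all - n_full) / cursum
    drop cursum
}

* Check: weighted sums should reproduce both sets of population totals
bysort school male: egen double sum_gender = total(handwt)
bysort school fulltime: egen double sum_time = total(handwt)
assert reldif(sum_gender, cond(male, n_male, n_all - n_male)) < 1e-6
assert reldif(sum_time, cond(fulltime, n_full, n_all - n_full)) < 1e-6
```

                If newwt from survwgt rake is also in memory, comparing it with handwt should show agreement up to the convergence tolerance of each routine.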