A student is pooling several years of GSS (General Social Survey) data. She sent me the following question (which I suppose might apply equally well to many situations where you have successive cross-sections of data).
On the one hand, the advice to treat year as a stratum variable sounds reasonable; on the other hand, I don't remember seeing similar advice anywhere else. I suspect it won't matter much either way, but I wonder whether there is any consensus or controversy about doing this.
Do you have an opinion on treating Year as a stratum variable with pooled data? I ask because Donald Treiman recommends this in his book Quantitative Data Analysis. He writes that "it is reasonable to treat Year as the stratum variable because the surveys from each year are independent, and Year is a fixed variable." His code to set up pooled GSS data is:

    svyset sampcode [pweight=weight], strata(year)

This code is similar to the UCLA code you sent me, with the addition of year (see http://www.ats.ucla.edu/stat/stata/f...setups.htm#GSS). I haven't seen this approach recommended before, and I am not sure whether it is the best route to take. My current analysis uses svyset without year as a stratum variable. Instead, I include dummies for Year in my models.
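For concreteness, the two setups being compared might look roughly like this in Stata. This is only a sketch: sampcode, weight, and year are the variable names used in the question, while y and x are hypothetical placeholders for an outcome and a predictor, and the regression is just an illustrative model.

```stata
* Option A (Treiman's suggestion): declare year as the stratum variable.
svyset sampcode [pweight=weight], strata(year)

* Option B (the student's current setup): no strata in the design
* declaration; year enters the model as a set of dummies instead.
* y and x are hypothetical variable names used only for illustration.
svyset sampcode [pweight=weight]
svy: regress y x i.year
```

The two do not have to be alternatives. In Stata, strata() affects only how the standard errors are computed, whereas i.year changes the model being fitted, so a design declared with strata(year) could also be used with a model that includes the year dummies.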