Using tabstat with by and stat

Chris Islam

Join Date: Jul 2016

Posts: 3
#1

21 Jul 2016, 08:48

Hello dear community,

I am really new to Stata. For my work I have to deal with the SOEP, a survey data set from Germany. I stumbled over something that I can't explain and that gives me some strange values.
My data set is person based. Among other things, every person has a variable "income" and a weight. The latter is an analytic weight. Now I want to calculate the deciles and their means and shares. For that I am using xtile the following way:

Code:

xtile xct = income [aweight=weight], nquantiles(10)

Interestingly, the number of observations is not the same in each decile. But there is more. When I use

Code:

tabstat income [aweight=weight], stat(n mean sum max) by(xct) save

I noticed that the sum column looks wrong: when I multiply mean by n I should get sum, but I do not.
My guess is that something strange is going on with the weights, because when I drop them, each decile has the same number of observations. Unfortunately I need to use those weights and cannot work without them. The help file for xtile says that weights are allowed, but when I use them, something goes wrong.

Do you have any suggestions or hints?
I would be glad to get some comments and answers, since I have been struggling with this problem for quite a while.

Thanks in advance
Tags: soep, tabstat, xtile
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#2

21 Jul 2016, 10:38

When you calculate deciles (or any other quantiles) you can never be assured that each group will have the same number of observations, because tied values must always go to the same group. (Also, sometimes the total sample size is not a multiple of the number of quantiles, which forces some inequality as well.)

More specifically here, when you throw in the aweights, you are, in effect, changing observations from being n = 1 to being n = weight. So even if things work out evenly without weights, with weights the number of observations at each level of income is changing, and a value that is, for example, weighted 2 must assign "both" its instantiations to the same quantile group.
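
As a rough illustration of this point, here is a minimal sketch that uses Stata's bundled auto dataset as a stand-in for the SOEP data. The choice of price in place of income and of the car's weight variable as an aweight is purely an assumption for demonstration; nothing here comes from the original poster's data. If the reasoning above holds, the unweighted counts per decile can be quite uneven while the total weight per decile is much closer to equal (only roughly so in a dataset this small).

Code:

sysuse auto, clear
* Weighted deciles of price (stand-in for income), using weight as an aweight
xtile xct = price [aweight=weight], nquantiles(10)
* Unweighted number of observations in each decile
tabulate xct
* Number of observations and total weight in each decile
tabstat weight, statistics(n sum) by(xct)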

With regard to the statistics, I think you will find that the Sum = (Total of Weights)* Mean in the weighted analysis. N*Mean = Sum is not true in the presence of aweights.
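
One way to check this, sketched under the assumption that the bundled auto dataset can stand in for the real data (price for income, weight for the survey weight), is to look at the results that summarize stores after a weighted run. r(N), r(sum_w), r(mean) and r(sum) are stored results of summarize; the variable choices are illustrative only.

Code:

sysuse auto, clear
summarize price [aweight=weight]
* Number of observations times the weighted mean
display "N * mean              = " r(N) * r(mean)
* Total of the weights times the weighted mean
display "sum of weights * mean = " r(sum_w) * r(mean)
* The sum as reported by Stata
display "reported sum          = " r(sum)

If the statement above is right, the last two lines should agree with each other and the first should not.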
Chris Islam

Join Date: Jul 2016

Posts: 3
#3

22 Jul 2016, 03:30

Hello Clyde,

thanks for your answer.

Yes, I know that the quantiles are not 100% exact. But my deviations are very large (up to 300 in a dataset with roughly 20,000 observations per year). I tested it, and there are not that many observations on the interval boundaries.

In my opinion, your comment concerning the aweights is exactly the definition of fweights, or am I wrong? The latter declare that one observation can occur several times.

Nevertheless, your last hint was correct: Sum = (Total of Weights) * Mean in the weighted analysis. Thanks, now I at least know which formula to work with and which not. It helped a lot!
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#4

22 Jul 2016, 07:49

In my opinion, your comment concerning the aweights is exactly the definition of fweights, or am I wrong? The latter declare that one observation can occur several times.

Yes, it appears that -tabstat- treats aweights as if they were fweights, except perhaps for rescaling. But if you think about the definition of aweights, that also makes sense in this context. The concept of aweights is that the observations represent mean values drawn from a sample whose size equals the aweight. So for statistics like the sum and the mean, aweights and fweights would be equivalent. Where they would differ is in things like the standard errors in a regression analysis.
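
A small sketch of that comparison, again assuming the bundled auto dataset as a stand-in (price for income, the car's weight as the weight variable; both choices are illustrative only). The weight variable there happens to be integer-valued, which fweights require; survey weights that are not integers cannot be used as fweights at all.

Code:

sysuse auto, clear
* Same variable under both weight types: compare the n, mean and sum columns
tabstat price [aweight=weight], statistics(n mean sum)
tabstat price [fweight=weight], statistics(n mean sum)
* Same regression under both weight types: compare the standard errors
regress price mpg [aweight=weight]
regress price mpg [fweight=weight]

If the argument above holds, the means and sums should match across the two tabstat calls, while n and the regression standard errors should not.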
Chris Islam

Join Date: Jul 2016

Posts: 3
#5

26 Jul 2016, 02:17

When I think about it, you are right. In the end I also called the official analyst of the SOEP, and they confirmed your suggestions. Thank you, Clyde. You helped me a lot. Indeed, now I get the right results.