question about cross tabulation

Rio Ferdinand

Join Date: Oct 2022

Posts: 10
#1


26 Oct 2022, 08:30

Hi, everyone.

This is my first time asking a question on this forum, which has helped me a lot over the past few years.

I have 14 household surveys from 14 countries. The surveys were conducted in different years, and each dataset has its own household weight variable. I combined them and cross-tabulated country against gender_urbanrural (four values: male_rural, female_rural, male_urban, female_urban) using the weights (tab country gender [aw=hhweight], m). The weighted table gives strange numbers for some countries. For example, if I add an if condition at the end (tab country gender [aw=hhweight] if abc==1, m), the row total for some countries is larger than the row total without the condition, even though the condition can only select a smaller subsample. Without the weight (tab country gender, m) the problem does not occur. Is there a way to compare all the countries using the weights?

Thanks.
Tags: categorical, sample, strata

William Lisowski

Join Date: Dec 2014
Posts: 10150

26 Oct 2022, 09:45

Analytic weights are rescaled to sum to the number of observations in the data being tabulated. The rescaling when you exclude observations will affect all the counts and margins, and there is no reason to expect that the margins will be reduced. Consider the following example, where observations with green==2 are given 10 times the weight of observations with green==1, and dropping one of the two such observations changes the tabulations significantly.

Code:

. * Example generated by -dataex-. For more info, type help dataex
. clear

. input float(green yellow wgt except)

         green     yellow        wgt     except
  1. 1 1  1 0
  2. 1 2  1 0
  3. 2 1 10 0
  4. 2 2 10 1
  5. end

. tab green yellow [aw=wgt]

           |        yellow
     green |         1          2 |     Total
-----------+----------------------+----------
         1 | .18181818  .18181818 | .36363636
         2 | 1.8181818  1.8181818 | 3.6363636
-----------+----------------------+----------
     Total |         2          2 |         4

. tab green yellow [aw=wgt] if except==0

           |        yellow
     green |         1          2 |     Total
-----------+----------------------+----------
         1 |       .25        .25 |        .5
         2 |       2.5          0 |       2.5
-----------+----------------------+----------
     Total |      2.75        .25 |         3

.
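The rescaling can also be reproduced by hand, which may make the effect easier to see. The following is a minimal sketch that assumes the four example observations above are still in memory; wgt_all and wgt_sub are illustrative names, not anything tabulate creates.

```stata
* Rescaled weight = original weight * (number of obs tabulated) / (sum of weights tabulated)
quietly summarize wgt
generate double wgt_all = wgt * r(N) / r(sum)

* Same calculation restricted to the observations kept by -if except==0-
quietly summarize wgt if except==0
generate double wgt_sub = wgt * r(N) / r(sum) if except==0

* wgt_all should match the cells of the first table, wgt_sub those of the second
list green yellow wgt wgt_all wgt_sub
```

With all four observations the factor is 4/22; with three observations it is 3/12. That is why every remaining cell grows when the heavily weighted observation is dropped.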

Last edited by William Lisowski; 26 Oct 2022, 09:49.


Rio Ferdinand

Join Date: Oct 2022

Posts: 10
#3

26 Oct 2022, 14:00

Thanks for your reply, William.

The point of my question is that the household weight is specific to each country's survey: within one country's survey, every household has a weight that was constructed when that survey was carried out. So when I cross-tabulate the combined data, if I understand correctly, what happens is not simply a rescaling.

As you can see, after adding the condition, the total of KHM decreases instead of increasing when I try to do the cross tabulation.

If I tabulate each country separately, the problem does not occur, and it does not occur in the cross-tabulation without weights either.

So I would like to know why this happens here and whether there is any way to fix it.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

26 Oct 2022, 14:32

As you have coded your work, Stata does not "know" that the "household weight is actually specific to every country's survey", so it cannot rescale the weight separately for each country.
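One way to see what a per-country rescaling would look like is to do it explicitly before tabulating. This is only a sketch under assumptions: it uses the variable names from the posts above (country, gender, hhweight, abc), the names wsum_c, n_c, and hhweight_c are made up for illustration, and whether weights normalized this way are appropriate for comparing countries is a survey-design question this sketch does not answer.

```stata
* Hypothetical sketch: rescale the weights separately within each country
* so that each country's weights sum to its number of weighted observations
bysort country: egen double wsum_c = total(hhweight)
by country: egen long n_c = count(hhweight)
generate double hhweight_c = hhweight * n_c / wsum_c

* iweights are used as given rather than rescaled by -tabulate-, so with
* nonnegative weights a row total under -if- cannot exceed the unrestricted total
tabulate country gender [iw=hhweight_c], missing
tabulate country gender [iw=hhweight_c] if abc==1, missing
```

Because the rescaling is done once, before any if condition is applied, the two tables are on the same scale and can be compared row by row.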

Your problem is not with cross tabulation - it is with combining differing surveys with different sets of weights. I suspect the commands described in the Stata Survey Data Reference Manual PDF (included with your Stata installation and accessible through Stata's Help menu) include tools that may help you, but I have no experience with this part of Stata.

You should review the introductory material in that PDF to get a sense of what this is all about. I envision that your countries correspond to "strata" in survey terms, but again, I'm no expert.
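As a starting point for that reading, a declaration along the following lines might be what the manual leads to. It is a sketch and not a recommendation: treating country as the stratum and each household as its own sampling unit are assumptions, and the real primary sampling units and strata should come from each survey's documentation.

```stata
* Assumed design: households as sampling units, hhweight as a sampling
* (probability) weight, countries as strata. All three are assumptions.
svyset _n [pweight=hhweight], strata(country)

* Weighted counts for the full sample
svy: tabulate country gender, count

* Restrict to abc==1 as a subpopulation instead of dropping observations
svy, subpop(if abc==1): tabulate country gender, count
```

Note that with sampling weights the counts are estimated population totals, so they will not be on the same scale as the aweight tables shown earlier in the thread.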

For assistance with the svy commands, you should start a new Statalist topic with a title something like "Problem combining surveys with distinct analytical weights" and include your two tabs of ISO by gender_area to illustrate your setup. (You don't need the tabs for a single country.) The purpose of this is to draw the attention of members with survey experience and expertise, who may not have chosen to read a question that appeared to be about cross-tabulation.

Good luck! I think you can solve your problem - it's just that combining surveys is more subtle than you realized.