Generating a group variable (percentage of women with a given characteristic) for each country and year in a panel survey

Luis Ortiz

Join Date: Dec 2014

Posts: 97
#1

02 May 2024, 09:16

Dear members of the list,

I have a panel with observations from several countries over several years.

At the individual level, I have information on the individual's supervisory role ("Are you supervising someone": yes/no) and gender.

I want to create an aggregate-level variable for each country and wave that gives the percentage of individuals with a supervisory role who are women in that country and year; in other words, the female representation among those with supervisory power. The operation should be repeated for every country and year in the sample.

For instance, for a given country in the sample (cntry==724), the gender distribution among those with and without supervisory roles in 1999 (S020==1999) is shown in the following table:

HTML Code:

. tab supervising female if cntry==724 & S020==1999, row

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

Are you supervising |          Sex
            someone |      Male     Female |     Total
--------------------+----------------------+----------
                 No |       211        131 |       342
                    |     61.70      38.30 |    100.00
--------------------+----------------------+----------
                Yes |        70         19 |        89
                    |     78.65      21.35 |    100.00
--------------------+----------------------+----------
              Total |       281        150 |       431
                    |     65.20      34.80 |    100.00

I am interested in the row percentages among those who answered yes to the question "Are you supervising someone" and, more specifically, in the percentage marked in red in the following image, which reproduces the table above. This percentage says that 21.35% of those who reported a supervisory role in country 724 are women.

I have followed the tips provided in the following link ("How can I create variables containing percent summaries?"), which is very close to what I want. I have used the command 'egen' with 'mean' (see line below), but I have not succeeded in getting what I want, that is, a figure like the 21.35 for each country and wave. For instance, I have tried the following command line (where 'cntry' stands for country, 'S020' for wave, 'female' for gender and 'supervising' for having a supervisory role (1) or not (0)), but to no avail:

HTML Code:

bysort cntry S020 female: egen pc_superv_fem=mean(100*inlist(supervising, 1)) if supervising<.

Unfortunately, the command above does not produce the row percentages that appear in blue in the figure above (and obviously not the percentage in red).

Could you, please, help me with that?

Many thanks for your attention

Luis Ortiz
Tags: egen, panel data, percent summaries, percent variables
Nick Cox

Join Date: Mar 2014

Posts: 35445
#2

02 May 2024, 09:51

This may help. You don't give a data example using dataex (FAQ Advice #12) but one can be constructed with some work from your output.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(ctry S020 female supervising freq)
74 1999 0 0 211
74 1999 1 0 131
74 1999 0 1  70
74 1999 1 1  19
end
label values female female
label def female 0 "male", modify
label def female 1 "female", modify
label values supervising supervising
label def supervising 0 "no", modify
label def supervising 1 "yes", modify

expand freq

bysort ctry S020 : egen wanted = mean(100 * cond(supervising == 1, female, .))

tabdisp ctry S020, c(wanted)

--------------------
          |   S020
     ctry |   1999
----------+---------
       74 | 21.34831
--------------------

The main trick here is to ensure that you get conditional means for female. Putting female in the by: list goes against that as you're segregating males and females as well as countries and years. If you condition on supervising == 1 you're automatically ignoring values of 0 and missing (and anything else).

To get the mean as a percentage you need to multiply the mean as a proportion by 100 somewhere. 100 * mean() would make sense mathematically and statistically but isn't supported by egen.

Code:

bysort ctry S020 : egen wanted = mean(100 * female) if supervising == 1

doesn't have quite the same effect. It produces the same numbers but not in all observations.
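
The difference between the two forms can be checked directly. The following is a minimal sketch that reuses the constructed toy data from this post; the variable names pct_all and pct_if are hypothetical and chosen only for illustration.

```stata
* Toy data reconstructed from the frequency table (female coded 0/1)
clear
input float(ctry S020 female supervising freq)
74 1999 0 0 211
74 1999 1 0 131
74 1999 0 1  70
74 1999 1 1  19
end
expand freq

* cond() inside mean(): non-supervisors contribute missing to the mean,
* but every observation in the country-year group receives the result
bysort ctry S020 : egen pct_all = mean(100 * cond(supervising == 1, female, .))

* if qualifier: same mean, but stored only where supervising == 1
bysort ctry S020 : egen pct_if = mean(100 * female) if supervising == 1

* pct_all should have no missing values; pct_if should be missing
* for the 342 observations with supervising == 0
count if missing(pct_all)
count if missing(pct_if)
```

Which form is preferable depends on whether the group-level percentage is needed on every row (for example, as a country-year covariate in a later model) or only on the supervisors' rows.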

More at https://journals.sagepub.com/doi/pdf...867X1101100210 -- especially Section 9.

Luis Ortiz

03 May 2024, 03:39

Many thanks for your answer, Nick. It's much appreciated.

Your answer is very enlightening. I see the problem with putting 'female' in the by: list. I also understand the need to condition on supervising==1.

I have tried to reproduce your code with my data. I just gave 'wanted' a different name (pc_superv_fem):

HTML Code:

bysort cntry S020 : egen pc_superv_fem = mean(100 * cond(supervising == 1, female, .))

But then, when I check what happens for the country and year mentioned in my initial message, I get a figure that does not correspond to the 21.35 that I was expecting:

HTML Code:

. tab pc_superv_fem if cntry==724 & S020==1999

pc_superv_f |
         em |      Freq.     Percent        Cum.
------------+-----------------------------------
   121.3483 |      1,200      100.00      100.00
------------+-----------------------------------
      Total |      1,200      100.00

I have tried to use dataex to generate a workable sample of my data, as you indicated (my apologies for not having done that initially), but I do not believe that I've been very successful. I tried this....

HTML Code:

dataex female supervising S020 if cntry==724 & (S020==1999 | S020==2008) & supervising<.

...and I have got this. There does not seem to be sufficient variation in the variables to use it as a good sample; am I wrong?

. dataex female supervising S020 if cntry==724 & (S020==1999 | S020==2008) & supervising<.

----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(female supervising) int S020
2 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
1 0 1999
end
label values female X001
label def X001 1 "Male", modify
label def X001 2 "Female", modify
label values supervising X031
label def X031 0 "No", modify
label values S020 S020
label def S020 1999 "       1999", modify

I also tried using randomtag, considering that my dataset is large (many countries and years), but I have not managed to provide the sample of my data that I would have liked to provide here.

My apologies for that

And many thanks for your attention again

Luis Ortiz

Nick Cox

#4

03 May 2024, 05:15

Your female variable is coded 1 and 2. Mine was coded 0 and 1. So, your mean as a percentage is too large by 100, just as the mean of 1 = 100% and 2 = 200% will be 150%, not 50%. A quick fix is to subtract 1 on the fly.

Code:

bysort ctry S020 : egen wanted = mean(100 * cond(supervising == 1, female - 1, .))

(0, 1) indicator variables are, for almost all purposes, superior to any other flavour. The point is made in many places: one such is https://www.stata-journal.com/articl...article=dm0099
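
As an alternative to subtracting 1 on the fly, a (0, 1) indicator can be created once and reused. The following is a minimal sketch under stated assumptions: it rebuilds the counts from the table in #1 with female coded 1 = Male, 2 = Female (as in the dataex output in #3), and the names is_female and pc_raw are hypothetical.

```stata
* Counts from the table in #1, with female coded 1/2 (assumed from #3)
clear
input float(cntry S020 female supervising freq)
724 1999 1 0 211
724 1999 2 0 131
724 1999 1 1  70
724 1999 2 1  19
end
expand freq

* (0, 1) indicator; the if qualifier keeps missing values missing
* instead of turning them into 0
generate byte is_female = female == 2 if !missing(female)

* Mean of the 1/2 coding among supervisors:
* 100 * (70*1 + 19*2) / 89 = 121.35, the figure reported in #3
bysort cntry S020 : egen pc_raw = mean(100 * cond(supervising == 1, female, .))

* Mean of the 0/1 indicator among supervisors:
* 100 * 19 / 89 = 21.35, the row percentage wanted
bysort cntry S020 : egen pc_superv_fem = mean(100 * cond(supervising == 1, is_female, .))

tabdisp cntry S020, c(pc_raw pc_superv_fem)
```

Keeping the indicator as its own variable also means later commands such as summarize or regress can use it directly, without repeating the recoding in each expression.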
Luis Ortiz

#5

03 May 2024, 05:50

Oh, my God...!! What a silly thing.

Certainly, you were right (image below)

Many thanks for your patience and assistance, Nick

All the best

Luis Ortiz