
  • Generating a group variable (percentage of women with a given characteristic) for each country and year in a panel survey

    Dear members of the list,

    I have a panel with observations from several countries over several years.

    At the individual level, I have information on the individual's supervisory role ("Are you supervising someone": yes/no) and gender.

    I want to create an aggregate-level variable for each country and wave that tells me the percentage of individuals with a supervisory role who are women for this country and year; in other words, the female representation among those with supervisory power... And repeat the operation for all the countries and years in the sample, obviously

    For instance, for a given country in the sample (cntry==724) the gender distribution among those with/without supervisory roles in 1999 (S020==1999) is as it appears in the following table

    HTML Code:
    . tab supervising female if cntry==724 & S020==1999, row
    
    +----------------+
    | Key            |
    |----------------|
    |   frequency    |
    | row percentage |
    +----------------+
    
    Are you supervising |          Sex
                someone |      Male     Female |     Total
    --------------------+----------------------+----------
                     No |       211        131 |       342 
                        |     61.70      38.30 |    100.00 
    --------------------+----------------------+----------
                    Yes |        70         19 |        89 
                        |     78.65      21.35 |    100.00 
    --------------------+----------------------+----------
                  Total |       281        150 |       431 
                        |     65.20      34.80 |    100.00 
    
    . 

    I am interested in the row percentages among those who answered yes to the question "Are you supervising someone" and, more specifically, in the percentage marked in red in the following image, which reproduces the table above. This percentage says that 21.35% of those who reported a supervisory role in country 724 in 1999 are women.

    [Image: For Statalist.png]



    I have followed the tips provided in the following link ("How can I create variables containing percent summaries?"), which is very close to what I want. I have used the command 'egen' with 'mean' (see line below), but I have not succeeded in arriving at what I want; that is, a figure like the 21.35 for each country and wave. For instance, I have tried the following command line (where 'cntry' stands for country, 'S020' for wave, 'female' for gender and 'supervising' for having a supervisory role (1) or not (0)), but to no avail:

    HTML Code:
    bysort cntry S020 female: egen pc_superv_fem=mean(100*inlist(supervising, 1)) if supervising<.
    Unfortunately, the command above does not produce the row percentages that appear in blue in the figure above (and obviously not the percentage in red).

    Could you, please, help me with that?

    Many thanks for your attention

    Luis Ortiz



  • #2
    This may help. You don't give a data example using dataex (FAQ Advice #12) but one can be constructed with some work from your output.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(ctry S020 female supervising freq)
    74 1999 0 0 211
    74 1999 1 0 131
    74 1999 0 1  70
    74 1999 1 1  19
    end
    label values female female
    label def female 0 "male", modify
    label def female 1 "female", modify
    label values supervising supervising
    label def supervising 0 "no", modify
    label def supervising 1 "yes", modify
    
    expand freq 
    
    bysort ctry S020 : egen wanted = mean(100 * cond(supervising == 1, female, .)) 
    
    tabdisp ctry S020, c(wanted)
    
    --------------------
              |   S020  
         ctry |     1999
    ----------+---------
           74 | 21.34831
    --------------------
    The main trick here is to ensure that you get conditional means for female. Putting female in the by: list goes against that as you're segregating males and females as well as countries and years. If you condition on supervising == 1 you're automatically ignoring values of 0 and missing (and anything else).
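
    As a rough illustration of that point, the sketch below rebuilds the same 2 x 2 table and computes both versions side by side. The variable names pc_by_sex and pc_fem_superv are made up for this illustration, and the first command is a simplified form of the attempt in #1 (this example has no missing values on supervising, so the if qualifier is dropped).

    ```stata
    * Rebuild the table from #1 as individual observations
    clear
    input float(ctry S020 female supervising freq)
    74 1999 0 0 211
    74 1999 1 0 131
    74 1999 0 1  70
    74 1999 1 1  19
    end
    expand freq

    * female in the by: list: % supervising within each sex (not what is wanted)
    bysort ctry S020 female: egen pc_by_sex = mean(100 * (supervising == 1))

    * female out of the by: list: % female among supervisors (what is wanted)
    bysort ctry S020: egen pc_fem_superv = mean(100 * cond(supervising == 1, female, .))

    * Compare the two results for males and females
    tabdisp female, c(pc_by_sex pc_fem_superv)
    ```

    With these counts the first variable should be about 24.9 for males (70/281) and 12.7 for females (19/150), while the second is the same 21.35 (19/89) for everyone in that country and year.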

    To get the mean as a percentage you need to multiply the mean as a proportion by 100 somewhere. 100 * mean() would make sense mathematically and statistically but isn't supported by egen.
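
    An alternative, if multiplying inside mean() feels awkward, is to compute the proportion first and scale it in a second step. This is only a sketch on the same constructed example; prop_fem and pct_fem are hypothetical names.

    ```stata
    * Same constructed example as above
    clear
    input float(ctry S020 female supervising freq)
    74 1999 0 0 211
    74 1999 1 0 131
    74 1999 0 1  70
    74 1999 1 1  19
    end
    expand freq

    * Step 1: proportion female among supervisors, spread to all observations
    bysort ctry S020: egen prop_fem = mean(cond(supervising == 1, female, .))

    * Step 2: convert the proportion to a percentage
    generate pct_fem = 100 * prop_fem
    format pct_fem %5.2f

    tabdisp ctry S020, c(pct_fem)
    ```

    The two-step version costs an extra variable but keeps the proportion available if it is needed later.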

    Code:
    bysort ctry S020 : egen wanted = mean(100 * female) if supervising == 1
    doesn't have quite the same effect. It produces the same numbers but not in all observations.
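
    A small check of that difference, again on the constructed example (wanted2 is a name made up here): the cond() version fills every observation in the country-year, whereas the if version leaves the non-supervisors missing.

    ```stata
    clear
    input float(ctry S020 female supervising freq)
    74 1999 0 0 211
    74 1999 1 0 131
    74 1999 0 1  70
    74 1999 1 1  19
    end
    expand freq

    * cond() inside mean(): result is stored in all 431 observations
    bysort ctry S020: egen wanted  = mean(100 * cond(supervising == 1, female, .))

    * if qualifier: result is stored only in the 89 supervisors
    bysort ctry S020: egen wanted2 = mean(100 * female) if supervising == 1

    count if missing(wanted)     // 0 observations
    count if missing(wanted2)    // 342 observations (the non-supervisors)
    ```

    Which form is preferable depends on whether the aggregate is needed as a country-year characteristic for every respondent or only for the supervisors themselves.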

    More at https://journals.sagepub.com/doi/pdf...867X1101100210 -- especially Section 9.



    • #3
      Many thanks for your answer, Nick. It's much appreciated.

      Your answer is very enlightening. I see the problem with putting 'female' in the by: list. I also understand the need to condition on supervising==1.

      I have tried to reproduce your code with my data. I just called 'wanted' differently (pc_superv_fem)

      HTML Code:
      bysort cntry S020 : egen pc_superv_fem = mean(100 * cond(supervising == 1, female, .)) 
      But then, when I check what happens for the country and year mentioned in my initial message, I got a figure that does not correspond to the 21.35 that I was expecting

      HTML Code:
      . tab pc_superv_fem if cntry==724 & S020==1999
      
      pc_superv_f |
               em |      Freq.     Percent        Cum.
      ------------+-----------------------------------
         121.3483 |      1,200      100.00      100.00
      ------------+-----------------------------------
            Total |      1,200      100.00
      I have tried to use dataex to generate a workable sample of my data, as you indicated (my apologies for not having done that initially), but I do not believe that I've been very successful. I tried this....

      HTML Code:
      dataex female supervising S020 if cntry==724 & (S020==1999 | S020==2008) & supervising<.
      ...and I have got this. There does not seem to be sufficient variation in the variables to use it as a good sample; am I wrong?

      . dataex female supervising S020 if cntry==724 & (S020==1999 | S020==2008) & supervising<.

      ----------------------- copy starting from the next line -----------------------
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte(female supervising) int S020
      2 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      1 0 1999
      end
      label values female X001
      label def X001 1 "Male", modify
      label def X001 2 "Female", modify
      label values supervising X031
      label def X031 0 "No", modify
      label values S020 S020
      label def S020 1999 "       1999", modify
      I also tried using randomtag, considering that my dataset is large (many countries and years), but I have not managed to provide the sample of my data that I would have liked to provide here.

      My apologies for that

      And many thanks for your attention again

      Luis Ortiz



      • #4
        Your female variable is coded 1 and 2. Mine was coded 0 and 1. So, your mean as a percentage is too large by 100, just as the mean of 1 = 100% and 2 = 200% will be 150%, not 50%. A quick fix is to subtract 1 on the fly.

        Code:
         
         bysort ctry S020 : egen wanted = mean(100 * cond(supervising == 1, female - 1, .))
        (0, 1) indicator variables are, for almost all purposes, superior to any other flavour. The point is made in many places: one such is https://www.stata-journal.com/articl...article=dm0099
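
        A more durable alternative to subtracting 1 on the fly is to create the (0, 1) indicator once and use it from then on. The sketch below assumes the coding shown in the dataex listing in #3 (1 = Male, 2 = Female) and rebuilds the counts from #1 with that coding; is_female is a hypothetical name.

        ```stata
        * Counts from #1, with female coded 1/2 as in the listing in #3
        clear
        input int(cntry S020) byte(female supervising) int freq
        724 1999 1 0 211
        724 1999 2 0 131
        724 1999 1 1  70
        724 1999 2 1  19
        end
        expand freq

        * 0/1 indicator; observations with missing gender stay missing
        generate byte is_female = (female == 2) if !missing(female)

        * Percentage female among supervisors, by country and wave
        bysort cntry S020: egen pc_superv_fem = mean(100 * cond(supervising == 1, is_female, .))

        tabdisp cntry S020, c(pc_superv_fem)
        ```

        The if !missing(female) qualifier matters because (female == 2) on its own would evaluate to 0 for missing values and silently count them as male.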



        • #5
          Oh, my God! What a silly thing.

          Certainly, you were right (image below)

          [Image: Sin título.png]


          Many thanks for your patience and assistance, Nick

          All the best

          Luis Ortiz
