  • How to add confidence intervals to a plot with two different variables over gender?

    Dear users,

    After a long time looking for a way to do this, I come to you as my last hope. I have a dataset in which students pick other students for their teams. There are two variables of interest: the gender of the individual picking and the gender of the individual picked. I want to plot the probability of being selected depending on those two variables (that is, the probability that a boy picks a boy or a girl, and the probability that a girl picks a boy or a girl: 4 bars in total). I have already counted how many boys and girls each student picks, and from that computed the probability of being picked. This is the graph I get:

    Code:
    graph bar maleprobmean femaleprobmean, over(female)
    [Image: Gender_selection.png]


    Now I want to add the confidence intervals to that graph. I have already computed them with collapse, in the following way:

    Code:
    collapse (mean) maleprobmean = maleprob_team femaleprobmean = femaleprob_team (sd) sdmaleprob = maleprob_team sdfemaleprob = femaleprob_team (count) nmale = maleprob_team nfemale = femaleprob_team, by(female)
    
    generate himale = maleprobmean + invttail(nmale-1,0.025)*(sdmaleprob / sqrt(nmale))
    generate lomale = maleprobmean - invttail(nmale-1,0.025)*(sdmaleprob / sqrt(nmale))
    
    generate hifemale = femaleprobmean + invttail(nfemale-1,0.025)*(sdfemaleprob / sqrt(nfemale))
    generate lofemale = femaleprobmean - invttail(nfemale-1,0.025)*(sdfemaleprob / sqrt(nfemale))
    And this is what the data look like:
    female  maleprobmean  femaleprobmean  sdmaleprob  sdfemaleprob  nmale  nfemale  himale    lomale    hifemale  lofemale
    Male    .2157536      .0935449        .1723743    .129409       854    854      .2273309  .2041763  .1022365  .0848533
    Female  .0963259      .2172072        .1362991    .1678487      873    881      .1053798  .087272   .228306   .2061084

    The only problem now is putting those two things together. I have tried the graph twoway command, but the result is very odd: since I have no additional variable to group them, the two CIs and the two bars for each gender sit on top of each other. I cannot for the life of me figure out a way to add them, simple as it may be.
    If someone has some idea of how to do it, I would be extremely thankful.

  • #2
    The good news is that plotting confidence intervals is often discussed here and indeed elsewhere (e.g. in the Stata Journal).

    Let's back up -- and in the absence of a data example -- use some invented data. I start with the idea that

    "There are two variables of interest: the gender of the individual picking and the gender of the individual pick. I want to plot in a graphic the probability of being selected depending on those two variables"

    I got lost soon after that on what you have done and how you are thinking about this, so I will stick to a simple formulation. If I am misunderstanding your set-up, I am still pointing to what should be relevant commands. You don't need to calculate the confidence intervals yourself.

    I see those variables as defining pr(boy picks girl) and pr(girl picks girl); the other two probabilities, pr(boy picks boy) and pr(girl picks boy), are just their complements. So two means and two confidence intervals summarize the data. I use ideas similar to those in https://journals.sagepub.com/doi/pdf...867X1001000112

    Code:
    clear 
    set obs 100
    set seed 2803 
    gen picker = runiformint(0, 1)
    gen picked = runiformint(0, 1)
    label def female 0 male 1 female 
    label val picker female 
    label val picked female 
    
    statsby, clear by(picker) : ci proportion picked, jeffreys 
    
    * each line of the graph command except the last must end in /// so that it continues
    scatter mean picker, ms(Dh) msize(large) || rcap ub lb picker , xla(0 1, valuelabel noticks) ///
    xsc(r(-0.2 1.2)) aspect(1) ytitle(proportion picking female) legend(off) ///
    subtitle(means and 95% confidence intervals: Jeffreys method)
    [Image: pickerpicked.png]


    Naturally there are many variations on this idea. The Jeffreys method just happens to be a personal favourite.

    The use of bar charts seems unnecessary here. The point is comparisons of probabilities with each other and say with 0.5, not with zero. See also any thread on the internet against dynamite, detonator or plunger plots.
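
    If the four means and intervals already computed in #1 are to be plotted directly, one possibility is to reshape the collapsed data so that each picker-picked combination gets its own row and its own position on the x axis. What follows is only a sketch under assumptions: the numbers are typed in from the listing in #1, and the names mean0, lo0, hi0 (and so on), picked and x are invented here for illustration.

```stata
* values copied from the listing in #1; female is 0 for boys, 1 for girls
clear
input female maleprobmean femaleprobmean lomale himale lofemale hifemale
0 .2157536 .0935449 .2041763 .2273309 .0848533 .1022365
1 .0963259 .2172072 .087272  .1053798 .2061084 .228306
end

* give the male and female results a common stub (hypothetical names)
* suffix 0 = a boy was picked, suffix 1 = a girl was picked
rename (maleprobmean femaleprobmean lomale lofemale himale hifemale) ///
       (mean0 mean1 lo0 lo1 hi0 hi1)

* one row per picker-picked combination
reshape long mean lo hi, i(female) j(picked)

* x positions 0 1 3 4, leaving a gap between boy pickers and girl pickers
gen x = 3*female + picked

twoway rcap hi lo x || scatter mean x, ms(Dh) ///
    xla(0 "boy picks boy" 1 "boy picks girl" 3 "girl picks boy" 4 "girl picks girl", noticks) ///
    xsc(r(-0.5 4.5)) ytitle(mean proportion picked) legend(off)
```

    The variable x is the grouping variable that was missing in the twoway attempt. Putting bar mean x, barw(0.8) first in place of the scatter call should give the bar version, although the reservations about bar charts still apply.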



    • #3
      Hi Nick,

      Thank you very much for your idea. I knew it had to be simple enough, but for some reason I was fixated on bar charts and couldn't get away from them. The point plot does indeed look clearer and makes adding the CIs much easier.
