

  • Plotting adjusted means as a bar graph


    I am trying to plot the adjusted means (in the form of a bar graph as both predictors are categorical) from a linear regression model.

    Say I run the following:

    regress y i.a##i.b c.covariable c.covariable i.covariable

    The interaction is significant, so I want to plot the adjusted means for each combination of a (2 levels) and b (2 levels).

    To see the adjusted means, I do:
    margins a#b

    And to get the individual predicted values I do:

    predict predicted_y

    I figured the best way to plot bar graphs was using the following code:

    cibar predicted_y, over1(a) over2(b)

    However, the means on the bar graph do not match the values obtained from margins a#b.

    Any ideas on why this might be? Or perhaps another way I can directly plot the outcome from the interaction in a bar graph?

    Many thanks,

  • #2
    After the margins command:

    marginsplot, recast(bar) by(b)
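
    A possible reason for the mismatch described in #1, offered as a sketch rather than a diagnosis of the original data: margins a#b averages predictions over the whole estimation sample, with a and b set to each combination in turn, whereas the mean of predicted_y within a cell uses only that cell's own covariate values. The two need not agree unless the covariates are distributed the same way in every cell. The example uses the lbw dataset as a stand-in; smoke and ui play the roles of a and b, and predicted_bwt is a hypothetical variable name.

    ```stata
    webuse lbw, clear
    regress bwt i.smoke##i.ui age lwt i.race

    * Predictive margins: every observation is treated as if it had each
    * smoke/ui combination, and predictions are averaged over the full sample
    margins smoke#ui

    * Cell means of the individual predictions: each cell averages only over
    * its own observations (what a bar chart of predicted values displays)
    predict predicted_bwt
    tabulate smoke ui, summarize(predicted_bwt) means

    * margins reproduces those cell means when asked to average within cells
    margins, over(smoke ui)
    ```

    If the last two sets of numbers agree with each other but differ from the first, the discrepancy comes from the averaging, not from the plotting command.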


    • #3
      Thanks! But I can't seem to edit the colours based on a. The levels of b are in two separate panels, which is great, but is it possible to change the colours for each level of a?


      • #4
        It will be difficult to do this with marginsplot. There are several posts here on differentiating bars or plots in other contexts. If it were my problem, I would install coefplot from SSC. Here is an example (you can change the axis labels, naturally):

        webuse lbw, clear
        regress bwt age lwt i.smoke#i.ui i.race
        margins i.smoke#i.ui, post
        estimates store marg
        coefplot (marg, recast(bar) keep(0.smoke#0.ui 0.smoke#1.ui) bcolor(red%50)) ///
                 (marg, recast(bar) keep(1.smoke#0.ui 1.smoke#1.ui) bcolor(blue%50)), ///
                 nokey scheme(s1mono) xsc(r(0,.)) xlab(0(1000)3500)
        One extra detail is that bars should start at zero, and coefplot truncates part of them. Some referees may be very particular on this point, so specify the axis labels such that the bars start at zero (highlighted).

        [Image: Graph.png]

        Last edited by Andrew Musau; 12 Jan 2022, 06:29.


        • #5
          A bar chart has no obvious advantage here.

          In Andrew Musau's helpful example, the bars are truncated at 2000, and indeed there is no obvious point in starting the bars at zero as the comparison is of birthweights, with each other, not with zero. Yet bar length (height) encodes (birthweight MINUS 2000), which I presume is arbitrary and here a side-effect of the nice axis labels and the fact that the underlying command won't force bars to start at zero.

          We have no way of commenting on @Lucy Hiscox's values as y is thoroughly anonymous and there is no data example.

          I suggest as principles:

          Bars starting at zero make sense if and only if comparisons with zero are germane.

          Otherwise bars starting anywhere else need a rationale for "anywhere else".

          Otherwise the most direct and informative plot represents coefficient estimates by point markers and uncertainty by capped or uncapped spikes.
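
          The last suggestion can be sketched with marginsplot on the lbw data used in #4. This is an illustration under that assumption, not the plot for the original poster's data. recast(scatter) drops the connecting lines and leaves point markers with the default capped spikes for the confidence intervals; ciopts(recast(rspike)) switches to uncapped spikes.

          ```stata
          webuse lbw, clear
          regress bwt age lwt i.smoke##i.ui i.race
          margins smoke#ui

          * point markers with capped confidence-interval spikes, no connecting lines
          marginsplot, recast(scatter) xscale(range(-0.5 1.5))

          * same plot with uncapped spikes
          marginsplot, recast(scatter) ciopts(recast(rspike)) xscale(range(-0.5 1.5))
          ```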


          • #6
            Hi both, many thanks for your replies.

            To add context - here is my data.

            Perhaps it would be best to present coefficient estimates by point markers, as you suggest, with: marginsplot, x(ipv_any_recent) ?

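            Code:
             margins child_sex#ipv_any_recent, post

            Predictive margins                                         Number of obs = 143
            Model VCE

            Expression: Linear prediction, predict()

            ------------------------------------------------------------------------------------------
                                      |     Margin   Std. err.      t    P>|t|     [95% conf. interval]
            --------------------------+---------------------------------------------------------------
             child_sex#ipv_any_recent |
                 Male#Below threshold |   1078.921    6.52852   165.26   0.000     1066.006    1091.836
                 Male#Above threshold |   1092.304   7.588248   143.95   0.000     1077.293    1107.316
               Female#Below threshold |   1083.772   6.838856   158.47   0.000     1070.243    1097.301
               Female#Above threshold |   1066.813   7.471749   142.78   0.000     1052.032    1081.594
            ------------------------------------------------------------------------------------------

            marginsplot, recast(bar) by(ipv_any_recent)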


            • #7

              margins child_sex#ipv_any_recent, post
              marginsplot, recast(scatter) by(ipv_any_recent) xsc(r(-0.5 1.5))


              • #8
                Fortunately, or fortuitously, the results make my point. If I understand correctly the t tests indicate massively that the estimated quantities are not zero, which seems unlikely to be scientific news. (I still don't know what the outcome variable is.)

                At the same time the intervals all overlap....


                • #9
                  Originally posted by Nick Cox
                  Fortunately, or fortuitously, the results make my point. If I understand correctly the t tests indicate massively that the estimated quantities are not zero, which seems unlikely to be scientific news. (I still don't know what the outcome variable is.)

                  At the same time the intervals all overlap....
                  Hi Nick,

                  The outcome is brain volume.

                  Here is the main omnibus test of the interaction:

                  Code:
                    regress amygdala_corrected ib0.ipv_any_recent##ib0.child_sex i.maternal_hiv i.bdi_total_threshold c.birthweight i.prenatal_cotinine i.prenatal_alcohol_composite c.child_age

                        Source |       SS           df       MS      Number of obs   =       140
                  -------------+----------------------------------   F(11, 128)      =      3.44
                         Model |  47257.5452        11  4296.14047   Prob > F        =    0.0003
                      Residual |  159882.635       128  1249.08308   R-squared       =    0.2281
                  -------------+----------------------------------   Adj R-squared   =    0.1618
                         Total |   207140.18       139  1490.21712   Root MSE        =    35.342

                  ---------------------------------------------------------------------------------------------
                           amygdala_corrected | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                  ----------------------------+----------------------------------------------------------------
                               ipv_any_recent |
                              Above threshold |   11.55974   8.659454     1.33   0.184    -5.574472    28.69394
                                    child_sex |
                                       Female |   6.939471   8.343113     0.83   0.407    -9.568803    23.44775
                     ipv_any_recent#child_sex |
                       Above threshold#Female |  -27.22183    12.2964    -2.21   0.029    -51.55235   -2.891306
                                       clinic |
                                    TC Newman |  -9.942632   7.795401    -1.28   0.204    -25.36716    5.481902
                                 maternal_hiv |
                                     Positive |   .4519586   8.840126     0.05   0.959    -17.03974    17.94366
                          bdi_total_threshold |
                              Above threshold |  -3.721508   6.851118    -0.54   0.588    -17.27762      9.8346
                                  birthweight |    -.02854   .0067888    -4.20   0.000    -.0419729   -.0151072
                            prenatal_cotinine |
                               Passive smoker |   2.076213   7.555715     0.27   0.784    -12.87406    17.02649
                                Active smoker |   4.749271   8.759693     0.54   0.589    -12.58328    22.08182
                   prenatal_alcohol_composite |
                                     Exposure |    11.6201   8.300294     1.40   0.164    -4.803455    28.04365
                                    child_age |  -9.187756   3.754138    -2.45   0.016    -16.61596   -1.759554
                                        _cons |   1201.906   26.28029    45.73   0.000     1149.906    1253.906
                  ---------------------------------------------------------------------------------------------

                  So we aren't looking at the t tests or how they differ from zero.

                  In sex-stratified results, I get:

                  Code:
                   bysort child_sex: regress amygdala_corrected ib0.ipv_any_recent i.maternal_hiv i.bdi_total_threshold c.birthweight i.prenatal_cotinine i.prenatal_alcohol_composite c.child_age

                  -> child_sex = Male

                        Source |       SS           df       MS      Number of obs   =        71
                  -------------+----------------------------------   F(9, 61)        =      3.01
                         Model |  32976.7139         9  3664.07932   Prob > F        =    0.0049
                      Residual |  74229.1089        61  1216.87064   R-squared       =    0.3076
                  -------------+----------------------------------   Adj R-squared   =    0.2054
                         Total |  107205.823        70  1531.51175   Root MSE        =    34.884

                  ---------------------------------------------------------------------------------------------
                           amygdala_corrected | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                  ----------------------------+----------------------------------------------------------------
                               ipv_any_recent |
                              Above threshold |   12.17911   8.742984     1.39   0.169    -5.303563    29.66179
                                       clinic |
                                    TC Newman |  -15.20955   11.19939    -1.36   0.179    -37.60413     7.18502
                                 maternal_hiv |
                                     Positive |    5.48999   12.46346     0.44   0.661    -19.43224    30.41222
                          bdi_total_threshold |
                              Above threshold |  -.8874126   10.31498    -0.09   0.932    -21.51349    19.73866
                                  birthweight |  -.0306118   .0092658    -3.30   0.002    -.0491399   -.0120837
                            prenatal_cotinine |
                               Passive smoker |  -2.936045   10.62852    -0.28   0.783    -24.18909      18.317
                                Active smoker |  -3.984423   12.25012    -0.33   0.746    -28.48005     20.5112
                   prenatal_alcohol_composite |
                                     Exposure |   14.01245   11.68538     1.20   0.235    -9.353903    37.37881
                                    child_age |   -12.5523   6.162457    -2.04   0.046     -24.8749   -.2297055
                                        _cons |   1223.427   37.29894    32.80   0.000     1148.843    1298.011
                  ---------------------------------------------------------------------------------------------

                  -> child_sex = Female

                        Source |       SS           df       MS      Number of obs   =        69
                  -------------+----------------------------------   F(9, 59)        =      1.59
                         Model |  19053.9529         9  2117.10587   Prob > F        =    0.1388
                      Residual |  78474.2361        59   1330.0718   R-squared       =    0.1954
                  -------------+----------------------------------   Adj R-squared   =    0.0726
                         Total |   97528.189        68  1434.23807   Root MSE        =     36.47

                  ---------------------------------------------------------------------------------------------
                           amygdala_corrected | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                  ----------------------------+----------------------------------------------------------------
                               ipv_any_recent |
                              Above threshold |  -20.29075   9.759249    -2.08   0.042    -39.81896   -.7625396
                                       clinic |
                                    TC Newman |   -.550149   11.95036    -0.05   0.963    -24.46277    23.36247
                                 maternal_hiv |
                                     Positive |  -7.854308   13.70646    -0.57   0.569    -35.28087    19.57226
                          bdi_total_threshold |
                              Above threshold |  -6.252422   10.14643    -0.62   0.540    -26.55539    14.05055
                                  birthweight |  -.0247939    .010742    -2.31   0.025    -.0462885   -.0032992
                            prenatal_cotinine |
                               Passive smoker |    10.4688   11.39933     0.92   0.362    -12.34119     33.2788
                                Active smoker |    9.10838   13.55255     0.67   0.504    -18.01021    36.22697
                   prenatal_alcohol_composite |
                                     Exposure |   12.35853    12.5019     0.99   0.327    -12.65772    37.37478
                                    child_age |  -5.739353   5.094657    -1.13   0.264    -15.93374    4.455033
                                        _cons |   1180.553   38.49096    30.67   0.000     1103.532    1257.573
                  ---------------------------------------------------------------------------------------------

                  which suggests that IPV affects females only. Hence the wish to show the adjusted means as a bar graph.

                  Does that sound ok?


                  • #10
                    I think discussing these results requires more medical expertise than I have, which is almost none. I am left with a feeling that you have rather a small dataset for this complicated a model, but I doubt that is news to you, and it's a cheap comment when getting more data that remain comparable is often far easier said than done.
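
                    As a side note on the stratified regressions in #9, group-specific effects can also be read off the single interaction model without splitting the sample. The following is a minimal sketch on the lbw data, assuming smoke stands in for the exposure and ui for the grouping variable; it is not a reanalysis of the brain volume data.

                    ```stata
                    webuse lbw, clear
                    regress bwt i.smoke##i.ui age lwt i.race

                    * effect of smoking (discrete change from the base level) within each level of ui
                    margins ui, dydx(smoke)

                    * plot the two conditional effects with their confidence intervals
                    marginsplot, recast(scatter) yline(0) xscale(range(-0.5 1.5))
                    ```

                    Unlike separate regressions by group, this keeps a common set of covariate coefficients and a pooled residual variance, so the estimates will generally differ somewhat from the stratified ones.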

