  • Unexpected behavior when changing label size on box plot

    I have a dataset in which observations (animals) had a certain parameter (ct) estimated at different sampling points (dpi), according to the batch the observations belonged to. There are two subgroups within each batch, which I’m calling “treatment”. Different batches have different sampling points (and I think this is part of the issue).

    Please find a mock dataset with a similar structure to my real one below:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(treatment batch animal dpi ct)
    1 1  1  0    17.943
    1 1  2  0 17.003647
    1 1  3  0 14.583344
    1 1  1  5 17.389624
    1 1  2  5  20.13154
    1 1  3  5  17.52407
    1 1  1 10 16.175755
    1 1  2 10 22.841276
    1 1  3 10  19.93141
    1 2  4  0  27.17235
    1 2  5  0 17.434746
    1 2  6  0  23.65517
    1 2  4  2 17.739742
    1 2  5  2 17.492239
    1 2  6  2  20.13189
    1 2  4  4  14.75321
    1 2  5  4  20.12855
    1 2  6  4 15.088642
    1 2  4  5 17.270346
    1 2  5  5 24.130836
    1 2  6  5 19.557253
    1 2  4  7 17.038406
    1 2  5  7  19.27315
    1 2  6  7 23.162224
    1 2  4 10 18.188578
    1 2  5 10  9.734836
    1 2  6 10  16.05163
    2 1  7  0 24.855415
    2 1  8  0   24.5916
    2 1  9  0 12.616377
    2 1  7  5 20.986664
    2 1  8  5 19.660107
    2 1  9  5  19.35184
    2 1  7 10   21.9416
    2 1  8 10 18.977673
    2 1  9 10 18.892675
    2 2 10  0 17.382215
    2 2 11  0 14.655262
    2 2 12  0 19.781366
    2 2 10  2  20.56149
    2 2 11  2   16.0062
    2 2 12  2 14.833175
    2 2 10  4 26.821455
    2 2 11  4  18.53457
    2 2 12  4  25.47236
    2 2 10  5 17.882187
    2 2 11  5 15.512028
    2 2 12  5 18.905464
    2 2 10  7  21.45293
    2 2 11  7 21.248846
    2 2 12  7 17.349068
    2 2 10 10 19.589876
    2 2 11 10 16.957424
    2 2 12 10 17.928701
    end
    When I ask Stata to create a box plot of the parameter I’m estimating (ct) for batch 1 only, over dpi and by treatment, I get this plot:
    Code:
    graph box ct if batch==1, over(dpi) by(treatment)
    This plot is good. It recognizes that for batch==1, “ct” values are available for only three sampling points (even though batch 2 has data for six sampling points; see tab dpi batch). Just to make things prettier, I’d like to change the font size of the x-axis labels (please remember that this is a mock dataset; in my real dataset there are many more x-axis values). However, when I add the labsize() option to the dpi portion of the command, Stata suddenly plots not only the sampling points for which batch 1 has data, but every sampling point for which any batch has data, which is not what I want.

    Code:
    graph box ct if batch==1, over(dpi, label(labsize(vsmall))) by(treatment)
    * Note how the x-axis label size changed, but now dpi 0, 2, 4, 5, 7 and 10 are all shown rather than only dpi 0, 5 and 10, even though the if qualifier is still in the command. I'd like to reduce the font size of the x-axis labels while displaying only the sampling points for which batch 1 has data.

    This is relatively easy to work around in my real data, as I can preserve, drop if batch != 1, run the command and restore the dataset afterwards. But is this behavior (adding more x-values because of a label size change) expected? When I remove by(), the problem goes away (see graph box ct if batch==1, over(dpi, label(labsize(vsmall)))), but I'd like to have a separate graph for each treatment group. What am I missing here?
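
    For reference, the preserve/drop/restore workaround described above could be sketched as follows. It assumes the mock dataset from the first code block is in memory and uses only the variables defined there (batch, dpi, treatment, ct). It sidesteps the extra categories but does not explain why they appear.

    ```stata
    * Temporarily keep only batch 1 so no other dpi values exist in memory
    preserve
    drop if batch != 1

    * Same graph as before, without the if qualifier
    graph box ct, over(dpi, label(labsize(vsmall))) by(treatment)

    * Bring back the full dataset
    restore
    ```

    Because the other batches are dropped before the graph is drawn, over(dpi) can only see the dpi values that batch 1 actually has.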

    I appreciate any help or input.

    Cheers

  • #2
    That does look like a bug to me.


    • #3
      I'll leave this here in case it attracts attention of someone from Stata.
