I have a dataset in which a parameter (ct) was estimated for each observation (animal) at different sampling points (dpi), depending on the batch the animal belonged to. Each batch contains two subgroups, which I'm calling "treatment". Different batches have different sampling points (and I think this is part of the issue).
Please find below mock data with a structure similar to my real data:
When I ask Stata to create a box plot of the parameter I'm estimating (ct) *only* for batch 1, over dpi and by treatment, I get this plot:
This plot is good. It recognizes that batch 1 has ct values for only three sampling points, even though batch 2 has data for six (see tab dpi batch). Just to make things prettier, I'd like to reduce the font size of the x-axis value labels (please remember that this is a mock dataset; my real dataset has many more x-axis values). However, when I add the labsize() option to the dpi part of the command, Stata suddenly plots not only the sampling points for which batch 1 has data, but every sampling point for which any batch has data, which is not what I want.
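For reference, the cross-tabulation mentioned above can be run on the mock data as follows (a sketch assuming the mock data are already in memory). The second command, restricted to batch 1, is an extra check: it should list only dpi 0, 5 and 10.

```stata
* Which sampling points (dpi) does each batch have?
* With the mock data, batch 1 has zero counts at dpi 2, 4 and 7
tab dpi batch

* Restricted to batch 1, only dpi 0, 5 and 10 should appear
tab dpi treatment if batch == 1
```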
Note how the x-axis label size changed, but now dpi 0, 2, 4, 5, 7 and 10 are shown rather than only dpi 0, 5 and 10, even though the command still has an if qualifier. I'd like to reduce the font size of the x-axis labels while displaying only the sampling points for which batch 1 has data.
This is relatively easy to work around in my real data: I can preserve, drop if batch != 1, run the command and restore the dataset afterwards. But is this behavior (adding more x-values because of a label size change) expected? When I remove by(), the problem goes away; for example, graph box ct if batch==1, over(dpi, label(labsize(vsmall))) shows only the batch 1 sampling points. But I'd like separate graphs for each treatment group. What am I missing here?
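The preserve/drop/restore workaround just described might look like this, assuming the data are already in memory. Because the batch 2 observations are gone while the graph is drawn, dpi 2, 4 and 7 no longer exist in the data, so by() cannot add them back. This only avoids the behavior; it does not explain it.

```stata
* Workaround: temporarily drop everything except batch 1,
* draw the graph with by(), then bring the full dataset back
preserve
drop if batch != 1
graph box ct, over(dpi, label(labsize(vsmall))) by(treatment)
restore
```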
I appreciate any help or input.
Cheers
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(treatment batch animal dpi ct)
1 1 1 0 17.943
1 1 2 0 17.003647
1 1 3 0 14.583344
1 1 1 5 17.389624
1 1 2 5 20.13154
1 1 3 5 17.52407
1 1 1 10 16.175755
1 1 2 10 22.841276
1 1 3 10 19.93141
1 2 4 0 27.17235
1 2 5 0 17.434746
1 2 6 0 23.65517
1 2 4 2 17.739742
1 2 5 2 17.492239
1 2 6 2 20.13189
1 2 4 4 14.75321
1 2 5 4 20.12855
1 2 6 4 15.088642
1 2 4 5 17.270346
1 2 5 5 24.130836
1 2 6 5 19.557253
1 2 4 7 17.038406
1 2 5 7 19.27315
1 2 6 7 23.162224
1 2 4 10 18.188578
1 2 5 10 9.734836
1 2 6 10 16.05163
2 1 7 0 24.855415
2 1 8 0 24.5916
2 1 9 0 12.616377
2 1 7 5 20.986664
2 1 8 5 19.660107
2 1 9 5 19.35184
2 1 7 10 21.9416
2 1 8 10 18.977673
2 1 9 10 18.892675
2 2 10 0 17.382215
2 2 11 0 14.655262
2 2 12 0 19.781366
2 2 10 2 20.56149
2 2 11 2 16.0062
2 2 12 2 14.833175
2 2 10 4 26.821455
2 2 11 4 18.53457
2 2 12 4 25.47236
2 2 10 5 17.882187
2 2 11 5 15.512028
2 2 12 5 18.905464
2 2 10 7 21.45293
2 2 11 7 21.248846
2 2 12 7 17.349068
2 2 10 10 19.589876
2 2 11 10 16.957424
2 2 12 10 17.928701
end
Code:
graph box ct if batch==1, over(dpi) by(treatment)
Code:
graph box ct if batch==1, over(dpi, label(labsize(vsmall))) by(treatment)
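Since the problem reportedly disappears without by(), another possible sketch is to draw one graph per treatment group with an if qualifier and place them side by side with graph combine. The graph names box_t1 and box_t2 and the local macros trts and graphs are hypothetical, and the /// line continuations only work in a do-file, not at the command line.

```stata
* Alternative without by(): one box plot per treatment, then combine.
* Relies on the observation in the post that over(dpi, label(labsize(vsmall)))
* with an if qualifier shows only the batch 1 dpi values when by() is absent.
levelsof treatment if batch == 1, local(trts)
local graphs
foreach t of local trts {
    graph box ct if batch == 1 & treatment == `t', ///
        over(dpi, label(labsize(vsmall)))           ///
        title("treatment = `t'") name(box_t`t', replace)
    local graphs `graphs' box_t`t'
}
* ycommon puts both panels on the same y scale, as by() would
graph combine `graphs', ycommon
```

Each panel gets its own title instead of the shared by() header, which is the main cosmetic difference from the by(treatment) version.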