  • Create a boxplot with 2 x left axis to capture difference ranges #data visualisation

    Hi folks - been mulling over this but can't quite figure out a good solution. Wondered if anyone had suggestions.

    Basically, I have a boxplot of accelerometer counts and I want to show time spent at different intensities across the PA spectrum. The problem is that the range is much larger at the lower end (in particular for 0-49cpm - see image below), which makes it hard to make out the data at the right-hand end.
    [Figure: time-intensity_60s_ALL_norm.png]

    One option is to remove the 0-49cpm category so that I can glean more insight from the others (e.g. below). However, I do need to show all of them somehow (ideally on the same graph).
    [Figure: time-intensity_60s_ALL_norm_drop0_49.png]

    I wondered whether a good option might be to have a second axis on the left for that category, but superimposed onto the same graph for perspective (an example of pretty much what I am after is below).
    [Figure: Plot2.JPG]

    I wondered if anyone had any thoughts on how to do something like this in Stata? The above example is, I think, possible in Excel. I'd also like to have the graph in the same format by sex (as above: bars side by side rather than two separate graphs), but I'm unsure how to do that.
    [Figure: time-intensity_60s_ALL_SEX_norm_drop0_49.png]

    I wondered if anyone had any insights on how to 1) get the sexes side by side in a single plot, and 2) get something very similar to the Excel version above, with a separate axis for the 0-49cpm category? Thanks, and I hope this is clear.

    This is my current Stata code.

    Code:
    * Box plot of all intensity bands, outside values suppressed
    graph box cpm_0_49-cpm_5000plus if incl_main==1 , ///
        nooutsides ytitle("Time (min/day)") note("") leg(off) graphregion(fc(white) ifc(white) lc(white) ilc(white)) ///
        showyvars yvar(label(labsize(tiny) angle(vertical))) yscale(range(0 1000)) ylabel(0 (250) 1000)

    * Forward slashes throughout the path: they work on every platform in Stata
    graph export "$OUT_DATASET/Box_whisker/time-intensity_`epoch's_ALL_nonnorm.png", height(`plotexportheight') width(`plotexportwidth') replace

    * Same plot, split by sex
    graph box cpm_0_49-cpm_5000plus if incl_main==1 , ///
        over(sex) ///
        nooutsides ytitle("Time (min/day)") note("") leg(off) graphregion(fc(white) ifc(white) lc(white) ilc(white)) ///
        showyvars yvar(label(labsize(tiny) angle(vertical))) yscale(range(0 1000)) ylabel(0 (250) 1000)

    graph export "$OUT_DATASET/Box_whisker/time-intensity_`epoch's_ALL_SEX_nonnorm.png", height(`plotexportheight') width(`plotexportwidth') replace
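
On the first question (sexes side by side within each intensity band), one possible approach is to reshape the data to long form and let `graph box` treat sex as the inner grouping via `over(sex) over(band) asyvars`. The following is only a sketch on simulated data: the `id` variable, the three bands, and all the numbers are assumptions for illustration, not the poster's dataset.

```stata
* Simulated, hypothetical data standing in for the thread's wide variables
clear
set seed 12345
set obs 200
generate id  = _n
generate sex = cond(runiform() < 0.5, 1, 2)
label define sexlbl 1 "Male" 2 "Female"
label values sex sexlbl
generate cpm_0_49    = max(0, rnormal(700, 120))
generate cpm_50_99   = max(0, rnormal(60, 25))
generate cpm_100_199 = max(0, rnormal(30, 15))

* Rename to a numeric suffix so that reshape keeps the bands in order,
* storing the original band name as a value label
local i = 0
foreach v of varlist cpm_0_49-cpm_100_199 {
    local ++i
    local lab = subinstr("`v'", "cpm_", "", 1)
    rename `v' minutes`i'
    label define band `i' "`lab'", add
}

reshape long minutes, i(id) j(band)
label values band band

* asyvars treats the first over() group (sex) as if it were separate y
* variables, so male and female boxes sit side by side within each band
graph box minutes, over(sex) over(band) asyvars nooutsides ///
    ytitle("Time (min/day)")
```

With `asyvars` the two sexes get different colours and a legend entry each, so the legend should be left on in this layout.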

  • #2
    Logarithmic scale? But note https://www.stata.com/support/faqs/g...ithmic-scales/

    If you have any exact zeros, do flag that. Square roots are in any case quite a natural scale for counts.

    If the binning on the x axis is part of any experimental design, so be it. Otherwise it seems to me that you have a problem in quantile regression.
    Last edited by Nick Cox; 06 Mar 2020, 07:35.


    • #3
      Thanks Nick Cox. Yes, there are a lot of zeros towards the right-hand side, which makes a logarithmic scale problematic. There are never any negative values though. It is basically just a graph to describe the amount of time (min/day) a sample of 2000 people spend at certain intensities (from low to high; not many folk spent time at higher intensities).

      I had a look at the link you sent and had a try but I think I may have misinterpreted it (sorry, Stata beginner). Scales don't come out right.

      Should I have done something differently with the log10 (if at all) and how can one get those y scale labels to fit?

      I was also unsure how to get male/female bars paired side-by-side on the same graph (side question)?

      Code:
      foreach var of varlist cpm_0_49-cpm_5000plus  {
          clonevar log10_`var' = `var'
          replace log10_`var' = log10(`var')
      }
                  
      graph box log10_cpm_0_49-log10_cpm_5000plus if incl_main==1 , ///
          nooutsides ytitle("Time (min/day)") note("") leg(off) graphregion(fc(white) ifc(white) lc(white) ilc(white)) ///
          showyvars yvar(label(labsize(tiny) angle(vertical)))  ylabel(`labels', angle(h))
      Figure next:



      Last edited by patrick handcock; 06 Mar 2020, 08:37.


      • #4
        The FAQ does explain that you need all positive values for logarithms to work. Taking logarithms in advance will mean that zeros will map to missings and so they would be ignored by the graph command anyway.

        But your problem is different, I think.

        You are still asking for a vertical scale stretching to 1000, which means 10^1000 given your chosen units and that is a big number (impossible for you, because all the minutes in a day are only 1440).

        Further, a lower limit of 0 on that scale implies a lower limit of 1 min/day, but Stata will ignore an axis limit which implies omitting data.

        So if you have many or indeed any zeros you don't have a clear rationale for using log scale, but my suggestion of square roots remains.
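
A small check of the point about zeros, as a sketch on made-up values (the variable `minutes` and its four values are purely illustrative): `log10(0)` is missing in Stata, whereas `sqrt(0)` is 0, so the square root keeps every observation in the graph.

```stata
* Toy data, not from the thread
clear
input minutes
0
1
10
100
end

generate log10_minutes = log10(minutes)   // log10(0) evaluates to missing
generate sqrt_minutes  = sqrt(minutes)    // sqrt(0) is 0, nothing is lost

count if minutes == 0                     // how many exact zeros
count if missing(log10_minutes)           // the same observations are lost on log scale
list
```

Running the two `count` commands on the real variables before choosing a scale shows how many observations a log transform would silently drop.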
        Last edited by Nick Cox; 06 Mar 2020, 08:36.


        • #5
          Correct figure, sorry: [Attachment n1540012]


          • #6
            Thanks Nick for all your time. Apologies, yes you did warn me about zeros.

            I just had a try with sqrt:


            Code:
            foreach var of varlist cpm_0_49-cpm_5000plus {
                clonevar sqrt_`var' = `var'
                replace sqrt_`var' = sqrt(`var')
            }

            graph box sqrt_cpm_0_49-sqrt_cpm_5000plus if incl_main==1 , ///
                nooutsides ytitle("Time (min/day)") note("") leg(off) graphregion(fc(white) ifc(white) lc(white) ilc(white)) ///
                showyvars yvar(label(labsize(tiny) angle(vertical))) ylabel(`labels', angle(h))
            The graph makes a bit more sense in terms of the relative values and avoids the zeros issue, but I'm still unsure what is going on with the y-axis scale and whether it can be labelled in the original units in some way (i.e. min/day)?
            [Figure: Graph.png]


            • #7
              Think about it in terms of what graph box knows. It knows only that you fed it variables with certain names. It has no sense of the history of those variables or of their being the square roots of what you care about.

              Also, you are insisting on using the same axis title as before, which no longer quite applies. You can do that, but although 0 still means 0, the values 10 20 30 40 now mean 100 400 900 1600 on the original scale, so you need to fix the axis labels accordingly.
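
One way to fix the labels is to place each tick at the square root of a value but print the original value as its text. The following is a sketch on simulated data; the two variables, their distributions, and the chosen tick values 0 25 100 250 500 1000 are assumptions for illustration only.

```stata
* Simulated, hypothetical data standing in for two of the thread's variables
clear
set seed 2020
set obs 500
generate cpm_0_49  = max(0, rnormal(700, 120))
generate cpm_50_99 = max(0, rnormal(60, 25))

foreach var of varlist cpm_0_49 cpm_50_99 {
    generate sqrt_`var' = sqrt(`var')
}

* Build the label specification: position on the square root scale,
* text in original units (min/day)
local labels
foreach v of numlist 0 25 100 250 500 1000 {
    local pos = sqrt(`v')
    local labels `labels' `pos' "`v'"
}

graph box sqrt_cpm_0_49 sqrt_cpm_50_99, nooutsides showyvars legend(off) ///
    ylabel(`labels', angle(h)) ytitle("Time (min/day), square root scale")
```

The local macro `labels` has to be defined in the same do-file run as the `graph box` call; if it is empty, `ylabel()` falls back to Stata's default labels on the transformed scale.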

              I wouldn't be very happy to use this graph myself. The x axis is quite busy and for some reason you've got a variable bin width with interval variously 50 and 100. Also, the colours are just an arbitrary mix. It's hard however to suggest a better design without knowing much more about the data generation.


              • #8
                Sorry - my blunder on the scale aspect - of course they are different!

                Thanks. Yes, my thought was to aim for something more like the below. I had also planned to reduce the number of variables on the x-axis to make it less busy.

                The below is good as it keeps the original y-units scaling and original data, but it just requires a second y-axis to make it work. I also couldn't figure out how to pair the gender plots side-by-side, as is done below?


                [Figure: Plot2.JPG]

                Fig. 1 Distribution of Movement Intensity in Men and Women (Median and IQR, normalized to 100-counts/minute width)
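
As far as I know, `graph box` does not offer a second y axis, but a median-and-IQR display like the figure above can be approximated with `twoway`, which does allow `yaxis(2)`. The following is a rough sketch on simulated data: the `id` variable, the three bands, and their distributions are assumptions, and the bars show only the interquartile range with a marker at the median, not full box plots.

```stata
* Simulated, hypothetical data (not the poster's)
clear
set seed 2020
set obs 500
generate id = _n
generate minutes1 = max(0, rnormal(700, 120))   // stands in for 0-49 cpm
generate minutes2 = max(0, rnormal(60, 25))     // stands in for 50-99 cpm
generate minutes3 = max(0, rnormal(30, 15))     // stands in for 100-199 cpm

reshape long minutes, i(id) j(band)

* One row per band holding the quartiles
collapse (p25) p25=minutes (p50) p50=minutes (p75) p75=minutes, by(band)

* Band 1 is drawn against axis 2, all other bands against axis 1
twoway (rbar p25 p75 band if band == 1, yaxis(2) barwidth(0.6))            ///
       (rbar p25 p75 band if band >  1, yaxis(1) barwidth(0.6))            ///
       (scatter p50 band if band == 1, yaxis(2) msymbol(D) mcolor(black))  ///
       (scatter p50 band if band >  1, yaxis(1) msymbol(D) mcolor(black)), ///
       ytitle("Time (min/day), bands above 49 cpm", axis(1))               ///
       ytitle("Time (min/day), 0-49 cpm", axis(2))                         ///
       xlabel(1 "0-49" 2 "50-99" 3 "100-199") xtitle("Counts per minute")  ///
       legend(off)
```

By default the second axis is drawn on the right. To pair the sexes within each band, the same idea should work after adding `sex` to `by()` in `collapse` and offsetting the x positions slightly for one sex.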
                Last edited by patrick handcock; 06 Mar 2020, 09:20.
