  • Stripplots

    Second test post:

    Hi

    I am trying to move past traditional box and whisker plots given their limitations, and instead want to visualise my data using a stripplot. I am using Stata 16.1 and am very much a novice (although I have tried my best to work through the stripplot guidance).

    Background

    In terms of context, I have asked my participants to play a game. They are split into either condition 0 or condition 1. In each game, they can choose either to cooperate (responsedummy=1) or not to cooperate (responsedummy=0). I have also collected information about various personality traits which are each reported on a scale of 0-5 (extracted below in the dataex as an example are openness and agreeableness, but I have two further traits to analyse). I am trying to understand how specific levels of traits affect cooperation.

    What I have tried so far

    I started with a traditional plot using the following (I appreciate it is messy and has many pitfalls, and that I could cut and visualise the data in many ways, hence my desire to investigate stripplots):

    graph box openness agreeableness extraversion honestyhumility, over(condition) over(responsedummy) ylabel(, angle(horizontal))
    [Image: Picture1.png]

    But I am now exploring stripplots. I appreciate that the over() option can only be used with a single variable. I am comfortable producing a focused analysis, e.g. this for openness:

    stripplot openness, over(responsedummy) box(barw(0.8) blcolor(ltblue)) jitter(3) centre vertical cumul cumpr mc(orange) scheme(s1color) yla(, ang(h))
    I have also been exploring options such as

    stripplot openness, over(responsedummy) box vertical stack h(0.4)
    stripplot openness, over(responsedummy) box blcolor(white) iqr jitter(3) stack h(0.5) vertical mcolor(blue)
    [Image: Picture2.png]

    What I am hoping for some guidance on

    I am trying to understand how personality traits affect cooperation. I will be running a probit model on the data set so what I am trying to do here is just give the reader as accurate a visual as possible in terms of the descriptive statistics.

    I can't quite seem to get the hang of the by() option mentioned in posts such as this one.

    Ideally I would like to create a stripplot which shows each personality trait side by side (like I have done with the box plot) for one condition, and then broken down by non-cooperation/cooperation. I can then replicate for the other condition.

    I'd like to then be able to have another stripplot showing cooperation and all personality traits, and then non-cooperation and personality traits.

    Once I get the basics, I'm hoping I'll be able to run with it! Many thanks in advance for any help.

    Dataex

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long id float(openness agreeableness condition responsedummy)
    5460439 2.5 3.3 0 1
    5460448 2.6 2.6 1 1
    5460477 2.6   2 0 1
    5460525 3.5 2.4 0 1
    5460546 3.3 3.2 1 1
    5460586 4.6 4.2 1 1
    5460576 3.5   3 1 1
    5460556 3.9 2.9 0 1
    5460553 2.5 4.1 0 1
    5460558 2.7 2.5 1 1
    5460550 4.3   3 0 0
    5460557 3.5 2.8 1 0
    5460540 3.9 2.8 0 0
    5460521 3.9 2.3 1 0
    5460541 4.4 3.1 0 0
    5460520 3.7 2.4 0 0
    5460522 2.8 3.1 0 1
    5460517 4.6 4.1 1 0
    5460524 2.8 2.5 1 1
    5460518 2.6 3.1 1 0
    5460624 2.5 3.4 1 0
    5460627   4 3.4 0 0
    5460516 3.9 2.3 1 0
    5460505 3.8 3.3 0 0
    5460503 2.9 3.7 1 0
    5460506 4.4 3.4 0 1
    5460502 2.7 2.5 1 1
    5460491 4.2 3.6 1 0
    5460658 2.7   4 1 1
    5460495 4.6 3.9 1 0
    5460484   4 4.2 0 1
    5460482 3.4 2.9 0 0
    5460485 3.1 3.7 1 1
    5460483 3.2 3.3 0 1
    5460471 3.3 2.9 1 1
    5460469 3.3 2.8 0 0
    5460459   3 3.8 0 1
    5460449 3.6 3.5 1 1
    5460442 3.2 2.9 1 1
    5460691 4.4 2.9 0 1
    5460438   3 2.7 1 0
    5460656 4.6 3.9 0 0
    5460435 3.9 3.9 0 1
    5460715 3.5 3.6 1 0
    5460433 3.8 2.6 0 1
    5460709 2.6 4.6 0 1
    5460431 3.5 3.9 1 0
    5460589 3.7 2.8 1 0
    5460569 3.5 3.5 0 1
    5460600 3.8 3.5 1 1
    5460580 4.1 3.2 0 0
    5460590 3.7 2.3 1 1
    5460551 3.1 3.3 1 0
    5460598 4.2 4.5 1 1
    5460593 4.3 3.4 1 0
    5460584 3.3 2.8 0 1
    5460578 4.2 3.3 0 1
    5460564 3.2 1.6 1 0
    5460574 3.9 2.9 0 0
    5460571 3.9 2.6 1 1
    5460555 4.1 3.6 1 0
    5460563 4.5 2.9 0 1
    5460607   4 3.8 0 0
    5460605   3 2.8 0 0
    5460601 2.9 2.4 0 1
    5460616 3.6 3.8 0 1
    5460623 3.4 3.2 0 1
    5460625 3.8 3.3 0 1
    5460615 4.5 3.2 0 1
    5460642 3.5 3.2 1 0
    5460626 4.1 2.5 1 1
    5460695   3 3.5 1 0
    5460721 4.3 4.3 1 1
    5460717 4.2 2.8 1 0
    5460719   3 3.9 0 1
    5460722 3.8 3.8 1 0
    5460735 3.9 3.6 1 0
    5460730 4.1 3.8 0 1
    5460710 3.5   3 0 1
    5460716 2.9 3.3 1 0
    5460737 3.7   3 1 1
    5460734 3.9 3.4 0 0
    5460741 4.3 3.8 1 0
    5460753 4.8 3.4 0 1
    5460789 3.3 2.7 0 1
    5460635 4.1 3.6 1 1
    5460638 3.8 2.9 1 1
    5460640 3.2 1.3 0 1
    5460746 3.9   3 1 1
    5460777 3.6 3.3 0 0
    5460778 3.5 2.7 1 0
    5460650 2.6 3.5 0 0
    5460644 2.8 3.1 1 1
    5460651 2.7 3.1 1 0
    5460649 2.1 3.9 0 0
    5460661 4.4 2.6 1 0
    5460660   3 4.4 1 0
    5460647 3.2 3.5 1 1
    5460671   4 4.3 0 1
    5460652   4 2.2 0 1
    end
    Last edited by Nitish Upadhyaya; 23 Jan 2022, 05:24.

  • #2
    stripplot is from SSC as you are asked to explain (FAQ Advice #12).

    Thanks for the data example. I have various small and large points.

    jitter() is legal but a bad idea together with the cumul option. It undoes much of the good that cumul does in segregating values.

    Never mess with the width of the box showing median and quartiles when cumul cumpr is applied with box, because the box width has concrete meaning: it stretches from cumulative probability 0.25 to cumulative probability 0.75. A quantile plot and a box plot are compatible because the median and quartiles can be shown on both and because the horizontal scale of the quantile plot is cumulative probability.

    Otherwise here's some technique. Wanting to show everything at once seems ambitious to me.


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long id float(openness agreeableness condition responsedummy)
    5460439 2.5 3.3 0 1
    5460448 2.6 2.6 1 1
    5460477 2.6   2 0 1
    5460525 3.5 2.4 0 1
    5460546 3.3 3.2 1 1
    5460586 4.6 4.2 1 1
    5460576 3.5   3 1 1
    5460556 3.9 2.9 0 1
    5460553 2.5 4.1 0 1
    5460558 2.7 2.5 1 1
    5460550 4.3   3 0 0
    5460557 3.5 2.8 1 0
    5460540 3.9 2.8 0 0
    5460521 3.9 2.3 1 0
    5460541 4.4 3.1 0 0
    5460520 3.7 2.4 0 0
    5460522 2.8 3.1 0 1
    5460517 4.6 4.1 1 0
    5460524 2.8 2.5 1 1
    5460518 2.6 3.1 1 0
    5460624 2.5 3.4 1 0
    5460627   4 3.4 0 0
    5460516 3.9 2.3 1 0
    5460505 3.8 3.3 0 0
    5460503 2.9 3.7 1 0
    5460506 4.4 3.4 0 1
    5460502 2.7 2.5 1 1
    5460491 4.2 3.6 1 0
    5460658 2.7   4 1 1
    5460495 4.6 3.9 1 0
    5460484   4 4.2 0 1
    5460482 3.4 2.9 0 0
    5460485 3.1 3.7 1 1
    5460483 3.2 3.3 0 1
    5460471 3.3 2.9 1 1
    5460469 3.3 2.8 0 0
    5460459   3 3.8 0 1
    5460449 3.6 3.5 1 1
    5460442 3.2 2.9 1 1
    5460691 4.4 2.9 0 1
    5460438   3 2.7 1 0
    5460656 4.6 3.9 0 0
    5460435 3.9 3.9 0 1
    5460715 3.5 3.6 1 0
    5460433 3.8 2.6 0 1
    5460709 2.6 4.6 0 1
    5460431 3.5 3.9 1 0
    5460589 3.7 2.8 1 0
    5460569 3.5 3.5 0 1
    5460600 3.8 3.5 1 1
    5460580 4.1 3.2 0 0
    5460590 3.7 2.3 1 1
    5460551 3.1 3.3 1 0
    5460598 4.2 4.5 1 1
    5460593 4.3 3.4 1 0
    5460584 3.3 2.8 0 1
    5460578 4.2 3.3 0 1
    5460564 3.2 1.6 1 0
    5460574 3.9 2.9 0 0
    5460571 3.9 2.6 1 1
    5460555 4.1 3.6 1 0
    5460563 4.5 2.9 0 1
    5460607   4 3.8 0 0
    5460605   3 2.8 0 0
    5460601 2.9 2.4 0 1
    5460616 3.6 3.8 0 1
    5460623 3.4 3.2 0 1
    5460625 3.8 3.3 0 1
    5460615 4.5 3.2 0 1
    5460642 3.5 3.2 1 0
    5460626 4.1 2.5 1 1
    5460695   3 3.5 1 0
    5460721 4.3 4.3 1 1
    5460717 4.2 2.8 1 0
    5460719   3 3.9 0 1
    5460722 3.8 3.8 1 0
    5460735 3.9 3.6 1 0
    5460730 4.1 3.8 0 1
    5460710 3.5   3 0 1
    5460716 2.9 3.3 1 0
    5460737 3.7   3 1 1
    5460734 3.9 3.4 0 0
    5460741 4.3 3.8 1 0
    5460753 4.8 3.4 0 1
    5460789 3.3 2.7 0 1
    5460635 4.1 3.6 1 1
    5460638 3.8 2.9 1 1
    5460640 3.2 1.3 0 1
    5460746 3.9   3 1 1
    5460777 3.6 3.3 0 0
    5460778 3.5 2.7 1 0
    5460650 2.6 3.5 0 0
    5460644 2.8 3.1 1 1
    5460651 2.7 3.1 1 0
    5460649 2.1 3.9 0 0
    5460661 4.4 2.6 1 0
    5460660   3 4.4 1 0
    5460647 3.2 3.5 1 1
    5460671   4 4.3 0 1
    5460652   4 2.2 0 1
    end
    
    set scheme s1color 
    
    local opts vertical yla(, ang(h)) mc(black) ms(Sh)
    
    stripplot openness, over(responsedummy) box(blcolor(blue)) centre cumul cumpr   `opts' name(G1, replace)
    
    stripplot openness, over(responsedummy)  box(blcolor(blue)  barw(0.06)) pctile(0) whiskers(lc(blue)) boffset(-0.1) `opts' stack h(0.4) name(G2, replace)
    
    
    local toshow 
    
    foreach v in openness agreeableness { 
        separate `v', by(responsedummy) veryshortlabel 
        local toshow `toshow' `r(varlist)'
    }
    
    stripplot `toshow', box(blcolor(blue)) centre cumul cumpr   `opts' name(G3, replace) xaxis(1 2) ///
    xla(1 "0" 2 "1" 3 "0" 4 "1", tlcolor(bg) axis(2)) xla(1.5 "openness" 3.5 "agreeableness", tlcolor(bg) axis(1)) ///
    xtitle(response, axis(2)) xline(2.5, lc(gs8) lw(thin)) ytitle("") 
    
    *  need better value labels! 
    label def condition 0 "condition 0" 1 "condition 1"
    label val condition condition 
    
    stripplot `toshow', box(blcolor(blue)) by(condition, note("")) centre cumul cumpr   `opts' name(G4, replace) xaxis(1 2) ///
    xla(1 "0" 2 "1" 3 "0" 4 "1", tlcolor(bg) axis(2)) xla(1.5 "openness" 3.5 "agreeableness", tlcolor(bg) axis(1)) ///
    xtitle(response, axis(2)) xline(2.5, lc(gs8) lw(thin)) ytitle("") subtitle(, nobox fcolor(none))

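The point above about box width can be checked by hand. What follows is only a minimal sketch, assuming plotting positions of (i - 0.5)/n within each group, which is the formula visible in the stripplot trace quoted in #9. The eight observations are a small subset of the data example, and cumprob is a hypothetical variable name.

```stata
clear
input float openness byte responsedummy
2.5 1
2.6 1
3.5 1
3.9 1
4.3 0
3.5 0
3.9 0
4.4 0
end

* plotting position (i - 0.5)/n within each response group
bysort responsedummy (openness): gen double cumprob = (_n - 0.5) / _N

list responsedummy openness cumprob, sepby(responsedummy) noobs
```

Every position lies strictly between 0 and 1, so a box drawn from 0.25 to 0.75 on that scale covers the middle half of the points in each group. Changing its width with barw() would break that correspondence.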


    • #3
      Hi Nick Cox

      Thank you so much for taking the time to respond in such a detailed manner. I totally understand your points on jitter() and being careful with widths when cumul cumpr is applied. I also think you are right that trying to show everything at once is a bit too ambitious, especially as the reader will likely be overloaded. As such, I am going to go for the slightly simpler stripplot example you gave for each individual trait. I'd taken the clearer value labels out for posting on the forum.

      I haven't, however, been able to make the code below work for me. I keep getting a "local not found" error which I haven't been able to puzzle out. I know it'll be something silly I am doing on my end (see extract from Stata below). It would be good to use this opportunity to learn for next time.

      Originally posted by Nick Cox

      Code:
      local toshow
      
      foreach v in openness agreeableness {
      separate `v', by(responsedummy) veryshortlabel
      local toshow `toshow' `r(varlist)'
      }
      
      stripplot `toshow', box(blcolor(blue)) centre cumul cumpr `opts' name(G3, replace) xaxis(1 2) ///
      xla(1 "0" 2 "1" 3 "0" 4 "1", tlcolor(bg) axis(2)) xla(1.5 "openness" 3.5 "agreeableness", tlcolor(bg) axis(1)) ///
      xtitle(response, axis(2)) xline(2.5, lc(gs8) lw(thin)) ytitle("")
      
      * need better value labels!
      label def condition 0 "condition 0" 1 "condition 1"
      label val condition condition
      
      stripplot `toshow', box(blcolor(blue)) by(condition, note("")) centre cumul cumpr `opts' name(G4, replace) xaxis(1 2) ///
      xla(1 "0" 2 "1" 3 "0" 4 "1", tlcolor(bg) axis(2)) xla(1.5 "openness" 3.5 "agreeableness", tlcolor(bg) axis(1)) ///
      xtitle(response, axis(2)) xline(2.5, lc(gs8) lw(thin)) ytitle("") subtitle(, nobox fcolor(none))
      Stata output when running highlighted portion of above

      Code:
      . local toshow
      
      .
      . foreach v in openness agreeableness {
        2.     separate `v', by(responsedummy) veryshortlabel
        3.     local toshow `toshow' `r(varlist)'
        4. }
      
                    storage   display    value
      variable name   type    format     label      variable label
      ------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      openness0       float   %9.0g                 0
      openness1       float   %9.0g                 1
      
                    storage   display    value
      variable name   type    format     label      variable label
      ------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      agreeableness0  float   %9.0g                 0
      agreeableness1  float   %9.0g                 1
      
      .
      . stripplot `toshow', box(blcolor(blue)) centre cumul cumpr   `opts' name(G3, replace) xaxis(1 2) ///
      > xla(1 "0" 2 "1" 3 "0" 4 "1", tlcolor(bg) axis(2)) xla(1.5 "openness" 3.5 "agreeableness", tlcolor(bg) axis(1)) ///
      > xtitle(response, axis(2)) xline(2.5, lc(gs8) lw(thin)) ytitle("")
      local not found
      r(111);



      • #4
        See https://www.stata-journal.com/articl...article=dm0102 on local macros. My guess is that you need to run the script as a whole, not one line of code at a time from the Do-file Editor window.
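A minimal sketch of the scoping behaviour being guessed at here. A local macro lives only for the run in which it is defined, so lines run separately from the Do-file Editor do not see it. Resetting the local below merely mimics a separate run. An undefined local expands to nothing and is not itself an error.

```stata
* defined and used in the same run: the options are visible
local opts vertical yla(, ang(h)) mc(black) ms(Sh)
display `"opts holds [`opts']"'

* setting the local to nothing deletes it, which is what a
* separately run line would see
local opts
display `"opts holds [`opts']"'
```

The second display shows empty brackets. A graph command using `opts' would then run with those options silently missing.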



        • #5
          Cheers Nick Cox - it isn't quite working for me but I've decided, as you flagged, to pare back what I am trying to show in each diagram and therefore your guidance on the initial piece has come in very useful.



          • #6
            The main idea is that the script should go in the do-file Editor window and then you run the entirety of the code at once.

            It does work! Here for example is the last graph from #2. My concern is just whether it gets too crowded if you show 4 variables, not just openness and agreeableness.




            The code would be something like

            Code:
            local toshow
            
            foreach v in openness agreeableness extraversion honestyhumility {      
                separate `v', by(responsedummy) veryshortlabel      
                local toshow `toshow' `r(varlist)' 
            }  
            
            stripplot `toshow', box(blcolor(blue)) centre cumul cumpr   `opts' name(G5, replace) xaxis(1 2) /// 
            xla(1 "0" 2 "1" 3 "0" 4 "1" 5 "0" 6 "1" 7 "0" 8 "1", tlcolor(bg) axis(2)) ///
            xla(1.5 "openness" 3.5 "agreeableness" 5.5 "extraversion" 7.5 "honesty-humility", tlcolor(bg) axis(1)) /// 
            xtitle(response, axis(2)) xline(2.5 4.5 6.5, lc(gs8) lw(thin)) ytitle("")


            and so on.
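With more traits, the axis label lists can be built in a loop rather than typed by hand. This is only a sketch. The macro names botlab, toplab and dividers are hypothetical, the trait names are those used above, and the text shown for each trait is the variable name rather than a nicer label such as "honesty-humility".

```stata
* trait names as used above; replace with your own variables
local traits openness agreeableness extraversion honestyhumility

* botlab: 0/1 tick labels; toplab: trait names; dividers: line positions
local botlab
local toplab
local dividers
local j = 0

foreach v of local traits {
    local ++j
    * each trait occupies positions lo (response 0) and hi (response 1)
    local lo = 2 * `j' - 1
    local hi = 2 * `j'
    local botlab `"`botlab' `lo' "0" `hi' "1""'
    local toplab `"`toplab' `=`lo' + 0.5' "`v'""'
    * a dividing line goes halfway between successive pairs
    if `j' > 1 local dividers `dividers' `=`lo' - 0.5'
}

display `"axis 2 labels: `botlab'"'
display `"axis 1 labels: `toplab'"'
display "dividers: `dividers'"
```

The results could then be passed to stripplot as xla(`botlab', tlcolor(bg) axis(2)), xla(`toplab', tlcolor(bg) axis(1)) and xline(`dividers', lc(gs8) lw(thin)), all within the same run of the do-file.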



            • #7
              Cheers Nick Cox for coming back on this. I had been running the entirety of the code at once (and just tried your additional suggestion) but I am still getting a "local not found" error. I clearly have something set up wrong on my end. Any ideas so I can bottom this out for the future? And I couldn't see the screen capture in your above post either. Really appreciate your patience with this!

              Comment


              • #8
                Show the result of

                Code:
                which stripplot

                Before the stripplot call put


                Code:
                set trace on
                and then show us a chunk of the code trace before the command fails.
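One possible way to collect that information is sketched below. It assumes stripplot is installed from SSC and uses the auto dataset shipped with Stata as a stand-in for the real data. The log name and file name are hypothetical.

```stata
sysuse auto, clear

* report which version of stripplot is installed
capture noisily which stripplot

* write the trace to a text log so that it is easy to copy
log using stripplot_trace, text replace name(trace)
set tracedepth 2
set trace on
capture noisily stripplot mpg, over(foreign) box centre cumul cumpr vertical
set trace off
log close trace
```

The last lines of stripplot_trace.log before any error message are the part worth posting.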



                • #9
                  Nick,
                  I can reproduce the error:

                  Code:
                  . which stripplot
                  *! 2.9.0 NJC 10 July 2021
                  Code:
                   if r(N) {
                        local s = cond(r(N)>1,"s","")
                        local N : di %11.0fc r(N)
                        local N `N'
                        di in bl "(`N' missing value`s' generated)"
                        }
                        }
                      - rename `dummy' `name'
                      = rename __00000E __000006
                      --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- end egen ---
                    - if `"`tufte'`tufte2'"' != local pctile = 0
                    = if `""' != local pctile = 0
                  local not found
                      if "`iqr'`iqr2'" != "" {
                      egen `upper' = max(cond(`data' <= `upq' + `mult' * (`upq' - `loq'), `data', .)), by(`by' _stack)
                      egen `lower' = min(cond(`data' >= `loq' - `mult'
                  Code:
                      }
                      if "`cumulate'" != "" {
                      tempvar count negstack
                      gen `negstack' = -_stack
                      sort `by' `negstack' `data' `separate', stable
                      if "`cumprob'" != "" {
                      by `by' `negstack' : gen `count' = (_n - 0.5)/_N
                      }
                      else by `by' `negstack' : gen `count' = _n
                      su `count', meanonly
                      if "`centre'`center'" != "" {
                      if "`cumprob'" != "" {
                      by `by' `negstack' : replace `count' = `count' - 0.5
                      }
                      else by `by' `negstack' : replace `count' = _n - (_N + 1)/2
                      }
                      replace _stack = _stack + `height' * `count' / r(max)
                      }
                      local which "`copystack'"
                      }
                      }
                    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ end stripplot ---
                  r(111);
                  
                  end of do-file
                  Stata 17

                  Eddy



                  • #10
                    Thanks very much to @Eddy Simms. That's very helpful. You've been bitten by a bug in stripplot 2.9.0, and I've been able to work out why it did not bite me. Sorry about that.

                    The code should be

                    Code:
                     
                     if `"`tufte'`tufte2'"' != "" local pctile = 0
                    if you are willing to edit the .ado file.

                    Alternatively, if you send me a private message on Statalist with an email address, I can send you revised code directly. Otherwise, a revised copy will be posted on SSC, at Kit Baum's convenience.
                    Last edited by Nick Cox; 30 Jan 2022, 10:31.
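For anyone making the edit, the sketch below sets the two lines side by side outside stripplot. The first is copied from the trace in #9. With the "" missing after !=, Stata appears to read the word local as part of the expression, which would explain the "local not found" message. As noted above, it may not fail in every setup. The closing commands assume stripplot.ado is installed somewhere on the adopath.

```stata
* neither tufte option was specified, so both locals are empty
local tufte
local tufte2

* line as it appears in the trace in #9 (the "" after != is missing)
capture noisily {
    if `"`tufte'`tufte2'"' != local pctile = 0
}
display "return code from the 2.9.0 line: " _rc

* corrected line from this post; start from an empty pctile
local pctile
if `"`tufte'`tufte2'"' != "" local pctile = 0
display "pctile after the corrected line: [`pctile']"

* locate the file to edit, then drop the loaded copy so the edit is used
capture noisily findfile stripplot.ado
discard
```

Once a revised version is on SSC, ssc install stripplot, replace would overwrite the hand-edited file.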



                    • #11
                      Thank you very much. Edited the code and everything works.

                      Eddy
