
  • Sparkline with line breaks

    Hello,

    I have a question regarding sparkline by Nick Cox.

    I am trying to make the plot and would like to see gaps where data are missing for some dates. However, when I use the cmissing(n) option, it has no effect. The data are as follows:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(mth pct_Dist_obsm1 pct_Dist_obsm2 pct_Dist_obsm3 pct_Dist_obsm4)
    709  41.09735 14.581884  14.33607 29.984695
    710  20.50663  23.56221   11.3355  44.59566
    711 31.557364 35.394505 10.668147 22.379984
    712 33.949966 22.997416  11.68076  31.37186
    713  29.22156  53.09381   4.91018  12.77445
    714  36.69355  31.38441 13.239247 18.682796
    715         .         .         .         .
    716  31.59121  47.27031 11.617843  9.520639
    717 33.119682   41.9415  6.957231 17.981586
    718    25.508 15.890158 18.045576  40.55627
    719  41.26322  20.69178 12.274025 25.770975
    720  25.46957  22.52475 12.605562  39.40012
    721 33.942047   25.3424  11.94447  28.77108
    722         .         .         .         .
    723         .         .         .         .
    724         .         .         .         .
    725         .         .         .         .
    726         .         .         .         .
    727         .         .         .         .
    728 21.392765 33.484325  13.04584  32.07707
    729  41.59512  20.48501 13.277602 24.642265
    730 36.606453  19.53065  12.05805  31.80485
    731 30.773497  42.40366  12.18464 14.638204
    732 38.100067   33.1643 10.141988 18.593645
    733  24.38507  13.69805 14.291773  47.62511
    734         .         .         .         .
    735 34.950703 14.799165 18.471027 31.779106
    736  32.90823 26.678156  13.36493 27.048685
    737  29.32084 25.089773  12.47463 33.114754
    738  27.54687 14.353735 17.438745  40.66065
    739  32.05805 21.405014  12.05475  34.48219
    740  20.69971 23.790087 18.381924  37.12828
    741 29.435526  25.36409   12.8415 32.358883
    742  24.72497 12.734748 16.724081   45.8162
    743   26.7293 31.071676  9.638135  32.56089
    end
    The data above are what I need to plot using sparkline.

    Can anyone help me figure out how to get this done? Or is there a way I can make a sparkline using twoway line?



    I tried using the following example code for making a sparkline type plot with line breaks:

    Code:
        generate pct_Dist_obsm2_2 = pct_Dist_obsm2 + 50
    
        generate pct_Dist_obsm3_2 = pct_Dist_obsm3 + 110
        
        generate pct_Dist_obsm4_2 = pct_Dist_obsm4 + 150
        
        twoway (line pct_Dist_obsm1 mth, cmissing(n)) (line pct_Dist_obsm2_2 mth, cmissing(n)) (line pct_Dist_obsm3_2 mth, cmissing(n)) (line pct_Dist_obsm4_2 mth, cmissing(n))
    It would be great if someone could help out.

    Thank you.

    P.S. As a suggestion to Nick Cox, it would be awesome if this command could allow for adding plots in some way.

  • #2
    Thanks for the interesting question and data example.

    When people (from Edward Tufte on) show sparkline examples they often look great. Who would choose poor examples? For many datasets -- in my experience -- they often turn out disappointing.


    sparkline is from SSC (2013), as you are asked to explain (FAQ Advice #12). I've not used it much since I wrote it. Somewhere in the middle of the code I drop missing values temporarily for some reason, which is why cmissing(n) is legal but changes nothing. There is possibly a rewriting of the code that gets you what you asked for. You're welcome to clone and rewrite it under a different name. But the way it is written essentially rules out anything like addplot(). It's rearranging data in a space designed for the purpose, and no other purpose is compatible.
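
    The general mechanism can be seen with plain twoway line. What follows is a minimal sketch on made-up toy data (the variables t and y are hypothetical and not part of the thread's dataset), and it is not sparkline's actual code: once observations with missing values are excluded before plotting, the remaining points become neighbours and cmissing(n) has nothing left to act on.

    ```stata
    * Toy data (hypothetical, not the thread's dataset): y is missing at t = 3
    clear
    input t y
    1 10
    2 12
    3 .
    4 11
    5 13
    end

    * Missing value still in the plotted data: cmissing(n) leaves a gap at t = 3
    twoway line y t, cmissing(n) name(with_gap, replace)

    * Missing value excluded first: t = 2 and t = 4 are now adjacent,
    * so the line joins them and cmissing(n) changes nothing
    twoway line y t if !missing(y), cmissing(n) name(no_gap, replace)
    ```

    Comparing the two graphs side by side shows the gap in the first and an unbroken line in the second.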

    However, I didn't find sparkline helpful for these variables, even with that limitation. They are evidently components that add to 100% and so arguably should be presented on the same scale.

    This seems to me to be a better representation, one that is honest about the gaps, and there will be others as good or better. This is what I tend to do instead: rely heavily on a by() option to do most of the work.

    For the tiny trickery with year labels, see https://www.stata-journal.com/articl...article=gr0030

    For the general idea of using by() here see https://www.stata-journal.com/articl...article=gr0085

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(mth pct_Dist_obsm1 pct_Dist_obsm2 pct_Dist_obsm3 pct_Dist_obsm4)
    709  41.09735 14.581884  14.33607 29.984695
    710  20.50663  23.56221   11.3355  44.59566
    711 31.557364 35.394505 10.668147 22.379984
    712 33.949966 22.997416  11.68076  31.37186
    713  29.22156  53.09381   4.91018  12.77445
    714  36.69355  31.38441 13.239247 18.682796
    715         .         .         .         .
    716  31.59121  47.27031 11.617843  9.520639
    717 33.119682   41.9415  6.957231 17.981586
    718    25.508 15.890158 18.045576  40.55627
    719  41.26322  20.69178 12.274025 25.770975
    720  25.46957  22.52475 12.605562  39.40012
    721 33.942047   25.3424  11.94447  28.77108
    722         .         .         .         .
    723         .         .         .         .
    724         .         .         .         .
    725         .         .         .         .
    726         .         .         .         .
    727         .         .         .         .
    728 21.392765 33.484325  13.04584  32.07707
    729  41.59512  20.48501 13.277602 24.642265
    730 36.606453  19.53065  12.05805  31.80485
    731 30.773497  42.40366  12.18464 14.638204
    732 38.100067   33.1643 10.141988 18.593645
    733  24.38507  13.69805 14.291773  47.62511
    734         .         .         .         .
    735 34.950703 14.799165 18.471027 31.779106
    736  32.90823 26.678156  13.36493 27.048685
    737  29.32084 25.089773  12.47463 33.114754
    738  27.54687 14.353735 17.438745  40.66065
    739  32.05805 21.405014  12.05475  34.48219
    740  20.69971 23.790087 18.381924  37.12828
    741 29.435526  25.36409   12.8415 32.358883
    742  24.72497 12.734748 16.724081   45.8162
    743   26.7293 31.071676  9.638135  32.56089
    end
    
    reshape long pct_Dist_obsm, i(mth) j(which)
    
    rename pct_Dist_obsm pct 
    
    separate pct, by(which) veryshortlabel 
    
    set scheme s1color 
    
    twoway bar pct? mth, by(which, compact col(1) note("") legend(off) r1title(some story here)) ytitle(pct of whatever) yla(0(10)50, ang(h)) xtick(708.5(12)744.5, tlen(*4)) xla(714.5 "2019" 726.5 "2020"   738.5 "2021", tlc(none)) xtitle("") subtitle(, pos(3) size(*1.3) nobox nobexpand fcolor(none)) blcolor(red blue black magenta) bfcolor(red*0.2 blue*0.2 black*0.2 magenta*0.2)

    [Image: sparkline_not.png]


    • #3
      Naturally a line or connected version of this is possible.

      Code:
      twoway connected  pct? mth, cmissing(n n n n) by(which, compact col(1) note("") legend(off) r1title(some story here)) ytitle(pct of whatever) yla(0(10)50, ang(h)) xtick(708.5(12)744.5, tlen(*4)) xla(714.5 "2019" 726.5 "2020"   738.5 "2021", tlc(none)) xtitle("") subtitle(, pos(3) size(*1.3) nobox nobexpand fcolor(none)) mcolor(red blue black magenta) lcolor(red blue black magenta)

      [Image: sparkline_not2.png]


      • #4
        You could get to the graph in #3 with multiline from SSC. https://www.statalist.org/forums/for...ailable-on-ssc

        Code:
        set scheme s1color 
        
        forval j = 1/4 { 
            label var pct_Dist_obsm`j' "`j'"
        }
        
        multiline pct* mth, missing cmissing(n n n n) recast(connect) by(compact col(1) note("") legend(off) r1title(some story here)) ytitle(pct of whatever) yla(0(10)50, ang(h)) xtick(708.5(12)744.5, tlen(*4)) xla(714.5 "2019" 726.5 "2020"   738.5 "2021", tlc(none)) xtitle("") subtitle(, pos(3) size(*1.3) nobox nobexpand fcolor(none)) mcolor(red blue black magenta) lcolor(red blue black magenta) separate



        • #5
          These are awesome examples! I like how the bar version depicts the information much better than the line version. Also, thank you for sharing the use of by(). It was new to me that I can arrange plots with it.



          • #6
            Glad it helped. My conclusions:

            1. For your data, a bar chart has the edge, given their compositional character.

            2. sparkline from SSC is there for experiment and if you like the results, that’s good. I am much more likely to maintain and develop multiline, also from SSC.

            Last edited by Nick Cox; 09 Mar 2022, 02:51.



            • #7
              This gets the bars directly with multiline (SSC).

              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input float(mth pct_Dist_obsm1 pct_Dist_obsm2 pct_Dist_obsm3 pct_Dist_obsm4)
              709  41.09735 14.581884  14.33607 29.984695
              710  20.50663  23.56221   11.3355  44.59566
              711 31.557364 35.394505 10.668147 22.379984
              712 33.949966 22.997416  11.68076  31.37186
              713  29.22156  53.09381   4.91018  12.77445
              714  36.69355  31.38441 13.239247 18.682796
              715         .         .         .         .
              716  31.59121  47.27031 11.617843  9.520639
              717 33.119682   41.9415  6.957231 17.981586
              718    25.508 15.890158 18.045576  40.55627
              719  41.26322  20.69178 12.274025 25.770975
              720  25.46957  22.52475 12.605562  39.40012
              721 33.942047   25.3424  11.94447  28.77108
              722         .         .         .         .
              723         .         .         .         .
              724         .         .         .         .
              725         .         .         .         .
              726         .         .         .         .
              727         .         .         .         .
              728 21.392765 33.484325  13.04584  32.07707
              729  41.59512  20.48501 13.277602 24.642265
              730 36.606453  19.53065  12.05805  31.80485
              731 30.773497  42.40366  12.18464 14.638204
              732 38.100067   33.1643 10.141988 18.593645
              733  24.38507  13.69805 14.291773  47.62511
              734         .         .         .         .
              735 34.950703 14.799165 18.471027 31.779106
              736  32.90823 26.678156  13.36493 27.048685
              737  29.32084 25.089773  12.47463 33.114754
              738  27.54687 14.353735 17.438745  40.66065
              739  32.05805 21.405014  12.05475  34.48219
              740  20.69971 23.790087 18.381924  37.12828
              741 29.435526  25.36409   12.8415 32.358883
              742  24.72497 12.734748 16.724081   45.8162
              743   26.7293 31.071676  9.638135  32.56089
              end
              
              set scheme s1color 
              
              forval j = 1/4 { 
                  label var pct_Dist_obsm`j' "`j'"
              }
              
              multiline pct* mth, missing cmissing(n n n n) recast(bar) by(compact col(1) note("") legend(off) r1title(some story here)) ytitle(pct of whatever) yla(0(10)50, ang(h)) xtick(708.5(12)744.5, tlen(*4)) xla(714.5 "2019" 726.5 "2020"   738.5 "2021", tlc(none)) xtitle("") subtitle(, pos(3) size(*1.3) nobox nobexpand fcolor(none)) blcolor(red blue black magenta) bfcolor(red*0.2 blue*0.2 black*0.2 magenta*0.2) separate



              • #8
                I was not aware of multiline previously. Thank you for sharing this; I will definitely be exploring it more!
