
  • Graphing variables with extreme values

    Hi Statalist.

    I want to graph a number of financial variables, such as total household assets, and compare their values across groups defined by a few dichotomous and categorical variables (e.g. race, religion). For each group I plot a local polynomial fit with a 95% confidence band and overlay the average of the same variable. The financial variable is on the y axis and age is on the x axis, so I can observe how these values change over the life cycle, by group.
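    A minimal, self-contained sketch of the design just described, using Stata's auto data as a stand-in: price plays the role of total assets, mpg the role of age, and foreign the group variable. It assumes the overlaid "average" is the mean of the outcome at each x value within each group; meanprice is a hypothetical name, and the thread does not show how totassetave was actually built. The sort option on the connected plots makes the points join in order of the x variable.

    Code:
```stata
* Sketch only: auto data stand in for the household panel
sysuse auto, clear
* Assumed "average" series: mean price at each mpg value, separately by group
egen meanprice = mean(price), by(foreign mpg)
* Each lpolyci contributes two plots (CI area, then fit line), so the fit
* lines are plots 2 and 4 in the legend order() below
twoway (lpolyci price mpg if foreign == 0, bwidth(3) ciplot(rarea)         ///
           lcolor("230 76 138") acolor("230 76 138%30") alwidth(none))     ///
       (lpolyci price mpg if foreign == 1, bwidth(3) ciplot(rarea)         ///
           lcolor("25 154 222") acolor("25 154 222%30") alwidth(none))     ///
       (connected meanprice mpg if foreign == 0, sort lpattern(shortdash) ///
           msymbol(oh) lcolor("230 76 138") mcolor("230 76 138"))          ///
       (connected meanprice mpg if foreign == 1, sort lpattern(shortdash) ///
           msymbol(oh) lcolor("25 154 222") mcolor("25 154 222")),         ///
       legend(order(2 "Domestic" 4 "Foreign"))                             ///
       ytitle("Price") xtitle("Mileage (mpg)")
```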

    As you can see, my dataset contains extreme values. I am looking for options on how best to 'deal' with these for graphing purposes, without excluding any observations, as I do not want to artificially affect the mean values.

    [Image: assets_graph_eg2.png]


    Code:
    tw (lpolyci totasset hgage1 if group == 1 & wave == 2, bwidth(3) lc("230 76 138") lw(medthick) ciplot(rarea) acolor("230 76 138%30") alw(5) level(95)) /// 
        (lpolyci totasset hgage1 if group == 2 & wave == 2, bwidth(3) lc("25 154 222") lw(medthick) ciplot(rarea) acolor("25 154 222%30") alw(none) level(95)) /// 
        (connect totassetave hgage1 if group == 1 & wave == 2, lc("230 76 138%70") lwidth(medthin) lpattern(shortdash) m(oh) mlw(vthin) mc("230 76 138%90"))  /// 
        (connect totassetave hgage1 if group == 2 & wave == 2, lc("25 154 222%10") lwidth(medthin) lpattern(shortdash) m(oh) mlw(vthin) mc("25 154 222%90")), ///
        title("Wave 2", size(medsmall) position(11) justification(right)) ///
        legend(region(lstyle(none)) order(2 "type A" 4 "type B") col(2) pos(0) ring(1) bplace(ne) rowgap(.1) colgap(1) size(small) color(none) region(fcolor(none))) ///
        ytitle("Total assets", size(small)) xtitle("Age", size(small)) ///    
        xla(20(10)100, format(%8.0fc) labsize(vsmall)) xtick(20(10)100) xmtick(15(10)95) ///
        yla(0(400000)2000000, format(%10.0fc) labsize(vsmall)) ytick(0(400000)2000000) ymtick(200000(400000)2000000, grid nogmin gex glc(gs12) glp(dot) glw(medthin)) ///
        plotr(margin(zero) lw(medthin)) scheme(burd) name("Fig4", replace) scale(1.2)
    Comments on the options and code are appreciated. (Note: my draft code was copied from elsewhere and amended to suit.)

    One option is to take the natural log of these values. After applying this change to the first five lines of code, I obtain the graph below, which still shows extreme values. (Note: there are no negative values.) I believe -yscale(log)- and/or -ylabel()- will help here, but I have not yet worked out how to code them so that the axis labels are in $ terms. Any suggestions are appreciated.
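    For the -yscale(log)- route, one hedged sketch is to keep the variable in dollars and let the axis do the logging, so the numeric labels stay in dollar units. It uses the auto data, with price standing in for total assets; the label values are chosen to fall inside the range of price. One caveat: a log axis cannot display zero or negative values, so zero-asset observations, and any part of an lpolyci confidence band at or below zero, cannot be shown this way.

    Code:
```stata
* Sketch: raw (dollar) values on a logarithmic y axis; auto data as a stand-in
sysuse auto, clear
* price is strictly positive, which a log axis requires
scatter price weight, yscale(log)                                  ///
    ylabel(4000 6000 10000 15000, format(%9.0fc) angle(horizontal)) ///
    ytitle("Price (US dollars, log scale)")
```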

    [Image: assets_graph_eg.png]


    Stata v.15.1. I am using panel data. This post has its roots in #11-#15 here https://www.statalist.org/forums/for...-loop-question, though the discussion has 'morphed' away from the original thread title, hence the repost.











  • #2
    Hi Nick Cox. In "Speaking Stata: Logarithmic Binning and Labeling" you explain, among other things, how to use the -niceloglabels- command. In a previous post you suggested I use -mylabels- (I'm not sure how the two differ in their application to my problem), so I tried both, including your suggestion in #13 here. When using -mylabels- (as below), I received this message:
    myscale(ln(@)) is not a twoway plot type
    Code:
    tw (lpolyci lntotasset hgage1 if group == 1 & wave == 2, bwidth(3) lc("230 76 138") lw(medthick) ciplot(rarea) acolor("230 76 138%30") alw(5) level(95)) /// (where lntotasset = log of household assets)
        (lpolyci lntotasset hgage1 if group == 2 & wave == 2, bwidth(3) lc("25 154 222") lw(medthick) ciplot(rarea) acolor("25 154 222%30") alw(none) level(95)) ///
        (connect lntotassetave hgage1 if group == 1 & wave == 2, lc("230 76 138%70") lw(medthin) lp(shortdash) m(oh) mlw(vthin) mc("230 76 138%90"))  ///
        (connect lntotassetave hgage1 if group == 2 & wave == 2, lc("25 154 222%10") lw(medthin) lp(shortdash) m(oh) mlw(vthin) mc("25 154 222%90")), ///
        title("Wave 2", size(medsmall) position(11) justification(right)) ///    
        legend(region(lstyle(none)) order(2 "Group A" 4 "Group B") col(2) pos(0) ring(1) bplace(ne) rowgap(.1) colgap(1) size(small) color(none) region(fcolor(none))) ///
        ytitle("Total assets", size(small)) xtitle("Age", size(small)) ///    
        xla(20(10)100, format(%8.0fc) labsize(vsmall)) xtick(20(10)100) xmtick(15(10)95) ///, add tpo(o) tl(1) tlw(thin) tlc(gs5) tlsty(grid)) ///
        mylabels 3000 10000 30000 1e5 3e5 1e6, myscale(ln(@)) local(yla) 8.006367567650246 "3000" 9.210340371976184 "10000" 10.30895266064429 "30000" ///
        11.51292546497023 "100000" 12.61153775363834 "300000" 13.81551055796427 "1000000" ///
        plotregion(margin(zero) lw(medthin)) scheme(burd) name("Fig4", replace) scale(1.2)
    If I swap the code relating to -mylabels- with this:
    Code:
    niceloglabels totasset, style(1) local(yla) powers 3000 10000 30000 1e5 3e5 1e6 ///
    quantile totasset, ysc(log) yla(`yla', ang(h))) ///
    I receive this message:
    style(1) is not a twoway plot type
    r(198);
    Clearly I'm doing something wrong in both of these. In the latter, the problem appears to relate to the style type, but I'm not sure which style I need to select. There appear to be only a few on page 279 of the paper linked above, but if I select style 13 or 125, I receive the same message. I appreciate your help.
    Last edited by Chris Boulis; 01 Apr 2021, 05:24.



    • #3
      You have continuation characters /// at the end of your graph command call. So graph twoway is trying to understand a reference to mylabels, or to niceloglabels as the case may be, but each is a totally unconnected command.

      Other way round, mylabels and niceloglabels won't work unless issued as separate commands. They aren't somehow graph options.

      Further, each command must be issued before the graph command in which you want to use their results.
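      Applied to the -niceloglabels- attempt in #2, the same rule means two separate commands: first compute the labels into a local macro, then refer to that macro in the graph. A sketch with the auto data follows. niceloglabels is community-contributed (from the Speaking Stata column cited in #2) and must be installed first; style(125) is one of the styles mentioned in #2, used here only for illustration.

      Code:
```stata
* Assumes niceloglabels is installed; locate it with: search niceloglabels
sysuse auto, clear
* Step 1: a separate command that stores label positions in local macro yla
niceloglabels price, local(yla) style(125)
* Step 2: a separate graph command that uses the stored labels
scatter price weight, yscale(log) ylabel(`yla', angle(horizontal))
```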

      Here is a self-contained example. The auto data uses non-metric measurements. Let's suppose we want to see metric. From a glance at the values for turn circle in feet we can guess that 10 to 15 metres could work nicely.

      Code:
      sysuse auto, clear
      scatter turn displacement
      mylabels 10/15, myscale(@/0.3048) local(yla)
      scatter turn displacement , yla(`yla') ytitle(Turn circle (m))

      That leaves displacement, in cubic inches, to be fixed similarly if that is the need.
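      The same two-step pattern covers the log case in #2, where the plotted variable is already logged but the labels should show the original values. A sketch with the auto data; lnprice is a hypothetical variable created for illustration, and only the myscale() and local() options already used in this thread are relied on. mylabels is community-contributed, so the sketch installs it from SSC if it is missing.

      Code:
```stata
* Install the community-contributed mylabels if it is not already present
capture which mylabels
if _rc {
    ssc install mylabels
}
sysuse auto, clear
* Plot the logged variable, but label the axis with values in dollars
generate lnprice = ln(price)
* myscale(ln(@)) places each label at ln(value); the text shown is the value
mylabels 4000 8000 16000, myscale(ln(@)) local(yla)
scatter lnprice weight, ylabel(`yla', angle(horizontal)) ///
    ytitle("Price (US dollars, log scale)")
```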

      There is a circularity here: understanding an error message is only straightforward if you already understand the error. But it's not mylabels or niceloglabels that is objecting; it is graph twoway. Its logic, I take it, is that it doesn't recognise what you typed as an option of its own command, but it is prepared to recognise it as another plot type. That does not work, so out you go.
      Last edited by Nick Cox; 01 Apr 2021, 05:49.



      • #4
        Thank you, Nick Cox, and I appreciate the example; it helped me understand your point. The good news is that the code now runs. The bad news is that the graph looks like a mess. Any suggestions on how to make it easier on the eyes?
        Code:
        mylabels 3000 10000 30000 1e5  3e5 1e6 3e6, myscale(ln(@)) local(yla)
        tw (lpolyci lntotasset hgage1 if group == 1 & wave == 2, bwidth(3) lc("230 76 138") lw(medthick) ciplot(rarea) acolor("230 76 138%30") alw(5) level(95)) ///
            (lpolyci lntotasset hgage1 if group == 2 & wave == 2, bwidth(3) lc("25 154 222") lw(medthick) ciplot(rarea) acolor("25 154 222%30") alw(none) level(95)) ///
            (connect lntotassetave hgage1 if group == 1 & wave == 2, lc("230 76 138%70") lw(medthin) lp(shortdash) m(oh) mlw(vthin) mc("230 76 138%90"))  ///
            (connect lntotassetave hgage1 if group == 2 & wave == 2, lc("25 154 222%10") lw(medthin) lp(shortdash) m(oh) mlw(vthin) mc("25 154 222%90")), ///
            title("Wave 2", size(medsmall) position(11) justification(right)) ///    
            legend(region(lstyle(none)) order(2 "Group A" 4 "Group B") col(2) pos(0) ring(1) bplace(ne) rowgap(.1) colgap(1) size(small) color(none) region(fcolor(none))) ///
            yla(`yla', format(%10.0fc) labsize(vsmall)) ytitle("Total assets", size(small)) ///    
            xla(20(10)100, format(%8.0fc) labsize(vsmall)) xtick(20(10)100) xmtick(15(10)95) xtitle("Age", size(small)) ///
            plotregion(margin(zero) lw(medthin)) scheme(burd) name("Fig4", replace) scale(1.2)
        One option is to remove the extreme values (which include the few people with no assets and the few with more than $5m). Some dislike removing values because they believe it distorts the 'true' average of the population; however, one could also argue that these very few extreme values distort the representative average of the population. Thoughts or suggestions?
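        Before deciding whether to drop anything, it may help to count the observations involved, and to note that taking logs already drops some of them: in Stata, ln(0) is missing, so zero-asset observations silently fall out of any plot of the logged variable. A sketch on a few made-up values (illustrative only, not the poster's data); the $5m cut-off is the one mentioned above.

        Code:
```stata
* Made-up illustrative values, not real data
clear
input double totasset
0
2500
48000
310000
6200000
end
* ln(0) returns missing, so the zero-asset row gets no log value
generate double lntotasset = ln(totasset)
count if missing(lntotasset)
* Missing values count as larger than any number in Stata, so exclude them
count if totasset > 5e6 & !missing(totasset)
```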
        [Image: assets_graph_eg3.png]


        Stata v.15.1. Using panel data.



        • #5
          I imagine you need a sort option in several places. As the help for line reminds us:

          Do not forget to include the sort option when the data are not in the order of the x variable, ...
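          The effect of sort is easy to see with the auto data, which are stored sorted by foreign rather than by weight: without sort, line joins the points in the order the observations sit in the dataset; with sort, it joins them from left to right along the x axis. The graph names are arbitrary.

          Code:
```stata
sysuse auto, clear
* Data are in stored order (sorted by foreign), not by weight,
* so the line zig-zags back and forth across the x axis
line price weight, name(unsorted, replace)
* sort orders the points by the x variable before joining them
line price weight, sort name(sorted, replace)
```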
          Last edited by Nick Cox; 02 Apr 2021, 07:23.



          • #6
            Thank you, Nick Cox. I added sort after the comma in the first four lines of code (for the twoway graph), and I no longer have a graph "that looks like the scribblings of a child"; it now resembles the second graph in #1. Do you think this could be shown more effectively using a different type of graph?
            [Image: assets_graph_eg4.png]



            Do you have any thoughts on my potentially removing the very extreme values, or at least the zero values?
            Last edited by Chris Boulis; 02 Apr 2021, 08:35.
