  • Problem with range of ylabel when using forval to define labels

    Hi,

    I am generating multiple scatter plots with a logarithmic y-axis using foreach loops, but I am having trouble getting the ylabels to look nice. I found Nick Cox's "forval" code (https://www.stata.com/statalist/arch.../msg00770.html), and it partially works, but it creates a massive range from min to max, and I'm not sure why. I am running Stata 15 on macOS.

    My y-values are various protein measurements ranging from 0.0001 units to 10000 units (the variables differ in range; the values given here are the absolute min and max), but instead I get labels ranging from 10E-6 to 10E9.

    My code looks like this:


    Code:
    foreach y in varlist {
    
    display "statistics for: `y', Gene 1"
    kwallis `y' if diaggroup==1, by(gene1mut)
    
    display "statistics for: `y', Gene 2"
    kwallis `y' if diaggroup==1, by(gene2mut)
    
    su `y', meanonly
    local min = ceil(log(r(min)))
    local max = floor(log(r(max)))
    
    forval i = `min'/`max' {
    local labels "`labels' `=10^(`i')'"
    }
    
    di " `labels' -"
    twoway (scatter `y' mut_status if mut_status==0 & diaggroup==1, jitter(3)) (scatter `y' mut_status if mut_status==1 & diaggroup==1, jitter(3)) (scatter `y' mut_status if mut_status==2 & diaggroup==1, jitter(3)), name(`y', replace) xscale(range(-0.5 2.5)) xlabel(0 "No mutation" 1 "Gene 1" 2 "Gene 2") yscale(log) ylabel(`labels')
    
    }

    The di " `labels' -" line is just there to show what `labels' contains. For one variable with min-max being 150-13106, it returns "1000000 10000000 100000000 1000000000 1.00000000000e-06 .00001 .0001 .001 .01 .1 1".

    I don't know how to solve this. Can you help? It's my first post here, apologies if I did not include all necessary info.

    Best,
    Mads

  • #2
    You are getting confused between logarithms to base 10 and natural logarithms, I guess.

    You're citing a post of mine from 2003. Since then I've picked up the general problem a few times and put it down again, although the latest bout was the most substantial and resulted in a command niceloglabels -- which can be installed freely from the Stata Journal website (ignore the SSC version) -- and a paper which at present requires subscription access.

    Here is what that command suggests for your range:

    Code:
    . niceloglabels  150 13106, local(myla) style(1)
    1000 10000
    
    . niceloglabels  150 13106, local(myla) style(125)
    200 500 1000 2000 5000 10000
    The deal is that you have to say what is "nice" by specifying a style. The command doesn't have a fixed idea on that.
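
    For comparison, here is a minimal sketch of the loop from #1 with the two changes that the diagnosis above points to: log10() in place of log(), and the local macro emptied at the start of each pass so that labels from earlier variables do not pile up. The variable names tnfap, il6p and il17ap and the condition diaggroup == 1 are taken from elsewhere in this thread and are assumptions about the data; all values must be positive.

    ```stata
    * Sketch only: power-of-10 labels built by hand, one set per variable
    foreach y of varlist tnfap il6p il17ap {
        quietly summarize `y' if diaggroup == 1, meanonly

        * base-10 logarithms, because the labels are powers of 10
        local min = ceil(log10(r(min)))
        local max = floor(log10(r(max)))

        * empty the macro so labels from the previous variable are dropped
        local labels
        forvalues i = `min'/`max' {
            local labels "`labels' `=10^(`i')'"
        }

        display "`y': `labels'"
    }
    ```

    For a variable ranging from 150 to 13106 this gives 1000 10000, the same as style(1) above. If a variable spans less than one power of 10, the macro can end up empty, which is one reason to prefer niceloglabels.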


    • #3
      Thanks Nick. You are right.
      I did do some searching before posting here, but I only heard of the niceloglabels command now from you. It solved my problem right away. Thank you so much.


      • #4
        Ok, so saying that it solved my problem right away was 99.5% true. It does not keep me from getting the work done, but as feedback I thought I would post an occasional curiosity that comes from the niceloglabels command.

        The range for most of my figures looks really nice now, but a few of them get labels well outside the variable range and even outside the axis. I am not sure why this happens. See the attached figure.

        I suppose I would have to change the local macro post-hoc to not hold values smaller or larger than the min-max range of my variable. Will probably look into that some other day.

        Again, thanks for pointing me in the direction of niceloglabels.


        [Attached figure: ALSpatientsgenderkwallistnfap.png]


        • #5
          Please post the minimum and maximum of your response variable and what your exact command was using niceloglabels. There is a hint in the help that the range is widened as a fudge to avoid precision problems.
          Last edited by Nick Cox; 02 Nov 2018, 04:30.


          • #6
            min=1.1351; max= 430.4291

            The full code is:

            Code:
            foreach y in "agesampling" "ageonset" "studyperiod" "freezerstorage" "tnfap" "il6p" "il17ap" {
            
            display "statistics for: `y', males versus females" 
            kwallis `y' if diaggroup==1, by(gender)
            bysort gender: tabstat `y' if diaggroup==1, statistics(N min q max)
            
            niceloglabels `y', local(labels) style(125)
            twoway (scatter `y' gender if gender==0 & diaggroup==1, jitter(3)) (scatter `y' gender if gender==1 & diaggroup==1, jitter(3)) , name("Gender`y'", replace) xscale(range(-0.5 1.5)) xlabel(0 "Males" 1 "Females") yscale(log) ylabel(`labels', angle(20)) legend(off) xtitle("")
            }
            I did see the note, but what it means is blurry to me. Perhaps it is clear for native English speakers? Not ruling out a linguistic barrier here, sorry.


            • #7
              You're asking for labels regardless of diaggroup and then applying them to data only for diaggroup == 1. So presumably you have, outside the data plotted, some lower values of the response. Note that

              Code:
              niceloglabels 1.1351 430.4291, style(125) local(labels)
              suggests 2 5 10 20 50 100 200. If you want to choose an axis range that's much wider than the data, you need something like ysc(r(0.5 .)) as an extra option.

              Don't worry about the fudging. It seems that the issue is just asking for labels in terms of a wider group than you're plotting.


              • #8
                Of course, you're right. Yes, I am working with a cohort of patients with different diagnoses, but I want to zoom in on one particular disease, hence diaggroup == 1. Including the if-statement in the niceloglabels solves it:

                Code:
                niceloglabels `y' if diaggroup==1, local(labels) style(125)

                Thanks for the help. Very much appreciated.


                • #9
                  Plotting marginal effects of interaction terms for very small values of dependent variable - how to scale the y axis?

                  I am not sure whether my question needs a new post, so I am posting it here. I am trying to plot the average marginal effect for an interaction term between a categorical and a continuous variable. The categorical variable (CA) has six levels. The raw dependent variable takes on very small values, measured to the seventh decimal place. I use a fractional logit model using the fracreg logit command to get the results below:

                  Then I use the margins command to generate the average marginal effects.

                  margins, dydx(CA) at(mean=(-20(5)14)) vsquish

                  The results are included below.

                  As you can see, there are statistically significant marginal effects across the different levels of the categorical variable. Notice that the dependent variable takes on small values.

                  When I try to plot this using the marginsplot, yline(0) command, all the estimated marginal effects for all levels fall on the zero line. I am guessing that's because the y-axis is too large and needs to be shrunk. How do I do that?

                  I tried using yscale(r(-0.000009,0.0000009) and get "command yscale is unrecognized". I would greatly appreciate any help with getting the plots.



                  • #10
                    [Attached image: image_14674.png]


                    I tried ysc as an option with the margins command. It did not work either. I've played around with the "at" option, restricting it to (0(1)5), (0(1)10), etc. This has helped zoom in on the y-axis, but the resulting graph still shows results only for category 6 of CA.



                    Last edited by Jake Ed; 29 May 2019, 16:30.


                    • #11
                      I pointed out at https://stats.stackexchange.com/ques...nt-variable-th that the reason for the error message "command ... not recognized" is evidently that you tried to issue yscale() separately. So no one needs to explain that again.

                      The generic principle here is that axis scale choice is dominated by the maximum and minimum to be shown. To get a more detailed view of how the marginal effects are varying I think you'll need to export the results as data and then use your own scale, say some variant on sign(effect) * log(1 + abs(effect)) or asinh(fudge factor * effect).
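
                      A rough sketch of the transformation step, assuming the marginal effects have already been exported to a dataset. The variable names effect (the marginal effect), atvalue (the value of the continuous variable) and level (the level of CA) are hypothetical, as is the fudge factor k, which would need tuning to the size of the effects. Applying k inside the signed-log variant is an added assumption, not part of the formula above.

                      ```stata
                      * Sketch only: effect, atvalue and level are hypothetical variable names
                      * k is a fudge factor chosen by eye; 1e6 is an arbitrary starting value
                      local k = 1e6

                      * inverse hyperbolic sine: defined for negative, zero and positive values
                      generate double effect_asinh = asinh(`k' * effect)

                      * signed log variant, here also scaled by k (an assumption)
                      generate double effect_slog = sign(effect) * log(1 + abs(`k' * effect))

                      * one panel per level of CA, with a reference line at zero
                      twoway connected effect_asinh atvalue, sort by(level) yline(0) ytitle("asinh(k * effect)")
                      ```

                      Both transformations preserve sign and map zero to zero, so yline(0) keeps its meaning on the transformed scale.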

