  • LOWESS smoothing parameters

    Hello there. I am trying to understand the difference between using

    Code:
    lowess yvar xvar
    [Image: lowess.PNG]

    vs

    Code:
    twoway(lowess yvar xvar), xlabel(0[2]40)
    [Image: lowess two way.PNG]

    I got pretty much the same graphs.

    Can anyone point me in the right direction on what the difference is between using twoway and not using twoway?
    Granted, the graph using twoway is cleaner without the data points, but it shows pretty much the same result: an increase in the revision rate of mesh up to about 20 (mean number of cases done by the surgeon), followed by a decrease in revision rate as the number of cases increases.

    Both graphs show the same thing, although the twoway version is perhaps much clearer.

    QUESTION: However, what exactly is the difference? And can you point out a resource?

    I actually asked ChatGPT, which answered that twoway lowess treats both variables as smoothing. That led me to Nick Cox's article, Speaking Stata: Smoothing in Various Directions (sagepub.com).
    However, I still don't understand smoothing from his article, perhaps because it's more about the program theory.

    I continued reading, which led me to another resource stating that 'smoothing parameters typically lie in the range of 0.25 to 0.5 for most LOESS applications'. How am I supposed to know what the degree of the smoothing parameter is? This resource goes on to say that bandwidth is interchangeable with smoothing parameter, which I don't really agree with. Otherwise, why would Stata give an option of bandwidth in lowess? That would put it on par with twoway.


    FYI: I understand that bandwidth controls how many of the data points are used.

  • #2
    Code:
     twoway lowess yvar xvar || scatter yvar xvar



    • #3
      Hi, thanks for your answer, especially as I'm sure you're busy. I'm not really asking for code, though; I'm asking what exactly the difference is between TWOWAY LOWESS and just LOWESS.
      And can you point out a resource?



      • #4
        Nothing that I can see. One has the scatter the other does not. Once you add the scatter, they look the same (at least in my data), since the scale of the y-axis is the same. Not sure whether they use different bw as default, but I doubt it.



        • #5
          Try this. In my case, you get a few more observations at the low end of the range. That's probably a boundary thing.

          Code:
          sysuse auto, clear
          kdensity mpg , generate(x1 d1)
          twoway kdensity mpg || scatter d1 x1



          • #6
            Code:
            clear all
            sysuse auto, clear
            
            kdensity mpg , generate(x1 d1) 
            local bw = r(bwidth)
            local ker = r(kernel)
            twoway kdensity mpg , bw(`bw') kernel("`ker'") boundary || scatter d1 x1



            • #7
              I don't think you need to force the bw/kernel, as they appear to be the same.

              Code:
              clear all
              sysuse auto, clear
              
              kdensity mpg , generate(x1 d1)
              local bw = r(bwidth)
              local ker = r(kernel)
              twoway kdensity mpg ,boundary || scatter d1 x1



              • #8
                They’re the same. The only difference is that the scale and aspect ratio of the graphs are different, but not the underlying values. ChatGPT is not helpful.



                • #9
                  Thanks for the mention in #1. The paper you cite deserves a little more attention than it seems to have got but unfortunately it has no direct bearing on your question.

                  I don't see your data as likely to be much illuminated by smoothing. Your data -- I guess for good medical reasons -- are crowded into one corner of the space. There isn't much to go on outside that corner.

                  I can go a little beyond that visceral feeling.

                  I stopped using lowess in Stata some years ago because it has nowhere near the flexibility (literally and metaphorically) of lpoly, introduced later. Further, the implementation of lowess in Stata is a little idiosyncratic, which compounds an under-appreciated problem: over time lowess, as an algorithm or family of algorithms, has morphed and speciated when passed from hand to hand in different software outside Stata, in a history now over 40 years long.

                  In contrast, lpoly is a more nearly standard family of smoothers.

                  Be that as it may, here is one check that the two lowess commands are identical in result. In both cases the amount of smoothing is just the default.

                  Code:
                  sysuse auto, clear
                  
                  lowess mpg weight , gen(smooth) name(G1)
                  
                  twoway lowess mpg weight || line smooth weight, sort name(G2)
                  If you run this, you'll see two identical curves superimposed.


                  EDIT Crossed with #8. My feelings about ChatGPT are that most users get what they deserve.


                  George Ford Sorry, but what has kdensity got to do with this? Kernel density estimation is univariate; lowess is bivariate. kdensity allows different kernels; lowess as implemented in Stata doesn't.
                  Last edited by Nick Cox; 14 Aug 2023, 11:28.
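
The check in #9 can be pushed a little further. The sketch below is not from the thread: it assumes that 0.8 is the documented default for bwidth() in both lowess and twoway lowess, and the variable names s_default, s_08 and s_04 are made up for illustration. It first confirms that spelling out the default changes nothing, then repeats the overlay with a non-default bandwidth passed to both commands.

```stata
sysuse auto, clear

* Default smooth versus the same call with the bandwidth spelled out
* (assumption: 0.8 is the default bwidth() for lowess).
lowess mpg weight, generate(s_default) nograph
lowess mpg weight, bwidth(0.8) generate(s_08) nograph
assert reldif(s_default, s_08) < 1e-6

* A non-default bandwidth, passed identically to both commands.
lowess mpg weight, bwidth(0.4) generate(s_04) nograph
twoway lowess mpg weight, bwidth(0.4) || line s_04 weight, sort
```

If the two commands share the same smoother, the two curves in the final graph should again sit on top of each other.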



                  • #10
                    Sorry, I was using kdensity, but the logic is the same as Nick demonstrates.



                    • #11
                      Thanks. However, I still have some questions to clarify.

                      1. What does -twoway- actually mean, i.e. what is its definition? I didn't find the help file in Stata helpful.

                      2. If twoway and lowess produce the same curves, what makes them different? Are they the same and, if so, why do both options exist?

                      3. With regard to lpoly, does it also explore linear / non-linear relationships?

                      4. With regard to my post #1, is there a way I can find the value of the smoothing parameter?
                      Or is this equivalent to the bandwidth?

                      5. Are there any resources to further understand loess apart from Cleveland et al.'s papers? I was looking for something a bit less mathematical but clear and informative.
                      Last edited by Denise Vella; 14 Aug 2023, 12:58.



                      • #12
                        1) twoway is the name of a suite of plotting commands. Nothing more. Nothing more can be said about what you find unhelpful unless you elaborate.

                        2) See my earlier comment. The two commands exist for different purposes. -twoway- is for plotting, and -lowess- on its own is to produce the smoothed values for programming. Twoway uses this command under the hood.

                        Others may comment on the rest.



                        • #13
                          1. What does -twoway- actually mean, i.e. what is its definition? I didn't find the help file in Stata helpful.
                          That's unfortunate because, other than the code, the help file is the definition of the command. But two good points to start are that twoway is designed to produce various kinds of graphs and, in particular, twoway commands can be combined with other twoway commands, as for example scatter and line can be combined.

                          2. If twoway and lowess produce the same curves, what makes them different? Are they the same and, if so, why do both options exist?
                          lowess is a statistical command. In a strong sense its production of a graph is a side-effect. One way to see that: lowess allows a generate() option, as already used in this thread.

                          3. With regard to lpoly, does it also explore linear / non-linear relationships?
                          As a smoothing command, lpoly just tries to give a data-driven smooth. If the overall relation between two variables is approximately linear, so too will the smooth be. lpoly is limited to the outcome being single-valued given the predictor, so if the data points are arranged in a circle, lpoly won't echo that.

                          4. With regard to my post #1, is there a way I can find the value of the smoothing parameter?
                          Or is this equivalent to the bandwidth?
                          You're in charge. If you don't like any default you need to choose your own degree and kind of smoothing. That doesn't rule out your writing code to find your own smoothing parameters as optimal in some way. Yes, the kernel width is loosely analogous to lowess bandwidth, but the first is in the units of the x variable and the second is a fraction of the x variable range.

                          Personal detail: I find the lpoly defaults not smooth enough. I've heard scuttlebutt that they were deliberately mediocre choices to ensure that users didn't adopt them. That was perhaps a joke, although a fairly good one. I wrote my own localp on SSC with different defaults. Otherwise it's a wrapper for lpoly.

                          5. Are there any resources to further understand loess apart from Cleveland et al.'s papers? I was looking for something a bit less mathematical but clear and informative.

                          There is no egg-less omelette. I've found Cleveland's books to be very clear. Among many overlapping smoothing books, that by Hastie and Tibshirani on generalized additive models is one I keep going back to. (My formal mathematics education stopped at age 17.)
                          Last edited by Nick Cox; 14 Aug 2023, 14:18.
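
To make the "you're in charge" point in #13 concrete, here is a minimal sketch of trying two bandwidths side by side. It is not from the thread: the values 0.3 and 0.8 and the variable names s_narrow, s_wide and diff are arbitrary choices for illustration, and the `///` line continuations assume the code is run from a do-file rather than typed interactively.

```stata
sysuse auto, clear

* Smaller bwidth() tracks the data more closely; larger bwidth() is smoother.
lowess mpg weight, bwidth(0.3) generate(s_narrow) nograph
lowess mpg weight, bwidth(0.8) generate(s_wide) nograph

* How far apart are the two smooths, in mpg units?
generate diff = s_narrow - s_wide
summarize diff

* Overlay both smooths on the raw data to judge them by eye.
twoway scatter mpg weight, mcolor(gs10) ///
    || line s_narrow weight, sort ///
    || line s_wide weight, sort ///
    legend(order(2 "bwidth(0.3)" 3 "bwidth(0.8)"))
```

Which bandwidth is "right" remains a judgment call; the overlay only shows how much the choice matters for a given data set.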



                          • #14
                            Nick Cox and I differ here. In my opinion, -lpoly- gives too much emphasis to what I will, loosely, call "noise", here meaning idiosyncrasies in the data set that are not generalizable (to some extent this can be ameliorated by playing with the bandwidth but, again in my opinion, this itself is a problem). In addition, in the field I currently work in most often (medicine), there is a lot of literature on using -lowess- for calibration (both internal and external), but I know of none for -lpoly-. Also, there is at least some agreement that the bandwidth for -lowess- should be .75; I know of no such agreement regarding -lpoly-.
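
One way to see the disagreement between #13 and #14 on your own data is to overlay the two smoothers. The sketch below is not from the thread: it uses the bandwidth of .75 mentioned in #14 for lowess, leaves the lpoly kernel and bandwidth at their defaults, and picks degree(1) (local linear) purely as an illustrative assumption. The variable names s_lowess and s_lpoly are made up, and the `///` continuations assume a do-file.

```stata
sysuse auto, clear

* lowess with the bandwidth suggested in #14.
lowess mpg weight, bwidth(0.75) generate(s_lowess) nograph

* lpoly evaluated at the observed x values; degree(1) is an assumption,
* the kernel and bandwidth are left at the lpoly defaults.
lpoly mpg weight, degree(1) at(weight) generate(s_lpoly) nograph

* Overlay both smooths on the scatter plot.
twoway scatter mpg weight, mcolor(gs10) ///
    || line s_lowess weight, sort ///
    || line s_lpoly weight, sort ///
    legend(order(2 "lowess, bwidth(.75)" 3 "lpoly, degree(1)"))
```

Changing bwidth() or degree() in the lpoly call shows how sensitive that smooth is to those choices.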



                            • #15
                              twoway lets you overlay multiple plots of the same or different types. It uses the same methods as the standalone command; it just overlays several plots.

                              The bandwidth is reported with lowess.
