  • Best Smoothing Method for Visualizing Long-Term Trend in Mortality Rates

    Hello,

    I have a dataset containing monthly aggregated mortality rates for a specific (not very dangerous) procedure over a 10-year period. My goal is to create a plot that effectively illustrates the general trend over time, without displaying the individual data points, as the final visualization will be used for a lecture (but I have included the scatter plot for clarity here).

    I have considered using locally weighted regression (LOWESS) or local polynomial smoothing (lpoly), but I’ve noticed some irregularities at the tails when using lpoly. Given my objective, would either of these methods be appropriate, or is there a better approach for producing a smooth, reliable trend line?

    I appreciate any insights or alternative recommendations.

    Thank you!
    [Attached image: image_36660.png]

    [Attached image: Screenshot 2025-01-22 at 22.32.37.png]

    Code:
    clear
    input int admitmonth float monthly_mortality
    622         0
    623         0
    624         0
    625         0
    626         0
    627         0
    628         0
    629         0
    630         0
    631         0
    632 .16666667
    633  .2857143
    634         0
    635         0
    636         0
    637         0
    638         0
    639         0
    640         0
    641         0
    642         0
    643         0
    644         0
    645         0
    646         0
    647         0
    648         0
    649         0
    650         0
    651         0
    652         0
    653         0
    654      .125
    655         0
    656         0
    657         0
    658         0
    659         0
    660         0
    661         0
    662         0
    663         0
    664         0
    665         0
    666         0
    667         0
    668         0
    669 .14285715
    670         0
    671         0
    672         0
    673         0
    674         0
    675         0
    676         0
    677         0
    678         0
    679         0
    680         0
    681         0
    682 .16666667
    683         0
    684         0
    685         0
    686         0
    687         0
    688         0
    689         0
    690         0
    691         0
    692         0
    693         0
    694      .125
    695         0
    696         0
    697         0
    698         0
    699         0
    700       .25
    701         0
    702         0
    703         0
    704         0
    705         0
    706         0
    707         0
    708         0
    709         0
    710         0
    711         0
    712 .16666667
    713 .14285715
    715         0
    716         0
    717         0
    718         0
    719         0
    720         0
    721         0
    722         0
    723         0
    724         0
    725         0
    726         0
    727         0
    728         0
    729         0
    730         0
    731         0
    732         0
    733         0
    734         0
    735         0
    736         0
    737         0
    738       .25
    739         0
    740        .2
    741         0
    742         0
    743         0
    end
    format %tm admitmonth
    
    twoway lpoly monthly_mortality admitmonth, color(maroon%10) degree(4) || ///
           scatter monthly_mortality admitmonth ///
           , ///
           scheme(uncluttered) ///
           xtitle("Year") ytitle("In-hospital mortality [%]", margin(vsmall)) ///
           tlabel(624(12)744, format("%tmCY")) ///
           xlabel(, labsize(small)) ///
           ylabel(0(0.05)0.3, labsize(small)) ///
           legend(off) ///
           graphregion(margin(l-1 r+3)) ///
           title("Monthly mortality - lpoly") ///
           name(lpoly, replace)
          
          
    twoway lowess monthly_mortality admitmonth, color(maroon%10) || ///
           scatter monthly_mortality admitmonth ///
           , ///
           scheme(uncluttered) ///
           tlabel(624(12)744, format("%tmCY")) ///
           xlabel(, labsize(small)) ///
           ylabel(0(0.05)0.3, labsize(small)) ///
           legend(off) ///
           graphregion(margin(l-1 r+3)) ///
           xtitle("Year", margin(medsmall) size(medium)) ///
           ytitle("In-hospital mortality [%]", margin(medsmall) size(medium)) ///
           title("Monthly mortality - lowess") ///
           name(lowess, replace)
    Last edited by Tim Wallner; Yesterday, 15:45.

  • #2
    I've had good results with lpoly, but I would never try to fit quartics, and I've found the default choices of kernel type and especially bandwidth often unhelpful. That is why I wrote localp (available from SSC), which has default choices I like; I depart from them if the result is too smooth or too rough (or, sometimes, I give up).
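
    As a rough sketch of what departing from the lpoly defaults can look like, here is a local linear fit (degree(1)) with the kernel and bandwidth stated explicitly. The simulated data and the 12-month bandwidth are illustrative assumptions only, not recommended values; substitute your own variables and experiment with bwidth().

    ```stata
    * Simulated stand-in data (hypothetical): 120 months, mostly zero rates
    clear
    set seed 2025
    set obs 120
    generate int admitmonth = 623 + _n
    generate float monthly_mortality = cond(runiform() < 0.1, 0.15, 0)

    * Local linear smooth with kernel and bandwidth stated explicitly;
    * bwidth(12) is an assumed starting value in months, not a recommendation
    twoway lpoly monthly_mortality admitmonth, degree(1) kernel(epanechnikov) bwidth(12) ///
        || scatter monthly_mortality admitmonth, ms(oh) ///
        legend(off) name(lpoly_linear, replace)
    ```

    A smaller bwidth() gives a rougher curve and a larger one a smoother curve, so it is usually worth trying a few values before settling on one.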

    lowess is not flexible enough for my purposes. A very specific problem is that lowess or loess is far from standardized, and results are not guaranteed to be identical across statistical software.
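
    For reference, the main knob in Stata's own lowess is bwidth(), the fraction of the sample used around each point (default 0.8). The sketch below compares two settings on simulated stand-in data; the data and the value 0.3 are assumptions chosen purely for illustration.

    ```stata
    * Simulated stand-in data (hypothetical): 120 months, mostly zero rates
    clear
    set seed 2025
    set obs 120
    generate int admitmonth = 623 + _n
    generate float monthly_mortality = cond(runiform() < 0.1, 0.15, 0)

    * Save the smooths instead of graphing them directly
    lowess monthly_mortality admitmonth, bwidth(0.8) generate(low_wide) nograph
    lowess monthly_mortality admitmonth, bwidth(0.3) generate(low_narrow) nograph

    * Overlay the two smooths to see how much the bandwidth matters
    twoway line low_wide low_narrow admitmonth, sort ///
        legend(order(1 "bwidth(0.8)" 2 "bwidth(0.3)")) name(lowess_bw, replace)
    ```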

    In principle I like splines. In practice -- but that is a story too long for me at this time of night.

    Not at all the answer you probably seek, but in your case (1) I would not suppress the data! (2) I would plot annual averages. The data don't seem particularly suited to any kind of smoothing.

    Here are some small ideas on visualization. A monthly date display format sometimes just messes up something else I am doing, which is a quirk (bug) known to StataCorp; hence the format statement is commented out below.

    Code:
    clear
    input int admitmonth float monthly_mortality
    622         0
    623         0
    624         0
    625         0
    626         0
    627         0
    628         0
    629         0
    630         0
    631         0
    632 .16666667
    633  .2857143
    634         0
    635         0
    636         0
    637         0
    638         0
    639         0
    640         0
    641         0
    642         0
    643         0
    644         0
    645         0
    646         0
    647         0
    648         0
    649         0
    650         0
    651         0
    652         0
    653         0
    654      .125
    655         0
    656         0
    657         0
    658         0
    659         0
    660         0
    661         0
    662         0
    663         0
    664         0
    665         0
    666         0
    667         0
    668         0
    669 .14285715
    670         0
    671         0
    672         0
    673         0
    674         0
    675         0
    676         0
    677         0
    678         0
    679         0
    680         0
    681         0
    682 .16666667
    683         0
    684         0
    685         0
    686         0
    687         0
    688         0
    689         0
    690         0
    691         0
    692         0
    693         0
    694      .125
    695         0
    696         0
    697         0
    698         0
    699         0
    700       .25
    701         0
    702         0
    703         0
    704         0
    705         0
    706         0
    707         0
    708         0
    709         0
    710         0
    711         0
    712 .16666667
    713 .14285715
    715         0
    716         0
    717         0
    718         0
    719         0
    720         0
    721         0
    722         0
    723         0
    724         0
    725         0
    726         0
    727         0
    728         0
    729         0
    730         0
    731         0
    732         0
    733         0
    734         0
    735         0
    736         0
    737         0
    738       .25
    739         0
    740        .2
    741         0
    742         0
    743         0
    end
    
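    * format %tm admitmonth
    
    * calendar year of each month, and the mean of the monthly values within each year
    gen year = year(dofm(admitmonth))
    egen annave = mean(monthly_mortality), by(year)
    
    * ticks at year ends (between December and January); year labels at mid-year
    * (clear the locals first so that rerunning does not accumulate entries)
    local ends
    local call
    forval y = 2012/2021 {
        local end = ym(`y', 12) + 0.5  
        local ends `ends' `end'
        local middle = ym(`y', 6) + 0.5 
        local call `call' `middle' "`y'"
    }
    
    * spikes for monthly values; stairstep line (connect(J)) for annual averages
    twoway spike monthly admitmonth || line annave admitmonth, c(J) ///
    ytitle(Monthly mortality (%)) legend(order(2 "Annual average") pos(12)) ///
    xtick(`ends', tlength(*4)) xla(`call', tlength(*0.5) tlc(none)) xtitle("")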
    The idea is to suppress the zeros and to give more emphasis to the positive values. But the onus is then on you to explain that the averages are based on the zeros too. See https://journals.sagepub.com/doi/pdf...867X0800700410 for the trickery with time axis ticks and labels.
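
    To make the tick arithmetic concrete, here is a small worked check for a single year. Monthly dates count months from January 1960 as 0, so adding 0.5 to December puts a tick on the boundary with the next January, and adding 0.5 to June lands on the midpoint of the year.

    ```stata
    * Monthly dates count months from 1960m1 = 0, so ym(2012, 1) is 624
    display ym(2012, 1)           // 624
    display ym(2012, 12) + 0.5    // 635.5: boundary between 2012m12 and 2013m1
    display ym(2012, 6) + 0.5     // 629.5: midpoint of 2012 (months 624 to 635)
    display %tm 622               // 2011m11: first month in the example data
    ```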


    [Attached image: monthly.png]



    • #3
      On second thoughts, I guess most people would prefer that the zeros were shown as such. Here's my suggestion now:

      Code:
      twoway spike monthly admitmonth || ///
      scatter monthly admitmonth if monthly == 0, ms(oh) mc(stc1) || ///
      line annave admitmonth, c(J) lc(stc2) ///
      ytitle(Monthly mortality (%)) legend(order(3 "Annual average") pos(12)) ///
      xtick(`ends', tlength(*4)) xla(`call', tlength(*0.5) tlc(none)) xtitle("")
      [Attached image: monthly2.png]
