  • Spaghetti plot with median line

    Hi everyone,

    I am trying to create a spaghetti plot for serum vitamin levels at 2 time points.
    So x axis would be time (days), y axis would be serum vitamin levels.

    For example, this is how my data is set out:
    ID    Days    SerumVitA
     1     357         1.99
     1       0          .76
    10       0         1.47
    10     248         2.39

    Additionally, I'd like to add in a median line.

    I tried using spagplot; however, it seemed to connect some of the lines together where they shouldn't have been connected.

    Any help would be so much appreciated.

    Laura

  • #2
    I found spagplot on the UCLA website with this warning:

    Spagplot makes the assumption that the first X variable for subject i+1 is less than the last X variable for subject "i" and this is not always true. When this is false, the lines for two subjects will get connected.
    How many identifiers do you have? If your data example is typical, a median line would be hard to get as the measurements for different patients will be irregular and different. I suppose that quantile regression may help.
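
    A minimal sketch of one way around that assumption, using the four example rows from the first post: append a row with a missing response after each subject's last observation, so that the line breaks there whatever the time values are. This is a swapped-in technique, not what spagplot itself does, and the variable names last and gap are made up for illustration.

    ```stata
    clear
    input ID Days SerumVitA
    1 357 1.99
    1 0 .76
    10 0 1.47
    10 248 2.39
    end

    * flag the last observation (in time order) for each subject
    bysort ID (Days): gen byte last = _n == _N

    * duplicate that observation; gap is 1 for the added copy, 0 otherwise
    expand 2 if last, generate(gap)

    * blank out the response in the added copy
    replace SerumVitA = . if gap

    * real rows in time order within each subject, then the gap row
    sort ID gap Days

    * line plots in data order; cmissing(n) breaks the line at each missing value
    line SerumVitA Days, cmissing(n) lcolor(gs8) lwidth(thin)
    ```

    The gap rows exist only to separate subjects on the graph, so it may be safest to work on a copy of the data (for example inside preserve and restore) to keep them out of later calculations.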


    • #3
      Hi Nick,

      Thanks so much for your response.

      That would explain the joined lines.

      I have about 55 identifiers.

      Do you happen to know any other way I can make a spaghetti plot?

      Thanks so much again,

      Laura


      • #4
        I have written on how, in a strong sense, to avoid spaghetti altogether!

        2021 https://journals.sagepub.com/doi/ful...6867X211025838

        2019 https://www.stata-journal.com/articl...article=gr0080 (emerges from behind paywall in a few weeks' time)

        2010 https://journals.sagepub.com/doi/pdf...867X1101000408

        and many times on this list.

        A short answer is xtline, overlay.

        With 55 identifiers, I guess that the identifiers aren't part of what you want to show; if they are, a legend with 55 entries will have the predictable side-effect of taking up much of the available space. In any case, you will struggle to find 55 identifiably different line colours and/or line patterns.

        Although you have just two time points, the implication of your small data example is that the length of time elapsed varies (a lot).

        A median line could be just a line connecting median for first time point and median for second time point in which case you probably want to map your time variable to 0 and 1 or some other pair of constants.
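
        A minimal sketch of that first option, using the four example rows from #1 and assuming at most two observations per ID. The names visit, med and first are made up for illustration, and the colours and labels are arbitrary choices.

        ```stata
        clear
        input ID Days SerumVitA
        1 357 1.99
        1 0 .76
        10 0 1.47
        10 248 2.39
        end

        * map each subject's first and second measurement to 0 and 1
        bysort ID (Days): gen byte visit = _n - 1

        * median of the response at each of the two mapped time points
        egen med = median(SerumVitA), by(visit)

        * tag one observation per time point, so the median line is drawn once
        egen byte first = tag(visit)

        * connect(L) joins points only while visit is ascending,
        * so each subject's 0 -> 1 segment stays separate
        sort ID visit
        twoway (line SerumVitA visit, connect(L) lcolor(gs8) lwidth(thin)) ///
               (line med visit if first, sort lcolor(blue) lwidth(medthick)), ///
               xlabel(0 "first" 1 "second") xtitle("") ///
               ytitle("Serum vitamin A") legend(order(2 "median"))
        ```

        The /// line continuations need a do-file or the do-file editor. Mapping to 0 and 1 discards the information that elapsed time varies between subjects, which is the trade-off of this option.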


        Otherwise smoothing by medians with irregularly spaced data is not, I think, trivial, and I would not want to suggest binning the time variable to get bin medians.

        I am not a medical statistician, or even a statistician, and certainly not a medic or medical scientist, but I would guess that your response variable is typically positive and highly skewed. That could be one reason you want medians. An alternative is to get smoothed geometric means with the recipe exp(smooth(log())).

        For my own fun, I played at that with the Grunfeld data. Clearly, even if this does have relevance to your own data, you will need to vary many small details.

        Code:
        webuse grunfeld, clear 
        xtset company year
        
        xtline invest, overlay ysc(log) legend(off) yla(1000 300 100 30 10 3 1, ang(h))
        
        gen log_invest = ln(invest)
        
        * 3, 1 and biweight are all choices 
        lpoly log_invest year, bw(3) degree(1) kernel(biweight) gen(YEAR gmean)
        
        replace gmean = exp(gmean)
        
        * 10 is here the number of identifiers 
        * would this work with 55? never tested that. 
        forval j = 1/10 { 
            local opts `opts' plot`j'opts(lc(gs8) lw(thin))
        }
        
        xtline invest, overlay ysc(log) `opts' legend(order(11 "smoothed geometric mean")) yla(1000 300 100 30 10 3 1, ang(h)) addplot(line gmean YEAR, sort lw(medthick) lc(blue))

        [Image: geometricmean.png]


        • #5
          Thank you so much Nick. Will have a read of the papers you have linked.

          I've largely managed to create what I want using xtline SerumVitA, overlay i(ID) t(Days) (although I haven't attempted to add the median line yet).
          I'm wondering if there is a way to have 2 different line colours by another variable (Gender), as the lab reference ranges are different for each gender.

          I have tried using separate and by() but haven't managed to make it work.

          Thanks again,
          Laura


          • #6
            xtline is often a helpful convenience tool, but it can't support all the needed possibilities.

            Here is one technique (sensitive to sort order). Naturally, with your data you will use separate, by(Gender).

            Code:
            webuse grunfeld, clear 
            xtset company year 
            separate invest, by(mod(company, 2))
            label var invest1 "odd"
            label var invest0 "even"
            line invest1 year, c(L) || line invest0 year, c(L) ysc(log)
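
            The same technique might look like this with the variable names from #5. This is only a sketch: the Gender values below are invented for illustration, and a numeric 0/1 coding is assumed, which may not match the real data.

            ```stata
            clear
            * Gender column is invented for illustration (numeric 0/1 assumed)
            input ID Days SerumVitA Gender
            1 357 1.99 0
            1 0 .76 0
            10 0 1.47 1
            10 248 2.39 1
            end

            * c(L) depends on sort order: time ascending within each ID
            sort ID Days

            * creates SerumVitA0 and SerumVitA1, one for each value of Gender
            separate SerumVitA, by(Gender)

            * one line colour per gender; c(L) starts a new line whenever Days drops
            line SerumVitA0 Days, c(L) lc(blue) || line SerumVitA1 Days, c(L) lc(orange) legend(order(1 "Gender 0" 2 "Gender 1")) ytitle("SerumVitA")
            ```

            As with spagplot, c(L) only separates two consecutive subjects of the same gender when the second one starts at an earlier time than the first one ends, so this relies on every subject having a baseline measurement at or near day 0.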
