
  • Scatter plot: individual slope versus baseline GFR

    Dear Stata users,


    I am trying to create a scatter plot of individual GFR slopes versus baseline kidney function (GFR) using repeated measures data, but I am not sure how to do this in Stata.

    The data look like this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long ID float(Visit_months Years_from_baseline) double gfr int sbp
    1   0         0  53.7 127
    1   4  .3150685  43.8 117
    1  12  .9561644  50.4 125
    1  24 2.0027397  46.7 118
    1  36  3.016438  38.1 115
    1  48  3.978082  46.4 114
    1  60  4.994521    37 102
    1  72  5.991781    39 110
    1  84  7.008219  34.1 110
    1 100  7.282192  35.6 108
    1 101 10.432877  30.5 119
    1 103 13.238357  38.2 119
    2   0         0 121.7 114
    2   4  .3068493  87.2 114
    2  12 1.0109589  74.6 112
    3   0         0  79.6 146
    3   4  .3150685  71.2 122
    3  12 1.0109589  71.1 134
    3  24 2.0219178  68.7 141
    3  36  3.030137  46.5 144
    3  48 4.0054793  43.5 120
    4   0         0    54 159
    4   4  .3260274  58.8 129
    4  12  .9945205  58.7 113
    4  24 2.0054793  49.6 119
    4  36  3.005479  53.1 134
    4  48  4.076712  53.8 117
    4  60  5.008219    56 111
    4  72   6.00274  57.3 118
    4 100   6.19726  65.9 116
    4 101  9.052054  56.3 116
    end

    I would really appreciate it if anyone could help me.

    Many thanks.
    Oyun

  • #2
    It isn't clear whether you want to use Visit_months or Years_from_baseline as your time variable. They are highly correlated. If you want to use Visit_months you can do this:

    Code:
    xtset ID Visit_months
    xtline gfr
    You can't do that with Years_from_baseline because -xtset- requires an integer time variable and, unlike Visit_months, Years_from_baseline takes on non-integer values. But because it is perhaps more precise, it may be more desirable to use it. In that case the code is just a bit different:

    Code:
    xtset ID
    xtline gfr, t(Years_from_baseline)
    Note: This will put each individual's plot into a separate panel. If you want them all in one panel, with a legend showing which one is whose, you can add the -overlay- option to the -xtline- command.

    Note: This only looks good with a small number of individuals--after that it becomes unreadable. Of course, if you have a large number of IDs, then the separate panels also become too small to see anything. The whole project of graphing individual trajectories falls apart with a large number of IDs either way.
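Since -xtset- needs an integer time variable, another way to keep the exact Years_from_baseline scale is to skip -xtline- and use plain -twoway-. A minimal sketch, assuming the -dataex- example above is in memory; the graph names traj_panels and traj_overlay are arbitrary choices:

```stata
* Assumes the -dataex- example (ID, Years_from_baseline, gfr) is in memory.

* One panel per person, on the exact (non-integer) time scale
twoway connected gfr Years_from_baseline, sort by(ID) name(traj_panels, replace)

* All persons in a single panel. With the data sorted by ID and time,
* connect(L) draws a segment only where x increases; because each person
* starts again at time 0, the line breaks between people.
sort ID Years_from_baseline
twoway line gfr Years_from_baseline, connect(L) name(traj_overlay, replace) ///
    ytitle("GFR") xtitle("Years from baseline")
```

The second graph has no legend, so it shows the overall pattern rather than who is who; the readability caveat above still applies as the number of IDs grows.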



    • #3
      Thank you so much for your reply, Professor Schechter.

      The scatter plot I would like to create looks like the one below (I want to plot each individual's mean eGFR change (slope) against baseline eGFR).

      [Image: Mean GFR slope against baseline GFR.png]



      Another example:

      [Image: Scatter plot_gfr.png]


      Thank you.
      Oyun
      Last edited by Buyadaa Oyunchimeg; 05 Jan 2019, 00:03.



      • #4
        Further clarification is needed, as neither graph is completely labeled/legended/explained.

        First, what do you mean by individual mean GFR change? Do you mean calculating each individual's change from baseline GFR at each observation in time and taking the average? Or do you mean calculating each individual's change from the most recent previous observation of GFR and averaging those? Or do you want to fit a regression line through each person's observations and use the slope of that line as the "average"? Or something else?

        And are we talking about a single average for each patient (i.e. each point in the scatterplot is one person) or are these perhaps running averages (and each point in the scatterplot is a person at a given moment in time, with each person having several such points)?

        And what is the line/curve in the graph?



        • #5
          Thank you so much for your reply, Professor Schechter.


          1. It could be either "calculating each individual's change from baseline GFR at each observation in time and taking the average" or "fitting a regression line through each person's observations and using the slope of that line as the average" (I am not sure which one gives a more accurate result). What would you suggest?

          2. Each point in the scatter plot is one person (a single average for each person).

          3. The curve in the graph is a lowess line.




          Oyun



          • #6
            Let's go with the regression slope. You can't say that either type of "average" is more "accurate" than the other: they are different things. I'm choosing the regression slope because I think it's easier for most people to understand.

            Code:
            //    REGRESS EACH INDIVIDUAL'S GFR AGAINST TIME
            sort ID Years_from_baseline
            rangestat (reg) gfr Years_from_baseline (first) gfr, by(ID) interval(Visit_months . .)
            
            //    GRAPH THE SLOPE VS THE BASELINE
            label var b_Years_from_baseline "Rate of GFR Change"
            label var gfr_first "Baseline GFR"
            egen flag = tag(ID)    // MARK ONE OBSERVATION PER ID
            lowess b_Years_from_baseline gfr_first if flag
            -rangestat- is by Robert Picard, Nick Cox, and Roberto Ferrer, and is available from SSC.

            In the -rangestat- command, the interval() option is required, although in this case we really have no restriction on which observations are included for any ID. So we just pick a numeric variable that is not otherwise in use, and give it a range from . to . (which means include any non-missing value).
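If -rangestat- cannot be installed, the same per-person OLS slope can be computed by hand with -egen-, because the slope is sum((t - tbar)*(y - ybar)) / sum((t - tbar)^2) within each ID. A minimal sketch, assuming no missing gfr or time values; the variable names (gfr_base, t_bar, y_bar, sxy, sxx, slope, flag) are my own, and it inputs only IDs 2 and 3 from the listing in #1 so that it runs on its own:

```stata
* IDs 2 and 3 from the -dataex- listing in #1. On the full data, skip this
* -input- step and run the rest of the block as is.
clear
input long ID float(Visit_months Years_from_baseline) double gfr
2  0         0 121.7
2  4  .3068493  87.2
2 12 1.0109589  74.6
3  0         0  79.6
3  4  .3150685  71.2
3 12 1.0109589  71.1
3 24 2.0219178  68.7
3 36  3.030137  46.5
3 48 4.0054793  43.5
end

* Baseline GFR = first observation in time order within each person
bysort ID (Years_from_baseline): gen double gfr_base = gfr[1]

* Per-person OLS slope of gfr on time, computed as Sxy / Sxx
by ID: egen t_bar = mean(Years_from_baseline)
by ID: egen y_bar = mean(gfr)
by ID: egen sxy = total((Years_from_baseline - t_bar) * (gfr - y_bar))
by ID: egen sxx = total((Years_from_baseline - t_bar)^2)
gen slope = sxy / sxx
label var slope "Rate of GFR change (per year)"
label var gfr_base "Baseline GFR"

* Keep one row per person for listing and plotting
egen flag = tag(ID)
list ID gfr_base slope if flag
scatter slope gfr_base if flag
```

On these two people it gives slopes of about -41.6 and -8.8 GFR units per year. On the full data, slope should agree with b_Years_from_baseline from -rangestat-, since both are ordinary least-squares slopes within ID.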

            Comment


            • #7
              Thank you so much for your answer, Professor Schechter.


              I have tried the code and got the following results.

              a*. Normal albuminuria (UACR < 30 mg/g)


              [Image: Overall_GFR_nonalb.png]


              b*. Increased albuminuria (UACR > 30 mg/g)

              [Image: Overall_GFR_alb.png]


              *The hypothesis of this study is that those with increased albuminuria at baseline will have a higher rate of decline than those with normal albuminuria. Thus, participants were stratified by baseline albuminuria status (to see whether GFR decline differs by baseline albuminuria status). In this case, can we confirm this hypothesis? Also, is there any way to improve the plots?

              What would you suggest?

              Thank you so much again.
              Oyun



              • #8
                It is difficult to see what is going on in those two graphs. The vertical axes are on different scales, so a direct visual comparison of the slopes of the lowess lines (which appear to be equal and nearly zero in the graphs) is potentially quite misleading. To do this graphically, you should store each graph in memory (see the -name()- option in -graph-, which also works with -lowess-) and then -graph combine- them with the -ycommon- option. Then you will have two similarly scaled graphs side by side and it will be a bit clearer what is going on.
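A minimal sketch of that approach, assuming the variables created by the code in #6 are in memory along with an albuminuria indicator alb_gr (the name used later in this thread) coded 0 for UACR < 30 mg/g and 1 otherwise; that coding and the graph names g_nonalb, g_alb, and g_both are assumptions:

```stata
* Assumes b_Years_from_baseline, gfr_first and flag (from the #6 code)
* and a 0/1 indicator alb_gr are in memory.
lowess b_Years_from_baseline gfr_first if flag & alb_gr == 0, ///
    name(g_nonalb, replace) title("UACR < 30 mg/g")
lowess b_Years_from_baseline gfr_first if flag & alb_gr == 1, ///
    name(g_alb, replace) title("UACR > 30 mg/g")

* Force a common y (and x) scale so the two lowess slopes can be compared
graph combine g_nonalb g_alb, ycommon xcommon name(g_both, replace)
```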

                In any case, I would supplement the graphs with an analytic approach. I would probably do this as a mixed model where the gfr:time slope is interacted with albuminuria, and include random slopes for gfr:time at the ID level. So something like this:

                Code:
                mixed gfr i.albuminuria##c.Years_from_baseline || ID: Years_from_baseline, cov(unstructured)
                The covariance between the random intercept and random slope will give you a quantified estimate of the extent to which the rate of decline in gfr is associated with the baseline gfr. And the interaction between albuminuria and Years_from_baseline will tell you whether this works differently in those with and without albuminuria.
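To see that covariance after fitting the model, and to get each person's predicted random slope, something like the following sketch should work. It assumes alb_gr is in the poster's full data; re_slope and re_cons are arbitrary new variable names:

```stata
* Assumes the poster's full data (including alb_gr) are in memory.
mixed gfr i.alb_gr##c.Years_from_baseline || ID: Years_from_baseline, cov(unstructured)

* Random-effects covariance matrix displayed as correlations; the
* off-diagonal entry is the correlation between each person's deviation
* at time 0 (baseline) and their deviation in slope
estat recovariance, correlation

* Per-person predictions (BLUPs) of the random effects, in the order of
* the random-effects equation: slope first, then intercept
predict re_slope re_cons, reffects
```

The random intercept here is each person's deviation at Years_from_baseline = 0, which is why its correlation with the random slope speaks to the baseline question.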



                • #9
                  Thank you so much, Professor Schechter.

                  The combined graph looks like this:

                  [Image: combined graph.png]

                  What would you suggest I do about the outlier in graph 1?

                  As suggested, I will supplement the graph with a mixed model analysis.

                  Thank you so much again.
                  Oyun



                  • #10
                    Originally posted by Buyadaa Oyunchimeg:
                    What would you suggest I do about the outlier in graph 1?
                    I recommend that you throw both graphs out, and substitute graphs of predictions (help marginsplot) from the model that Clyde suggests, or one similar to it, perhaps either with splines for follow-up interval or with a uniform interval imposed on all patients' follow-up.

                    First, in your smattering of a data listing, you have more than a 13-fold difference in follow-up interval since baseline. To use Rate of GFR Change as a single data point assumes that the rate of change is constant in time. Is it?

                    Second, if there is substantial albuminuria at baseline, then loss of renal functional integrity has already begun by baseline at least in those patients with the more severe renal insult. Again, unless the rate of deterioration is constant, then that pre-baseline loss might not be adequately reflected.

                    Last, there are a lot of pitfalls in scatterplots of change scores versus baseline. Search online for "Harrell change score" for further information.
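The suggestion of splines for the follow-up interval could look something like this minimal sketch. It uses a piecewise-linear spline from -mkspline- with a single knot at 1 year; the knot location, the names t_early and t_late, and the 0/1 coding of alb_gr are all assumptions for illustration, and it presumes the poster's full data are in memory:

```stata
* Assumes the poster's full data (including alb_gr) are in memory.
* Hypothetical knot at 1 year: t_early = min(t, 1), t_late = max(t - 1, 0)
mkspline t_early 1 t_late = Years_from_baseline

* Separate GFR slopes before and after the knot, each allowed to differ by
* albuminuria group, with person-level random slopes on both segments
mixed gfr i.alb_gr##c.(t_early t_late) || ID: t_early t_late, cov(unstructured)

* Estimated slope (GFR units per year) in each segment, by group
margins alb_gr, dydx(t_early t_late)
```

This relaxes the constant-rate assumption questioned above without forcing everyone onto the same follow-up length; where the knot (or knots) should go is a substantive choice.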



                    • #11
                      Thank you so much for your suggestions, Joseph.

                      I have tried a -mixed- model and got the following results:

                      Code:
                      xtmixed gfr i.alb_gr##c.Years_from_baseline##Visit_months || ID:
                      margins, over(Visit_months alb_gr)
                      marginsplot
                      [Image: marginplots.png]


                      I am wondering whether the above code is correct.

                      Thank you so much again.
                      Oyun
                      Last edited by Buyadaa Oyunchimeg; 07 Jan 2019, 01:49.



                      • #12
                        Including the interaction between Years_from_baseline and Visit_months is rather strange. For one thing, as coded, it treats Visit_months as a discrete variable. That might be reasonable in some circumstances, but when interacted with Years_from_baseline it becomes downright bizarre, because the two variables are actually almost the same thing: Visit_months is just Years_from_baseline rescaled and truncated or rounded to an integer. So I really cannot wrap my mind around what the interaction between Years_from_baseline (continuous) and Visit_months (discrete) might mean here. I would be more inclined to do something like this:

                        Code:
                        xtmixed gfr i.alb_gr##c.Visit_months || ID:
                        margins alb_gr, at(Visit_months = (0(6)108))
                        marginsplot
                        By the way, if you are using a current version of Stata, -xtmixed- is now called -mixed-. The name -xtmixed- is still accepted, but at some point in the future it may go out of use. So best to get in the habit of using the newer terminology.
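To address the hypothesis in #7 directly (does the rate of decline differ by albuminuria?), one option is to ask -margins- for the GFR slope in each group and for the difference between them. A sketch, assuming alb_gr is a 0/1 indicator in the poster's full data and adding the person-level random slope suggested in #8:

```stata
* Assumes the poster's full data (including alb_gr) are in memory.
mixed gfr i.alb_gr##c.Visit_months || ID: Visit_months, cov(unstructured)

* Average rate of GFR change (per month) in each albuminuria group
margins alb_gr, dydx(Visit_months)

* Difference in that rate relative to the reference group, with its test
margins r.alb_gr, dydx(Visit_months)
```

In a linear model like this one, that contrast is the same number as the 1.alb_gr#c.Visit_months interaction coefficient in the -mixed- output; -margins- just reports it on a directly interpretable scale.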



                        • #13
                          Thank you so much for your help, Professor Schechter.

                          May I ask just one more question?

                          Is it possible to plot the change (slope) in GFR over time after a -mixed- model?

                          I mean a graph similar to this one:


                          [Image: another graph.png]



                          • #14
                            If you want to plot, say, the mean change in GFR at each time point (Visit_months) by albuminuria group:

                            Code:
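                            //    CHANGE FROM EACH PERSON'S BASELINE GFR
                            by ID (Years_from_baseline), sort: gen gfr_change = gfr - gfr[1]
                            //    MEAN CHANGE AT EACH VISIT BY ALBUMINURIA GROUP, ONE COLUMN PER GROUP
                            preserve
                            collapse (mean) gfr_change, by(alb_gr Visit_months)
                            reshape wide gfr_change, i(Visit_months) j(alb_gr)
                            graph twoway connected gfr_change* Visit_months
                            restore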
                            will do that. It has nothing to do with the mixed model. This is the observed data.

                            Now, you can also do a mixed model using gfr_change, rather than gfr as the outcome variable, and follow that with -margins- and -marginsplot- commands to get a similar graph of the modeled gfr_change in each group as a function of time.
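A minimal sketch of that modeled version, reusing the #12 model with gfr_change swapped in as the outcome. It assumes the person-level data are in memory (that is, gfr_change has been created but the -collapse- step has not been run, or has been undone with -restore-) and that alb_gr is in the data; the axis titles are arbitrary:

```stata
* Assumes person-level data with gfr_change (not collapsed) and alb_gr.
mixed gfr_change i.alb_gr##c.Visit_months || ID:

* Modeled mean change from baseline every 6 months, by group
margins alb_gr, at(Visit_months = (0(6)108))
marginsplot, xtitle("Months from baseline") ytitle("Modeled change in GFR")
```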



                            • #15
                              Thank you so much for your help, Professor Schechter.
