  • #16
    Hi again, Clyde Schechter. Thank you so much for the helpful code. By making BMI categorical (healthy weight vs. overweight), I've been able to use -margins- with its -saving()- option. However, I have a follow-up question about the graph I've generated, shown here:

    [Attached image: Capture.PNG]

    I used the first spline variable as the horizontal axis variable (as suggested), but I am confused about what exactly the z-statistic on the y-axis represents. I know it's the statistic for testing the margin values against zero, but I'm unclear how this relates to the original dependent variable (testosterone levels) when I am attempting to interpret the interaction between age and BMI in relation to testosterone levels. This is probably a stupid question, but I've been unable to find the answer online. Thank you once again for all your assistance; hopefully, once I make sure I am interpreting these graphs correctly, I will be all set.



    • #17
      I can't really think of a good, simple way of interpreting the z-statistic. Why did you choose to plot that? Why not just plot the predicted margins themselves? There won't be any mystery about what that means.



      • #18
        OK, I was following the code and thought there might be a particular reason that the z-statistic was in the code, but that makes more sense. Thank you very much.



        • #19
          Oh, I see what happened. Yes, I'm not sure why I specified _statistic instead of _margin in that sample code. I don't know what I was thinking. Sorry about that.
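
          For what it's worth, a minimal sketch of the corrected final line of that example (everything else unchanged; _margin is the predicted-margin variable that -margins, saving()- writes):

          Code:
          graph twoway line _margin _at2, by(_m1)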



          • #20
            Dear Clyde and others,

            I am interested in plotting predictions from a model that includes a continuous covariate modeled with restricted cubic splines. The model analyzes complex survey data using -svy- commands, so I was unable to use the -postrcspline- package.

            The below code has been a lifesaver, and I've been able to generate some very nice output and figures.

            My question is this: Are the 95% confidence intervals generated from this method valid? A collaborator with much more statistical training than me wonders, and I don't have a great answer. I am hoping Clyde and/or others may have insight.

            Thank you!


            Originally posted by Clyde Schechter View Post
            So this is a bit difficult. With just a linear age term you would pick an interesting set of values of age, and then use those in the -at()- option of -margins- and then run -marginsplot-. But you can't do that here because age isn't actually part of the model. So what you need to do is pick values of age that are interesting and are also instantiated in your data set. Then from the dataset you can find out the values of the Agesp* variables that correspond to those values of age. Then use those in -at()- options in your -margins- command. Note, by the way, that you do not want to have -margins- at crossed values of the different Agesp* variables: most of those combinations will not correspond to any value of age at all. So you do something like this (illustrated with the auto.dta)

            Code:
            sysuse auto, clear

            // 3-knot restricted cubic spline basis for mpg
            mkspline mpgsp = mpg, nknots(3) cubic

            regress price i.foreign##c.(mpgsp*)

            levelsof mpg

            // for each mpg value of interest, look up the corresponding spline
            // values and assemble an -at()- specification from them
            foreach n of numlist 15 20 25 30 {
                local atspec`n'
                foreach v of varlist mpgsp* {
                    summ `v' if mpg == `n'
                    local atspec`n' `atspec`n'' `v' = `r(mean)'
                }
            }

            tempfile margins
            margins foreign, at(`atspec15') at(`atspec20') at(`atspec25') at(`atspec30') ///
                saving(`margins')
            use `margins', clear
            graph twoway line _statistic _at2, by(_m1)
            If you want to graph this, -marginsplot- will not be helpful here because it does not recognize that mpgsp* are related to each other and to mpg. So you would have to add a -saving()- option to -margins-. Then you can open that data set, identify the variable that contains the predicted values, and then use -graph twoway-. Your horizontal axis variable can be the first spline variable, because the first variable in a cubic spline is always equal to the variable it is derived from. So Agesp1 == age. The code above illustrates the approach. You may want to use a different approach to the graph, and the `margins' tempfile has everything in it you need: it's just a matter of crafting the -graph- commands to get the specific display you want.



            • #21
              No, they're not valid for survey data, because naked -regress- is not valid for survey data. You need to -svyset- your data to reflect the survey design. Then replace -regress- with -svy: regress-, and everything will be fine.
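
              A minimal sketch of that change, with hypothetical names throughout (psu, strat, and wgt stand in for your design variables; testosterone and bmi_cat for the outcome and the categorical BMI variable), using the same spline setup discussed earlier in the thread:

              Code:
              * hypothetical design variables; substitute your own
              svyset psu [pweight = wgt], strata(strat)

              * restricted cubic spline basis for age, as earlier in the thread
              mkspline Agesp = age, nknots(3) cubic

              * same model as before, but estimated with the survey design
              svy: regress testosterone i.bmi_cat##c.(Agesp*)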



              • #22
                Thanks, Clyde. I apologize for my lack of clarity; I am using the svyset commands with the svy: prefix for all models.

                Just to make sure I am understanding correctly: The 95% CIs after margins will be valid if I specify multiple "at" levels in a svy model, using your technique outlined above?

                Thank you!



                • #23
                  Yes, but let me just clarify something. They will be valid in the following sense: with repeated sampling of the population in accordance with the same survey design, 95% of the confidence intervals so arrived at will actually contain the population parameter value you are estimating.

                  If you are using multiple -at()- conditions, then some people would say that you should correct for multiple comparisons. I don't do that, for a variety of reasons I will not get into here. But if you want to assert that, in repeated samples of the population in accordance with the same survey design, 95% of those samples yield results in which all of the confidence intervals contain the corresponding estimated parameter, then you do need to do a correction.

                  Put, perhaps more simply, without "correction," what you have 95% confidence about is that each confidence interval individually contains the true value of the parameter. With correction, you will have 95% confidence in asserting that all of the confidence intervals contain the true value of the parameter.

                  If you want to do that "correction," it is similar to doing a Bonferroni p-value correction. So let's say you are using 5 at-levels. Instead of using level(95) [95 = 100 - 5], you would use level(99) [99 = 100 - 5/5 = 100 - 1].
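
                  To make that arithmetic concrete with the four -at()- specifications from the auto.dta sketch above (4 intervals, so the adjusted level is 100 - 5/4 = 98.75), the following just adds level() to the earlier -margins- call; it is an illustration of the adjustment, not a recommendation:

                  Code:
                  * Bonferroni-style adjustment: 4 at() specifications, so use
                  * level(100 - 5/4) = level(98.75) instead of the default level(95)
                  margins foreign, at(`atspec15') at(`atspec20') at(`atspec25') at(`atspec30') ///
                      level(98.75)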

                  As I say, I don't do that sort of thing in my own work, but some people feel strongly about doing it.



                  • #24
                    Clyde has this covered better than I could have. I would just like to point to the often-overlooked vce(unconditional) option of margins that should usually be used with/after svy estimation.
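
                    As a minimal sketch of where that option goes (bmi_cat is a placeholder for the categorical BMI variable, and the -atspec- locals are assumed to have been built the same way as in the loop earlier in the thread):

                    Code:
                    * after -svy- estimation, vce(unconditional) makes -margins- account
                    * for the sampling variability of the covariates, not just the coefficients
                    margins bmi_cat, at(`atspec40') at(`atspec50') vce(unconditional)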



                    • #25
                      Thank you, Clyde. This makes sense to me.

                      I'm wondering what your assessment would be of the way I've used your sample code to generate predicted probability plots from a logistic regression for a continuous covariate (i.e., age) that uses restricted cubic splines. See attached image.

                      I generated predicted probabilities and their 95% confidence intervals at -at()- levels of age over the range of integers from 25 to 65 years.

                      I then plotted the result using a -twoway- graph that combines -rarea- and -line-. The line connects the margin from each -at()- level, and the rarea band spans the lower and upper 95% confidence limits.

                      (Also a note to Daniel Klein: Yes, I have used the vce(unconditional) option used with margins after svy estimation. Thank you for reminding me of this.)
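
                      In case it helps others, roughly what that plotting step looks like as a sketch, assuming the -margins, saving()- results are in a tempfile named `margins' as in the earlier example (_margin, _ci_lb, and _ci_ub are the variables -margins- saves; check which _at* column holds the first spline variable, which equals age):

                      Code:
                      * sketch: predicted probabilities with a 95% CI band
                      use `margins', clear
                      sort _at1
                      graph twoway ///
                          (rarea _ci_lb _ci_ub _at1, color(gs13)) ///
                          (line _margin _at1), ///
                          ytitle("Predicted probability") xtitle("Age, years") legend(off)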



                      Originally posted by Clyde Schechter View Post
                      Yes, but let me just clarify something. They will be valid in the following sense: with repeated sampling of the population in accordance with the same survey design, 95% of the confidence intervals so arrived at will actually contain the population parameter value you are estimating.

                      If you are using multiple -at()- conditions, then some people would say that you should correct for multiple comparisons. I don't do that, for a variety of reasons I will not get into here. But if you want to assert that, in repeated samples of the population in accordance with the same survey design, 95% of those samples yield results in which all of the confidence intervals contain the corresponding estimated parameter, then you do need to do a correction.

                      Put, perhaps more simply, without "correction," what you have 95% confidence about is that each confidence interval individually contains the true value of the parameter. With correction, you will have 95% confidence in asserting that all of the confidence intervals contain the true value of the parameter.

                      If you want to do that "correction," it is similar to doing a Bonferroni p-value correction. So let's say you are using 5 at-levels. Instead of using level(95) [95% = 100-5%], you would use level(99) [99% = 100 - 5/5% = 100 - 1%]

                      As I say, I don't do that sort of thing in my own work, but some people feel strongly about doing it.
                      Last edited by David Flood; 26 Oct 2020, 08:28.

