  • Using margins with restricted cubic splines

    Our research group would like to use restricted cubic splines in the twopm regression command (among other places).
    We can figure out from the available doc and web resources how to specify and execute the models we are interested in.

    But we aren't sure how to correctly specify the margins command afterward, unless it is correct to assume we can
    just exactly copy the syntax from linear splines (which seems too easy to be correct).

    We'd like to be directed to a URL with an example where the margins command is used with a covariate that is being
    expressed as a restricted cubic spline. To be more concrete, we would like to compare treatment groups at different ages,
    where age (a continuous independent variable) is being expressed as a restricted cubic spline.


    That is, we'd like to construct the command for "margins, dydx(treatment) at(age=35)" when age might be
    made/expressed from a command like "mkspline agesp = age, cubic knots(20 30 40 50 60)".

    Can someone point us to a useful resource?



  • #2
    ssc desc postrcspline
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Thank you, Maarten, and Happy New Year.



      • #4
        Maarten, thanks for pointing us to postrcspline, and also for the work that went into creating the package and updating it.

        We have given the materials a good look, including going through the help files and the 2009 presentation. But we still are not sure how to answer the specific question with which we started.

        That probably says we didn’t specify the question carefully enough, or it could be that we haven’t actually dug deeply enough into the materials you suggested. If the latter, feel free to point us more particularly to what we missed (and we apologize for misusing your time!). If the former, any help you can offer will be appreciated.

        Background to our problem.
        We are using a two-part model as outlined in Belotti et al. (2015), Stata Journal 15, pp. 3-20. We note that our model does require the use of an offset.

        Our response variable is cost of medical services, call it “cost.” Our independent variables include: body mass index (BMI, quantitative), age (quantitative), race (factor/categorical), and marital status (factor/categorical).

        Our primary analytic focus is on a final independent variable, the levels of treatment completed by subjects (including a relevant control group, to which we make our primary comparison, though we also need to see other comparisons); this is captured in a factor/categorical variable, which we’ll call “group.” We want to estimate how the levels of treatment affect cost, controlling for other things.

        We desire to use restricted cubic splines for the independent variables BMI and age so that the model might be specified, for example, with

        mkspline2 agespline = age, cubic nknots(3)
        mkspline2 BMIspline = BMI, cubic nknots(3)

        We believe a "main effects" model invocation would be as follows (if you are familiar with twopm and see any corrections, please suggest them):

        twopm cost ib1.group ib1.marital ib2.race BMIspline* agespline* , ///
        firstpart(logit, offset(offsetvariable)) secondpart(glm, family(gamma) link(log) offset(offsetvariable))

        Here (finally) is our specific problem:
        We would like to estimate marginal effects for the categories of group at specific values of age and BMI. Our question is what is the right syntax for this?

        As far as we can tell this question is not treated in any of the material you suggested, and we can only find a limited example for the case of linear splines.

        Ideally we would like to use commands similar to those provided below.

        margins , dydx(group) at(age=40 BMI=30)
        margins group, at(age=40 BMI=30)

        We expect that this is not the correct syntax, but we don't know how to specify the spline-covariate values for the BMI and age variables in the margins command. Can you assist us?

        As noted above, we have not been able to find relevant documentation examples beyond a basic example with linear splines. Things we don’t know include: does each spline component (e.g. BMI1, BMI2, BMI3, etc.) need to be specified? At what values? Please feel free to specify hypothetical values for the knots for us if it is necessary to understand how to specify the variables.
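One possible reading of this question, sketched under stated assumptions rather than taken from the thread: each spline term generally needs its own value in at(), and those values can be computed from the restricted cubic spline formula in the Methods and formulas section of [R] mkspline. The sketch uses the womenwk example data with educ standing in for group, official mkspline rather than mkspline2, and hypothetical knots at 25, 40 and 55. BMI terms would be handled the same way. It assumes twopm is installed (ssc install twopm).

```stata
// Sketch only: womenwk stands in for the cost data, educ for "group"
webuse womenwk, clear
replace wage = 0 if wage == .

// hypothetical knots, chosen for illustration
local k1 25
local k2 40
local k3 55
mkspline ages = age, cubic knots(`k1' `k2' `k3')   // creates ages1 ages2

// second spline term at age 40, per Methods and formulas in [R] mkspline
local a  40
local a2 = (max(`a'-`k1',0)^3 - (max(`a'-`k2',0)^3*(`k3'-`k1')      ///
         - max(`a'-`k3',0)^3*(`k2'-`k1'))/(`k3'-`k2')) / (`k3'-`k1')^2

// check the hand-computed value against what mkspline stored
summarize ages2 if age == `a', meanonly
assert reldif(r(mean), `a2') < 1e-6

twopm wage i.educ ages* married children,     ///
    firstpart(probit)                         ///
    secondpart(glm, family(gamma) link(log))

// adjusted predictions and contrasts for educ at age 40
margins educ,        at(ages1=`a' ages2=`a2')
margins, dydx(educ)  at(ages1=`a' ages2=`a2')
```

Since educ is a factor variable, dydx(educ) reports discrete differences from the base level with the spline terms held at the values implied by age 40. Marginal effects of age itself are a different matter, because margins does not know that ages1 and ages2 move together.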

        We will also wish to interact our group variable with the spline versions of age and BMI. If you can assist, we would appreciate your advice about how to specify an interaction model and obtain margins estimates from the result.

        Here is our best current guess. Does the following seem reasonable?

        twopm cost ib1.group##c.BMIspline* ib1.group##c.agespline* ib1.marital ib2.race , ///
        firstpart(logit, offset(offsetvariable)) secondpart(glm, family(gamma) link(log) offset(offsetvariable))
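A hedged sketch of one way the interaction model and its margins call could look. Factor-variable notation allows the grouping variable to be interacted with several spline terms at once through c.(varlist), and margins recomputes the interaction terms itself once the spline terms are fixed in at(). The data (womenwk), educ in place of group, and the knots at 25, 40 and 55 are illustrative assumptions, not the setup described in this post. twopm must be installed from SSC.

```stata
// Sketch only: womenwk stands in for the cost data, educ for "group"
webuse womenwk, clear
replace wage = 0 if wage == .

// hypothetical knots, chosen for illustration
local k1 25
local k2 40
local k3 55
mkspline ages = age, cubic knots(`k1' `k2' `k3')

// group interacted with every spline term
twopm wage i.educ##c.(ages1 ages2) married children,   ///
    firstpart(probit)                                    ///
    secondpart(glm, family(gamma) link(log))

// build one at() per age of interest; ages2 follows the formula in
// Methods and formulas of [R] mkspline
local atlist
foreach a of numlist 30 40 50 {
    local a2 = (max(`a'-`k1',0)^3 - (max(`a'-`k2',0)^3*(`k3'-`k1')  ///
             - max(`a'-`k3',0)^3*(`k2'-`k1'))/(`k3'-`k2')) / (`k3'-`k1')^2
    local atlist `atlist' at(ages1=`a' ages2=`a2')
}

// contrasts between educ levels at ages 30, 40 and 50
margins, dydx(educ) `atlist'
```

Passing several at() options in one call keeps all three sets of contrasts in a single results table, which marginsplot can then graph.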

        If this seems ok, our question again is: how can we correctly specify the margins command?

        If you are by chance interested in more about our topic, we are working on a companion to the 2016 paper listed on our
        web site at: http://www.morris.umn.edu/academics/...oject/results/

        Thanks.



        • #5
          The postrcspline package was written before twopm, so there is no out of the box support for twopm. Since there are two parts involved it is not that straightforward to incorporate support for twopm in postrcspline. Getting the marginal effect with margins will be hard as there is no way of telling margins that the different variables that make up the spline belong together.

          What is possible is to use margins and marginsplot to graph adjusted predictions, which is usually a more informative graph anyhow. I have only considered using such a graph of marginal effects when it was really my primary variable of interest, and even then I finally ended up annotating a graph of the adjusted predictions rather than displaying the graph of marginal effects. Since in your paper the variables are only control variables, displaying the adjusted predictions will be more than enough.

          Below is an example, though in that case the cubic spline is probably overkill.

          Code:
          // open and prepare example data
          webuse womenwk, clear
          
          replace wage = 0 if wage==.
          
          label define educ 10 "< high school" ///
                            12 "high school"   ///
                            16 "bachelor"      ///
                            20 "> bachelor"
          label value educ educ
          
          // make a restricted cubic spline
          mkspline ages = age, cubic nknots(3)
           
          // estimate the model
          twopm wage i.educ ages* married children,     ///
              firstpart(probit)                         ///
              secondpart(glm, family(gamma) link(log))
          
          // predictions, not marginal effect
          margins, at(educ=12 married=1 children=1) over(age)
          
          // plot those predictions
          marginsplot, recastci(rarea) ciopts(pstyle(ci)) ///
                       plotopts(msymbol(i))
          [Image: Graph.png]
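As a sketch, and not part of the reply itself, the same over(age) device can be pointed at the original question of comparing groups at given ages. Because the spline terms are functions of age alone, restricting margins to observations of one age fixes all of them at their correct values without naming them. The ages 30, 40 and 50 are arbitrary choices, and twopm is assumed to be installed from SSC.

```stata
// self-contained repeat of the example setup (requires twopm from SSC)
webuse womenwk, clear
replace wage = 0 if wage == .
mkspline ages = age, cubic nknots(3)
twopm wage i.educ ages* married children,     ///
    firstpart(probit)                         ///
    secondpart(glm, family(gamma) link(log))

// contrasts between educ levels, computed separately for women aged
// 30, 40 and 50; the spline terms keep their observed values in each group
margins if inlist(age, 30, 40, 50), dydx(educ) over(age) ///
    at(married=1 children=1)
```

This only works for ages that actually occur in the data, since over() averages over existing observations rather than evaluating the model at a hypothetical value.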
          ---------------------------------
          Maarten L. Buis
          University of Konstanz
          Department of history and sociology
          box 40
          78457 Konstanz
          Germany
          http://www.maartenbuis.nl
          ---------------------------------

