  • Setting knots in the linear spline models

    Hello!

    I am trying to replicate a model proposed in the paper by Gao, G. G., Greenwood, B. N., Agarwal, R., & McCullough, J. S. (2015). Vocal minority and silent majority: How do online ratings reflect population perceptions of quality. MIS Quarterly: Management Information Systems, 39(3), 565-589. Specifically, Equation (3) on p. 577:

    Online rating = a + f(s; beta) + X + M + e

    where f(s; beta) denotes splines of physician quality. "Knots in the linear spline models are set at the 25th and 75th percentiles of physician quality." (Gao et al., 2015) In the results they report spline-lower end, spline-middle half, and spline-upper end of physician quality.

    In my case, physician quality is denoted x and here is how I create spline variables:
    Code:
    sum x
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
               x |        743    74.08614    6.772038         51         94
    
    mkspline qual 4 = x, pctile displayknots
    
                 |     knot1      knot2      knot3
    -------------+---------------------------------
               x |        69         75         79
    
    sum qual1 qual2 qual3 qual4
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
           qual1 |        743    67.99327    2.633233         51         69
           qual2 |        743     3.84926    2.573218          0          6
           qual3 |        743    1.476447    1.765703          0          4
           qual4 |        743    .7671602    1.833618          0         15
    I have a couple of concerns about my approach. First, I have 4 segments vs. 3 in Gao et al. How can I set knots at the 25th and 75th percentiles only? Second, given that x's minimum value is 51, I am not sure why I observe zeros in the created spline variables. I would appreciate your feedback on the correct way to carry out this replication.

  • #2
    I think I was able to find an answer to my first question -- i.e., setting two knots (at the values of x at its 25th and 75th percentiles) and creating the three corresponding variables:

    Code:
    mkspline qual1 69 qual2 79 qual3 = x, displayknots
    
                 |     knot1      knot2
    -------------+----------------------
               x |        69         79
    I am now looking at the descriptive statistics of the created variables (below). Qual1 seems to correctly reflect the first interval; but shouldn't the second and third intervals (qual2-3) be 70/79 and 80/94 given the range of x between 51 and 94? These zeros confuse me. I would appreciate your comments on this issue.
    Code:
    sum qual1 qual2 qual3
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
           qual1 |        743    67.99327    2.633233         51         69
           qual2 |        743    5.325707    4.012309          0         10
           qual3 |        743    .7671602    1.833618          0         15
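    The knot values 69 and 79 above were typed in by hand. A minimal sketch of the same step with the knots computed from the data, assuming x is in memory and qual1-qual3 do not exist yet (drop them first if they do). The local macro names k1 and k2 are only illustrative. In this dataset the computed values should coincide with 69 and 79, the first and third knots displayed by the four-segment call, though that is worth checking with displayknots:

    ```stata
    * Compute the 25th and 75th percentiles of x; _pctile leaves them in r(r1), r(r2)
    _pctile x, percentiles(25 75)
    local k1 = r(r1)
    local k2 = r(r2)

    * Two knots give three linear spline variables, named as in the post
    mkspline qual1 `k1' qual2 `k2' qual3 = x, displayknots

    * Sanity check: without the marginal option the three pieces add back up to x
    assert abs(x - (qual1 + qual2 + qual3)) < 1e-6
    ```

    Note that mkspline qual 3 = x, pctile would not reproduce the paper's setup: with three segments the pctile option places knots at the 33.3rd and 66.7th percentiles rather than the 25th and 75th.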
    Last edited by Anton Ivanov; 03 Jul 2018, 10:16.

    • #3
      Just a quick update and advice for those facing similar issues in understanding splines:

      Everything has been clarified with the help of Chapter 2.4 of the following book -- Harrell, F. E., Jr. 2001. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. New York: Springer.
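      For readers who want the short version of why the zeros appear, here is a sketch of how the three variables are built, assuming mkspline's default construction (no marginal option) as described in the Stata documentation. The symbols k_1 = 69 and k_2 = 79 stand for the two knots from post #2:

      ```latex
      \begin{aligned}
      \text{qual1} &= \min(x,\, k_1) \\
      \text{qual2} &= \max\bigl(\min(x,\, k_2),\, k_1\bigr) - k_1 \\
      \text{qual3} &= \max(x,\, k_2) - k_2 \\
      \text{qual1} + \text{qual2} + \text{qual3} &= x
      \end{aligned}
      ```

      Each variable after the first measures how far x has moved into its segment, not the value of x itself. For x = 51 this gives (51, 0, 0), for x = 75 it gives (69, 6, 0), and for x = 94 it gives (69, 10, 15). That matches the reported ranges: qual2 runs from 0 to 10 = 79 - 69 and qual3 from 0 to 15 = 94 - 79, and the three reported means (67.99327 + 5.325707 + .7671602) add up to the mean of x, 74.08614. In a regression on qual1-qual3, each coefficient is then the slope of x within that segment.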