Hi Statalisters,
I have a dataset to which I would like to fit a multiple linear regression with linear splines created by mkspline.
My issue relates to reproducibility, validation, and probably understanding the "under the hood" mechanics of piecewise regression. I can run a simple linear regression using regress and the spline variables. I can then replicate the coefficients by running separate simple linear regressions with if conditions that restrict each regression to the observations on one side of the knot. When I apply the same approach to a multiple linear regression, however, my coefficients (and hence slopes) differ depending on whether I run separate regressions or one regression with both spline variables.
Ideally, I would like to replicate the results of the combined spline regression model with separate regression models, as confirmation that I am conducting the spline regression correctly.
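To be explicit about what I am comparing, here is a sketch of the two setups in my own notation, assuming mkspline's default (non-marginal) linear spline with a single knot. The symbols (y for the outcome, x for the splined variable, k for the knot, z for one additional covariate) are introduced here for illustration only.

```latex
% Notation is illustrative: y outcome, x splined variable, k knot, z extra covariate
\begin{align*}
s_1 &= \min(x, k), \qquad s_2 = \max(x - k,\, 0) \\
\text{combined:}\quad y &= \beta_0 + \beta_1 s_1 + \beta_2 s_2 + \gamma z + \varepsilon \\
\text{separate, } x < k:\quad y &= \alpha_0^{L} + \alpha_1^{L} x + \gamma^{L} z + \varepsilon \\
\text{separate, } x \ge k:\quad y &= \alpha_0^{U} + \alpha_1^{U} x + \gamma^{U} z + \varepsilon
\end{align*}
```

Written this way, the combined model has four coefficients (one intercept, two slopes that join at k, and a single coefficient on z), whereas the two separate regressions together have six.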
I have consulted the mkspline PDF documentation, Michael Mitchell's textbook Interpreting and Visualizing Regression Models Using Stata, and a few other resources that have been recommended on Statalist.
I have included test code using the auto dataset for explanatory purposes, which is largely based on the UCLA FAQ page titled "How can I run a piecewise regression in Stata":
Code:
sysuse auto
graph twoway ///
    (scatter price mpg) ///
    (lowess price mpg)
regress price mpg, vce(robust)

// Method 1: simple regression with mkspline
nl hockey price mpg
local knotvalue = 26.24305
mkspline mpg_knot1 `knotvalue' mpg_knot2 = mpg
regress price mpg_knot1 mpg_knot2

// Method 2: simple regression with separate regressions - no centering
regress price mpg if mpg < `knotvalue'
regress price mpg if mpg >= `knotvalue'

// Method 3: simple regression with separate regressions and centering
capture drop mpg_knot*
gen mpg_knot2 = mpg - `knotvalue'
regress price mpg_knot2 if mpg < `knotvalue'
regress price mpg_knot2 if mpg >= `knotvalue'

// Method 4: multiple regression with mkspline in combined model
capture drop mpg_knot*
mkspline mpg_knot1 `knotvalue' mpg_knot2 = mpg
regress price weight mpg_knot1 mpg_knot2

// Method 5: multiple regression with separate regressions
regress price weight mpg if mpg < `knotvalue'
regress price weight mpg if mpg >= `knotvalue'
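In case it helps with the "under the hood" part, the snippet below is a minimal sketch of how I understand mkspline to build the two variables for a single knot, assuming its default (non-marginal) linear spline. The names chk1 and chk2 are hypothetical and exist only for this check.

```stata
sysuse auto, clear
local knotvalue = 26.24305

* Spline variables as created by mkspline (default, non-marginal)
mkspline mpg_knot1 `knotvalue' mpg_knot2 = mpg

* Hand-built equivalents (assumption: single knot, linear spline)
generate double chk1 = min(mpg, `knotvalue')      // mpg capped at the knot
generate double chk2 = max(mpg - `knotvalue', 0)  // amount by which mpg exceeds the knot

* Compare within a small tolerance, since mkspline stores floats by default
assert reldif(mpg_knot1, chk1) < 1e-6
assert reldif(mpg_knot2, chk2) < 1e-6
```

If both assert commands pass silently, the hand-built variables match the mkspline output on this dataset.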
I have also included an excerpt of the data in question. The aim is to fit a multiple regression with splines for egfr_final, using egfr_baseline, treatmentgroup, age category, and presence of diabetes as covariates.
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float(pid age_cat) byte(treatmentgroup diabetes) int(egfr_baseline egfr_final)
189 3 1 0  85 111
136 3 2 0 110 117
120 4 1 0 117 123
 25 3 2 0 127  96
141 2 2 0 137 154
115 3 1 0 110 109
266 3 2 0  74  78
 74 3 1 0 111  98
 58 4 1 0 118 101
134 2 2 0 127 129
end
label values age_cat labelagecat
label values treatmentgroup labeltreatmentgroup
label def labeltreatmentgroup 1 "AmphoB", modify
label def labeltreatmentgroup 2 "Fluconazole", modify
label values diabetes labelyesno
Stata version: 16.1 IC
Appreciate the assistance.