Dear Statalist community,
I am using StataSE 16 (64-bit) on Windows 10. My question is regarding producing clustered standard errors when using nl hockey, which is a user-written command by Dr Mark Lunt for estimating the breakpoint between two intersecting straight lines (referred to as a piecewise [or "hockey stick"] regression). Please see the ado file:
There is an UCLA tutorial on piecewise regressions in Stata available here:
The dataset contains 4 variables:
However, there are 2 repeated observations per child, so it would be appropriate to adjust standard errors for clustering on their id. However, on running nl hockey with clustering on the id variable, I retrieve the following error:
I am wondering how I may be able to produce clustered standard errors for results retrieved when using nl hockey. I'll explain what I have tried / looked into before opening the floor to suggestions.
I looked into writing code based on the programmer’s command _robust. Please see the manual:
However, it requires variables of the estimated parameters - for which I am not sure what should be contained in them. I wondered whether the model equation would need to be solved for each of the parameters, and the estimates for each parameter entered into each of the corresponding variables. The code is as follows:
I also looked into re-writing the program as a substitutable expression or function evaluator program, but this hasn't been successful so far and I would welcome any advice on how to do this from the ado file for nl hockey posted above. Please see the nl manual:
I look forward to responses.
Best wishes,
Simon Baldwin
I am using StataSE 16 (64-bit) on Windows 10. My question is regarding producing clustered standard errors when using nl hockey, which is a user-written command by Dr Mark Lunt for estimating the breakpoint between two intersecting straight lines (referred to as a piecewise [or "hockey stick"] regression). Please see the ado file:
HTML Code:
https://personalpages.manchester.ac.uk/staff/mark.lunt/nlhockey.ado
and the help file:
HTML Code:
https://personalpages.manchester.ac.uk/staff/mark.lunt/nlhockey.hlp
HTML Code:
https://stats.oarc.ucla.edu/stata/faq/how-can-i-run-a-piecewise-regression-in-stata/
from which I'll be using a version of the talk dataset that I have adapted to this question. Please use the following code to reproduce this dataset:
Code:
. use https://stats.idre.ucla.edu/stat/stata/faq/talk, clear . expand 2 . sort id . by id: gen obs = _n . replace talk = talk + .8474287 if obs == 2 & age <= 11.35291 . replace talk = talk + 3.974196 if obs == 2 & age > 11.35291 . replace age = age + 1 if obs == 2
- id: an anonymous identifier of a child in the dataset
- talk: how much the child talks on the phone
- age: the age of the child
- obs: the repeated observation number
- breakpoint: the breakpoint between two intersecting straight lines modelling the change in talk per year of age
- slope_l: the gradient of the straight line to the left of the breakpoint
- slope_r: the gradient of the straight line to the right of the breakpoint
- cons: the constant (at age = 0)
Code:
. nl hockey talk age (obs = 400) Iteration 0: residual SS = 1724947 Iteration 1: residual SS = 48522.02 Iteration 2: residual SS = 31301.01 Iteration 3: residual SS = 28998.47 Iteration 4: residual SS = 28917.09 Iteration 5: residual SS = 28859.84 Iteration 6: residual SS = 28763.72 Iteration 7: residual SS = 28763.47 Source | SS df MS Number of obs = 400 -------------+------------------------------ F( 3, 396) = 444.51 Model | 96860.6041 3 32286.868 Prob > F = 0.0000 Residual | 28763.4668 396 72.6350173 R-squared = 0.7710 -------------+------------------------------ Adj R-squared = 0.7693 Total | 125624.071 399 314.847296 Root MSE = 8.522618 Res. dev. = 2845.31 (hockey) ------------------------------------------------------------------------------ talk | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- breakpoint | 12.79033 .6941923 18.42 0.000 11.42557 14.15509 slope_l | 1.107216 .4397491 2.52 0.012 .2426814 1.971751 slope_r | 3.999005 .1564209 25.57 0.000 3.691486 4.306525 cons | 4.656507 4.259092 1.09 0.275 -3.716751 13.02976 ------------------------------------------------------------------------------ * Parameter cons taken as constant term in model & ANOVA table (SEs, P values, CIs, and correlations are asymptotic approximations)
Code:
. nl hockey talk age, cluster(id) cluster() not allowed r(198);
I looked into writing code based on the programmer’s command _robust. Please see the manual:
HTML Code:
https://www.stata.com/manuals/p_robust.pdf
Code:
. marksample touse . markout `touse' id, strok . nl hockey talk age if `touse' . matrix D = e(V) . forvalues i = 1/4 { . forvalues j = 1/4 { . matrix D[`i',`j'] = ((e(V)[`i',`j'])*(sqrt(abs(e(V)[`i',`j']))/e(rmse))^2)/abs(e(V)[`i',`j']) . } . } . matrix b = e(b) . gen breakpoint = //??? . gen slope_l = //??? . gen slope_r = //??? . gen cons = //??? . local n = e(N) . local k = colsof(D) . predict double e if `touse', residual . sort `touse' id . by `touse' id: gen byte count = 1 if _n == 1 & `touse' . summarize count, meanonly . local n_id = r(sum) . local dof = `n_id' - 1 . local clopt "cluster(id)" . ereturn post b D, dof(`dof') esample(`touse') . _robust e if e(sample), minus(`k') `clopt' . ereturn display . drop breakpoint slope_l slope_r cons e count
HTML Code:
https://www.stata.com/manuals/rnl.pdf
Best wishes,
Simon Baldwin
Comment