Dear All
I suspect the answer is that I am doing something statistically wrong rather than that there is a problem with my Stata code, but as a self-taught novice in both I would be grateful for any help.
I am using margins after regress in Stata 16 to explore the interaction between a binary variable (walk) and a continuous variable (frail) in predicting a quality of life score (qol).
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte(walk frail) float qol
1 6 -.166
0 4   .62
1 4  .796
1 5  .487
1 6   .03
0 3     1
0 4     1
0 6  .796
0 4     1
1 5  .746
1 5  .727
0 5     1
0 3  .264
0 2     1
0 3     1
1 6  .273
0 4  .812
0 5     1
1 6  .587
1 8  .079
end
label values walk slow
label def slow 0 "Not Slow", modify
label def slow 1 "Slow walk speed", modify
Code:
regress qol c.frail##i.walk
margins walk, at(frail=(1(1)8)) plot
My issue is that the resulting predictions include values outside the possible range for qol, which runs from -0.594 to 1 (this is a validated quality of life score for my population, despite the clunky range). As I hope the attached graph.png shows, predictions at low values of frail exceed the maximum value of 1.
I saw previous posts suggesting truncreg or tobit in different scenarios, but I do not think these are valid here. My dependent variable is, I think, neither censored nor truncated: only a certain range of values is possible (like a percentage in an exam), rather than any potential values having been excluded.
In the full dataset, there are no participants with frail < 3 & walk == 1, nor are there any with frail > 6 & walk == 0, as you may appreciate from:
Code:
tab2 walk frail
A model without the interaction term, though, does not show any worrying collinearity:
Code:
regress qol walk frail
estat vif
I therefore suspect the problem may be that the margins command is extrapolating the slow walk speed (red) line when frail < 3.
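To make the extrapolation idea concrete, here is a minimal sketch of how I might check it. I am not sure this is the right approach. It assumes the example data above are loaded, and the frail ranges passed to at() are my own assumption based on the gaps described for the full dataset (no frail < 3 when walk == 1, no frail > 6 when walk == 0). The variable name qolhat is made up for illustration.

```stata
* Sketch only: assumes the -dataex- example above has been loaded
regress qol c.frail##i.walk

* Observed range of frail within each walk group
tabstat frail, by(walk) statistics(min max)

* In-sample fitted values: do any exceed the maximum possible qol of 1?
predict double qolhat, xb
summarize qolhat
count if qolhat > 1 & !missing(qolhat)

* Predictions only over frail values assumed to be observed in each group
* (bounds are assumptions taken from the description of the full dataset)
margins, at(walk=0 frail=(1(1)6)) at(walk=1 frail=(3(1)8))
```

If the out-of-range predictions disappear once each group is restricted to its own observed range of frail, that would be consistent with extrapolation rather than a coding error, though it would not by itself tell me whether a linear model is appropriate for a bounded outcome.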
I have no idea how to "Stata" my way out of this, or if this simply relates to a problem of model selection and/or statistical heresy.
Once again, I would be very grateful for any insights.
Best wishes
Ben