I am building a logistic regression model to describe variables that correlate with my dichotomous outcome. I'm aware that there are no "good" model building strategies, only less bad ones - or that seems to often be the hot take of my statistical colleagues.
My strategy so far has been to compare each variable alone with the outcome (my "crude" analysis). Those that may be correlated with it (I used a generous threshold of p < 0.20) go into a full model, something like this:
Code:
logistic outcomevariable covariate1 covariate2 covariate3 covariate4 covariate5 covariate6
I look over the results, pick out the covariate with the worst (highest) p value, remove it, and run the code again. I continue until 1) all remaining covariates are "significant" (p < 0.05), and/or 2) any "non-significant" (p > 0.05) covariate still in the model is one whose removal noticeably changes my measure of effect, in which case I keep it as a potential confounder. That is, I compare the odds ratios for the remaining covariates in the reduced model with those from the previous model; if any change by more than 10-15%, I put the removed covariate back and move on to the next-worst p value.
Years ago, a statistician I worked with used R (I think? It might have been SAS) to build a routine that iteratively compared every combination of an initial set of covariates until it came up with the "best" model. I can't recall how it defined "best", but I'm guessing some combination of the pseudo-R2, covariate p values, and perhaps something else?
Does anyone here have any experience with something like this? Can Stata do this? I'm not a statistician, and I don't even know how to begin searching on a topic like this. I'd be grateful if anyone could point me in the right direction: what search keywords would you recommend? Is there any good reading that's ideally not too math-heavy (wishful thinking, I know)?
Thanks in advance. I really appreciate these forums.
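To make the manual procedure concrete, here is a minimal sketch of one round of it. The variable names are the placeholders from my command. It assumes, purely for illustration, that covariate1 is the effect I care about and that covariate6 had the highest p value in the full model. The if e(sample) restriction is my own addition so that both models use the same observations.

```stata
* Full model: store the results and the odds ratio for covariate1
logistic outcomevariable covariate1 covariate2 covariate3 covariate4 covariate5 covariate6
estimates store full
scalar or_full = exp(_b[covariate1])

* Reduced model without covariate6, restricted to the same observations
logistic outcomevariable covariate1 covariate2 covariate3 covariate4 covariate5 if e(sample)
estimates store reduced
scalar or_reduced = exp(_b[covariate1])

* Odds ratios and p values from both models, side by side
estimates table full reduced, eform b(%9.3f) p(%9.3f)

* Percent change in the odds ratio for covariate1 after dropping covariate6;
* a change beyond roughly 10-15% is my cue to keep covariate6 in the model
display "Change in OR for covariate1 (%): " %6.1f 100*(or_reduced - or_full)/or_full
```

In practice I repeat the last step for every covariate that remains in the reduced model, not just covariate1.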