  • Multivariable model building and variable selection advice

    Hello fellow Stata users -

    I'm building a multivariable model to characterize predictors of a dependent variable. In my case, the dependent variable is HIV seroconversion in a prospective cohort (multiple visits per study participant). The independent variables include demographics (e.g., age, sex, education) and reported behavioral data (e.g., condom use, number of partners). Two independent variables (age and sex) will go into my model a priori. The remainder I've checked against the outcome in a bivariate (crude) analysis; those that may be associated with my outcome (p < 0.2) will be included in my modelling exercise. This may leave me with a large number of potential covariates to evaluate.

    In the past, I've used backward-elimination stepwise variable selection: typically I would run a full model, remove the "least significant" independent variable (i.e., the one with the largest p-value), and compare the two models to see which is the "best" fit (I believe I've used likelihood ratio testing). I've also looked at my measure of effect (odds ratios) for each remaining independent variable; if an odds ratio changes by more than about 10%, that might suggest the independent variable I've removed is a potential confounder. I am using logistic regression for my model.
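
    The two checks described above (a likelihood ratio test between nested models, and the roughly 10% change-in-estimate rule for odds ratios) come down to simple arithmetic. Here is a minimal sketch in Python, assuming the log-likelihoods and odds ratios have already been obtained from the fitted full and reduced models. The function names and the numbers in the usage lines are hypothetical, and taking the full model's odds ratio as the denominator is one common convention, not the only one.

```python
import math

def lr_test_1df(ll_full, ll_reduced):
    """Likelihood ratio test for dropping ONE parameter (1 degree of freedom)."""
    stat = 2.0 * (ll_full - ll_reduced)
    # Chi-square survival function with 1 df: P(X > stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value

def pct_change(or_full, or_reduced):
    """Percent change in an odds ratio after removing a covariate.

    Assumption: the change is expressed relative to the full-model estimate.
    """
    return 100.0 * (or_reduced - or_full) / or_full

# Hypothetical values, for illustration only
stat, p = lr_test_1df(ll_full=-100.0, ll_reduced=-101.92)
print(round(stat, 2), round(p, 3))
print(round(pct_change(or_full=2.0, or_reduced=2.3), 1))
```

    With these made-up inputs, a log-likelihood difference of 1.92 gives a test statistic of 3.84 (p of about 0.05), and an odds ratio moving from 2.0 to 2.3 is a 15% change, which would exceed the 10% threshold.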

    I have since learned that stepwise elimination of variables isn't always the best strategy, and have been encouraged to adopt a new strategy for this analysis. My statistician colleague (I'm an infectious disease epidemiologist) has told me about the R command glmulti for model building and variable selection. This command lets R select covariates from a list, trying each possible subset of the variables we designate until the best model (by AIC, I believe) is selected. I had thought Stata might have analogous commands; however, I have not used Stata for this type of modeling strategy, and I could not find out whether this is the case.
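
    To make the "try every subset and keep the lowest AIC" idea concrete, here is a minimal sketch in plain Python. It is not glmulti's implementation and not a Stata command. It assumes one row per participant (so it ignores the repeated visits in my cohort), uses simulated data, and all variable and function names (`all_subsets_aic`, `partners`, `condom`, `educ`, and so on) are hypothetical. Age and sex are forced into every model, as in my a priori plan.

```python
import itertools
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def logit_loglik(X, y, iters=15):
    """Fit a logistic regression by Newton-Raphson; return the maximized log-likelihood."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += mu * (1.0 - mu) * xi[j] * xi[k]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    ll = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        ll += yi * eta - math.log(1.0 + math.exp(eta))
    return ll

def all_subsets_aic(data, outcome, forced, candidates):
    """Fit every subset of `candidates`, always keeping `forced`; return (AIC, terms) sorted by AIC."""
    y = [row[outcome] for row in data]
    results = []
    for size in range(len(candidates) + 1):
        for subset in itertools.combinations(candidates, size):
            terms = list(forced) + list(subset)
            X = [[1.0] + [row[t] for t in terms] for row in data]
            ll = logit_loglik(X, y)
            aic = 2 * (len(terms) + 1) - 2 * ll  # parameters = terms + intercept
            results.append((aic, terms))
    return sorted(results)

# Simulated, standardized data: only age and partners truly affect the outcome
random.seed(1)
data = []
for _ in range(300):
    row = {"age": random.gauss(0, 1), "sex": float(random.random() < 0.5),
           "partners": random.gauss(0, 1), "condom": float(random.random() < 0.5),
           "educ": random.gauss(0, 1)}
    eta = -1.0 + 0.3 * row["age"] + 1.5 * row["partners"]
    row["hiv"] = float(random.random() < 1.0 / (1.0 + math.exp(-eta)))
    data.append(row)

ranked = all_subsets_aic(data, "hiv", ["age", "sex"], ["partners", "condom", "educ"])
for aic, terms in ranked[:3]:
    print(round(aic, 1), terms)
```

    With k candidate covariates this fits 2^k models (8 here), so the search grows quickly as the candidate list gets longer.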

    Are you aware of any Stata commands, or community-developed commands, that can facilitate this sort of model building strategy?

    Aside from reverting to R and letting my statistician do this, do you have any recommendations?

    Thanks in advance.

  • #2
    First, note that all variable selection procedures are controversial; whichever one you use, someone will not like it.

    Second, I am not familiar with that R command, but I guess that the user-written -allpossible- is similar; use -search- to find and download it.


    • #3
      Hello Matt Price and welcome to Statalist. You are right to be leery of "stepwise" variable selection methods, broadly defined. See the Stata FAQ on that topic and the comments in Frank Harrell's checklist. I am not a useR, but I took a quick peek at the documentation for glmulti, and I cannot find the words "penalty" or "penalized". So it sounds to me like good old best subsets regression by another name, and that is lumped in with all the other "stepwise" methods in the Stata FAQ.

      Because you have multiple visits per patient, you need to use a method that accounts for the correlated nature of the data. I don't know off the top of my head whether Stata's LASSO commands work with those kinds of models. You could look into that. And maybe someone more knowledgeable than I will pipe up. Good luck with your work.

      Bruce
      --
      Bruce Weaver
      Email: [email protected]
      Version: Stata/MP 18.5 (Windows)
