  • Categorical variables in backward regression model

    Hello everyone,

    I have a question which I guess is basic, but I can't find the answer I'm looking for on the internet or in forums, probably because it is not a simple yes-or-no answer. It would be great if you could help:

    In my backward regression model I have several categorical variables. How to put a categorical variable into the model is not the problem. But suppose that in the backward regression one subgroup (category) of a categorical variable has the highest p-value, so it would be the next to be removed from the model, while another subgroup of the same variable is borderline significant (p = 0.07). Do you keep the variable and instead remove the variable with the second-highest p-value, or do you remove the whole categorical variable?

    Sorry if it's too basic a question.

    Thanks for your help.

    Kind regards,

    Isabel

  • #2
    Do not do stepwise regression in the first place. Here is why.

    To answer your question more directly: excluding one indicator from the set of indicators representing a categorical variable changes the interpretation of the others (the excluded category is merged into the reference category), and it is one more example of the terrible ideas that automatic model selection produces.
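    To see what this means in practice, here is a minimal Python sketch on made-up toy data (the categories A, B, C and all outcome values are hypothetical, and the small OLS solver is written out only so the example is self-contained). With only the categorical variable in the model, each indicator's coefficient is the difference between that category's mean and the mean of the reference category A. Dropping the indicator for C silently pools C with A, so the coefficient on B now compares B with A-and-C combined.

```python
# Hypothetical toy example: dropping one indicator of a categorical variable
# merges that category into the reference category.

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Category A is the reference; group means are A = 2, B = 5, C = 11
groups = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 10.0, 11.0, 12.0]

# Full coding: intercept, indicator for B, indicator for C
X_full = [[1.0, float(g == "B"), float(g == "C")] for g in groups]
# Indicator for C dropped: C is now pooled with the reference category A
X_drop = [[1.0, float(g == "B")] for g in groups]

b_full = ols(X_full, y)
b_drop = ols(X_drop, y)
print("all indicators:", [round(b, 2) for b in b_full])  # B compared with A
print("C dropped:     ", [round(b, 2) for b in b_drop])  # B compared with A and C pooled
```

    With all indicators, the intercept is 2 (the mean of A) and the B and C coefficients are 3 and 9. With the C indicator dropped, the intercept becomes 6.5 (the mean of A and C pooled) and the B coefficient becomes -1.5: same data, opposite sign, different question.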

    Best
    Daniel



    • #3
      But to create a model I have to do a backward regression. I don't use automatic model selection; I remove the variables one by one myself. And I know that I can't remove one subgroup of a categorical variable and leave the others in. My question is this: suppose one subgroup has the highest p-value, so I would remove it from the analysis, while the other subgroups do not have significant p-values (if they did, the variable would stay in the analysis even though one subgroup has the highest p-value, right?) but borderline ones around 0.05, such as 0.06 or 0.07. Should I be consistent and remove the variable, or not, or does it depend on other factors? Maybe you have given me the answer already and I'm not realising it. Sorry if that's the problem.

      Kind regards,

      Isabel



      • #4
        But to create a model I have to do a backward regression.
        No, you do not. You need (economic) theory to build a model. Unless your research question is "Which of the variables I have collected have the highest multiple correlation with some other variable in this specific sample of mine?", you do not have to do stepwise regression, and it is probably a very bad idea to do it. The link I gave covers only some of the concerns about this way of building models. Note that the argument does not depend on who excludes the variables by looking at p-values, you or the software.
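        One commonly cited concern can be illustrated with a small simulation, sketched here in Python under made-up settings (n = 50 observations, 20 candidate predictors, 1,000 replications; all hypothetical). Every candidate is pure noise, unrelated to the outcome, yet the candidate that looks best in the sample has a much larger correlation than a typical one. The cutoff |r| of about 0.279 is an approximation (an assumption of this sketch) to the two-sided p < 0.05 threshold for a correlation with n = 50; if the 20 tests were independent, the chance that at least one noise variable crosses it would be about 1 - 0.95^20, roughly 0.64. Picking variables by their sample p-values, by hand or by software, rewards exactly this kind of chance.

```python
import random
import statistics


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def one_replication(rng, n=50, n_candidates=20):
    """Outcome and all candidates are independent noise.
    Returns (mean |r| over candidates, max |r| over candidates)."""
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    abs_r = [abs(correlation([rng.gauss(0.0, 1.0) for _ in range(n)], y))
             for _ in range(n_candidates)]
    return statistics.fmean(abs_r), max(abs_r)


R_CRIT = 0.279  # approx. |r| for two-sided p < 0.05 when n = 50 (assumption)
rng = random.Random(2015)  # fixed seed so the run is reproducible
results = [one_replication(rng) for _ in range(1000)]

typical = statistics.fmean([m for m, _ in results])
selected = statistics.fmean([b for _, b in results])
share_sig = sum(b > R_CRIT for _, b in results) / len(results)
print(f"average |r| of a noise candidate:        {typical:.3f}")
print(f"average |r| of the best noise candidate: {selected:.3f}")
print(f"share of runs where the best one looks 'significant': {share_sig:.2f}")
```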

        Best
        Daniel



        • #5
          I agree with Daniel on the question of automatic/arbitrary selection of variables for models, and I don't know the "proper" way to run a backward regression, but:

          Perhaps a better way to deal with a group of indicators is a joint F-test, or ANOVA, to assess the significance of the categorical variable as a whole. If the group as a whole has the highest p-value, then do what you will.
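          A minimal Python sketch of such a joint test, assuming an ordinary least-squares model on made-up data (the categories, the covariate x and all values are hypothetical). Both indicators of the categorical variable are removed together, and the statistic F = ((RSS_r - RSS_u) / q) / (RSS_u / (n - k)) compares the residual sums of squares of the restricted and unrestricted models; under the null hypothesis it follows an F(q, n - k) distribution, from which the p-value is obtained. In Stata, testparm i.catvar after regress (with the default standard errors; catvar is a placeholder name) reports the same joint F-test. For a logistic or Cox model, a likelihood-ratio or Wald chi-squared test of all the indicators plays the same role.

```python
def ols_rss(X, y):
    """Fit OLS via the normal equations and return the residual sum of squares."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    residuals = [yi - sum(b * xj for b, xj in zip(beta, row))
                 for row, yi in zip(X, y)]
    return sum(e * e for e in residuals)


def joint_f_test(X_full, X_restricted, y):
    """F statistic for dropping the extra columns of X_full as one group.
    Returns (F, q, n - k), where q is the number of restrictions."""
    n, k = len(y), len(X_full[0])
    q = k - len(X_restricted[0])
    rss_u = ols_rss(X_full, y)
    rss_r = ols_rss(X_restricted, y)
    F = ((rss_r - rss_u) / q) / (rss_u / (n - k))
    return F, q, n - k


# Hypothetical data: three-level categorical variable plus one covariate x
groups = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
x = [0.5, 1.0, 0.2, 0.9, 0.3, 0.8, 0.1, 0.7, 0.4]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 10.0, 11.0, 12.0]

# Unrestricted: intercept, x, indicators for B and C (A is the reference)
X_full = [[1.0, xi, float(g == "B"), float(g == "C")] for g, xi in zip(groups, x)]
# Restricted: both indicators removed together, never just one of them
X_restricted = [[1.0, xi] for xi in x]

F, df1, df2 = joint_f_test(X_full, X_restricted, y)
print(f"joint test of the categorical variable: F({df1}, {df2}) = {F:.2f}")
```

          Leaving the covariate out (intercept plus the two indicators against an intercept-only model), the same function gives the one-way ANOVA F; for these toy values that is F(2, 6) = 63, which is why the F-test and ANOVA routes lead to the same answer.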

          Stata/MP 14.1 (64-bit x86-64)
          Revision 19 May 2016
          Win 8.1



          • #6
            Isabel:
            Daniel's replies to your previous, identical query were entirely in line with the view that backward (stepwise) regression is not the way to go (please see http://www.statalist.org/forums/foru...ard-regression).
            Unfortunately, nothing has changed in favour of backward regression since then.
            Kind regards,
            Carlo
            (StataNow 18.5)



            • #7
              Okay, I see. Why is it then recommended by my statistician? And in medical research, when it's about finding risk factors for a certain outcome, basically everybody uses backward regression. So what would be the best way to analyse a group of variables and their influence on a dependent variable?



              • #8
                Thank you! It would have been nice of my statistician to tell me. Do you know what the right approach would be to build a model and look at risk factors for a certain outcome?



                • #9
                  Isabel:
                  just echoing Daniel's replies, the best way to conceive a regression model (or, at least, a mandatory methodological step) is to look at the theoretical approaches in your research field.
                  Skimming through the literature and finding inspiration in what other researchers did on the same research topic is one way to do it. This approach, which should not be confused with plagiarism, is research itself and, last but not least, usually makes things smoother with journal reviewers.
                  Kind regards,
                  Carlo
                  (StataNow 18.5)



                  • #10
                    Why is it then recommended by my statistician?
                    You should ask her/him. Maybe this will help her/him and many others asking for advice in the future.

                    And in medical research, when it's about finding risk factors for a certain outcome, basically everybody uses backward regression.
                    I guess this is why everyone outside medicine who knows about statistics is usually skeptical when confronted with new claims that some tea, vegetable, or whatever "causes" cancer or some illness - which reminds me of ....

                    So what would be the best way to analyse a group of variables and their influence on a dependent variable?
                    Well, as I said, it is hard to believe that there is no theory about the risk factors or at least existing empirical evidence from previous studies in this area. If this were the case, how would you know which variables to collect in the first place?

                    If there is no research on the phenomenon of interest, there might be knowledge about a related phenomenon that could, in theory, be adapted. If this really is purely exploratory, i.e. the first study analyzing some phenomenon, then I would probably start from some common-sense "hypotheses" and try to build a theory. As a first step, you might look at group differences. If you find any, you might think about what differs between these groups that might affect the outcome. You could then statistically control for such factors and see whether they "explain" away the group differences. Note that this is the opposite direction of what you have in mind. But do not put too much trust in such results either. Such an approach might give first hints, but these hints then have to be combined into a reasonable theory, which in turn has to be tested with a new sample.
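                    The step of checking whether a factor "explains away" a group difference can be sketched in Python with noise-free, made-up data (the group indicator g, the factor z and the outcome values are all hypothetical, and the small OLS solver is included only to keep the example self-contained). Here the outcome depends on z alone, and z happens to be higher in group 1. The unadjusted regression shows a group difference of 6; once z is controlled for, the group coefficient drops to zero (up to floating-point rounding). Real data are never this clean, and such a result is only a first hint, not a conclusion.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


g = [0, 0, 0, 1, 1, 1]                # hypothetical group indicator
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]    # hypothetical factor, higher in group 1
y = [2.0 * zi for zi in z]            # outcome driven by z only (no noise)

# Group difference without and with statistical control for z
unadjusted = ols([[1.0, float(gi)] for gi in g], y)
adjusted = ols([[1.0, float(gi), zi] for gi, zi in zip(g, z)], y)
print(f"group effect, unadjusted:     {unadjusted[1]:.3f}")
print(f"group effect, adjusted for z: {adjusted[1]:.3f}")
print(f"effect of z, adjusted:        {adjusted[2]:.3f}")
```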

                    Best
                    Daniel
                    Last edited by daniel klein; 04 Mar 2015, 09:14.



                    • #11
                      Thank you all for your help. There are quite a few papers on my research topic (brain haemorrhage), but unfortunately backward regression seems to be the favourite tool in medical research. I just went through them again, and it is what they all used; they validated the model with bootstrapping and shrank the regression coefficients by an estimated shrinkage factor. The only other approach I found was a Cox proportional hazards regression model. Just for those who are interested.

                      Thanks again for your help, I really appreciate it. I will have some reading to do now :-)



                      • #12
                        they validated the model with bootstrapping and shrank the regression coefficients by an estimated shrinkage factor
                        This sounds as if you are referring to the "lasso" method rather than stepwise regression (cf. http://statweb.stanford.edu/~tibs/lasso.html). If so, then type ssc describe lars and follow the links. [I found this using findit lasso.]
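                        For reference, the lasso described at the link above estimates the coefficients by least squares with an L1 penalty on their absolute size. The penalty shrinks the coefficients toward zero and can set some of them exactly to zero; the tuning parameter λ ≥ 0 is chosen by the analyst, for example by cross-validation. In one common (penalized) form:

```latex
\hat{\beta}^{\text{lasso}}
  = \arg\min_{\beta_0,\,\beta}\;
    \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^{2}
    + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

                        With λ = 0 this is ordinary least squares; larger values of λ shrink the coefficients more strongly.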
