  • Generalised Linear Model, which family and link function?

    I'm wanting to use a GLM regression with a continuous dependent variable and a number of dummy independent variables. How would I be able to tell which family and link function is the most appropriate for my regression?

    Thanks in advance

  • #2
    Do you have any ideas about the conditional distribution of the dependent variable, or about the functional form of the relationship between the independent variables and the dependent variable?


    • #3
      I'm not really sure what you mean, I'm a student so don't really have that much experience.
      I know the independent variables have just a linear relationship with the dependent variable.
      I'm not really sure about the conditional distribution of the dependent variable, is there a way I could check this?

      Thanks for your help.


      • #4
        Is there a specific reason you chose not to start with "plain vanilla" linear regression, estimated via OLS? In other words, what is it that makes you think a generalized linear model with some other family and/or link function might be more appropriate?

        Best
        Daniel


        • #5
          I am comparing regression models with differing specifications to find which one is the best predictor for my data set. I chose a GLM as part of my analysis because my dependent variable is not normally distributed; it is skewed toward the left.
          I am trying to mirror a similar type of analysis used in health economics for an essay, and GLMs are frequently used in that literature.

          Thanks for your reply,
          Ash


          • #6
            Your dependent/left-hand side/outcome/response variable is not required to follow a normal distribution to justify estimating a linear model by OLS. The assumption is that the errors are normal, but even this assumption is irrelevant for consistency and efficiency of the point estimates. In practice it turns out that, given a sufficient sample size, roughly bell-shaped residuals are just fine for inference.
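
            A minimal simulation sketch of the point about point estimates (not from the original post): the outcome is generated from one dummy regressor plus strongly skewed, mean-zero errors, and a plain least-squares fit still recovers the coefficients. The true values, sample size, and the helper name `ols_simple` are assumptions made only for this illustration.

```python
import random

random.seed(42)

# Assumed values for the simulation only; they are not from the thread.
TRUE_INTERCEPT = 1.0
TRUE_SLOPE = 2.0
N = 5000

# One dummy regressor, as in the original question.
x = [random.randint(0, 1) for _ in range(N)]

# Errors: exponential draws minus their mean, so skewed but mean zero.
errors = [random.expovariate(1.0) - 1.0 for _ in range(N)]
y = [TRUE_INTERCEPT + TRUE_SLOPE * xi + ei for xi, ei in zip(x, errors)]


def ols_simple(x, y):
    """Return (intercept, slope) of a one-regressor least-squares fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept_hat, slope_hat = ols_simple(x, y)
print(f"intercept = {intercept_hat:.3f}, slope = {slope_hat:.3f}")
```

            Neither the outcome nor the errors are normal here, yet the estimates should land close to the assumed true values. The sketch says nothing about standard errors or inference in small samples.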

            The remainder is more general advice, and the details probably depend heavily on the research question you are trying to answer, the underlying theory, the data you use, etc.

            Regarding the part of your strategy/approach that "tries" differing specifications, which I understand as differing sets of predictors, you might run the risk of overfitting your model. Better to base your included predictors on sound economic theory and/or previous empirical findings.

            If you want to replicate what others have done with similar data, then I would start by using the very same models they did. I am anything but an expert on GLM, but from the (marginal) distribution you describe, you might find that a log link or an inverse Gaussian distribution better describes the data generating process. However, keep in mind that non-normal residuals in a (simple) linear model may be due to omitted predictors rather than a "wrong" link function. This is to say, I would usually start off with the simplest model I can think of, which is the linear regression model.

            Best
            Daniel


            • #7
              Thanks for your reply Daniel.

              For choosing the GLM link and family, would it be justifiable to run different combinations of link and family and choose the model with the lowest AIC/BIC statistic?

              Thanks,
              Ash


              • #8
                Again, this depends on the research question you are trying to answer. In general it seems like a very exploratory and purely data-driven "automatic" model selection, which in my view is not appropriate for answering any substantive research question in a "scientific" way.

                Aside from being conceptually very weak, this approach also seems suspicious from a purely technical point of view. Given the number of combinations of families and link functions that GLM offers, and combining this with the aforementioned "differing specifications", you will probably end up with a really huge number of models to fit. Note that AIC and BIC allow for comparisons across nested models. I doubt these measures are useful for comparing different link functions, or at least I have never seen a statistical/theoretical justification for doing so.
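
                For reference, the mechanics of the comparison asked about in #7 can be sketched as follows (not from the original post). AIC is 2k - 2 ln L and BIC is k ln(n) - 2 ln L, where k is the number of estimated parameters and ln L the maximised log-likelihood. The sketch fits two intercept-only candidates to the same simulated outcome; the data, the candidate pair, and all function names are assumptions for illustration. It shows how the numbers are obtained, not whether choosing a family this way is sound.

```python
import math
import random

random.seed(0)

# Simulated positive, skewed outcome (an assumption for illustration).
N = 500
y = [random.expovariate(1.0) for _ in range(N)]
mean_y = sum(y) / N


def loglik_gaussian(y, mu):
    """Maximised Gaussian log-likelihood given mean mu (variance at its MLE)."""
    n = len(y)
    sigma2 = sum((yi - mu) ** 2 for yi in y) / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


def loglik_exponential(y, mu):
    """Exponential log-likelihood (gamma family with shape fixed at 1)."""
    return sum(-math.log(mu) - yi / mu for yi in y)


def aic(loglik, k):
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    return k * math.log(n) - 2 * loglik


# k counts estimated parameters: mean and variance vs. mean only.
candidates = {
    "gaussian": (loglik_gaussian(y, mean_y), 2),
    "exponential": (loglik_exponential(y, mean_y), 1),
}
results = {
    name: {"aic": aic(ll, k), "bic": bic(ll, k, N)}
    for name, (ll, k) in candidates.items()
}
for name, stats in results.items():
    print(f"{name:12s} AIC = {stats['aic']:.1f}  BIC = {stats['bic']:.1f}")
```

                Such numbers are only comparable when every candidate is fitted to the same untransformed outcome on the same observations and each log-likelihood includes all of its constants.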

                Best
                Daniel


                • #9
                  Hi Ash,
                  The following mini course, Modeling Health Care Costs and Counts, provides guidance on selecting the link function (Box-Cox Test) and family distribution (modified "Park Test") for GLM.
                  http://harris.uchicago.edu/sites/def...minicourse.pdf
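
                  A rough sketch of the idea behind the Park-type test for the family (not taken from the linked course): regress the log of the squared residuals on the log of the fitted means and read the slope. A slope near 0 points to Gaussian, near 1 to Poisson-like, near 2 to gamma, and near 3 to inverse Gaussian variance. This is a simplified OLS-on-logs variant; the simulated data, group means, shape parameter, and the helper name `ols_slope` are assumptions for illustration.

```python
import math
import random

random.seed(1)

# Simulated data (assumption): five groups defined by dummies, gamma outcome.
GROUP_MEANS = [1.0, 2.0, 4.0, 8.0, 16.0]
SHAPE = 2.0
N_PER_GROUP = 1000

groups, y = [], []
for g, mu in enumerate(GROUP_MEANS):
    for _ in range(N_PER_GROUP):
        groups.append(g)
        # gammavariate(shape, scale) has mean shape * scale = mu.
        y.append(random.gammavariate(SHAPE, mu / SHAPE))

# Step 1: fitted means. With a full set of group dummies as the only
# regressors, the fitted mean for each group is the group sample mean,
# so no iterative GLM fit is needed in this special case.
sums = [0.0] * len(GROUP_MEANS)
counts = [0] * len(GROUP_MEANS)
for g, yi in zip(groups, y):
    sums[g] += yi
    counts[g] += 1
group_mean = [s / c for s, c in zip(sums, counts)]
fitted = [group_mean[g] for g in groups]

# Step 2: regress ln((y - fitted)^2) on ln(fitted) and keep the slope.
log_sq_resid = [math.log((yi - fi) ** 2) for yi, fi in zip(y, fitted)]
log_fitted = [math.log(fi) for fi in fitted]


def ols_slope(x, z):
    """Slope of a one-regressor least-squares fit of z on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxz / sxx


park_slope = ols_slope(log_fitted, log_sq_resid)
print(f"Park-type slope = {park_slope:.2f}")
```

                  Because the simulated variance is proportional to the squared mean, the slope should come out near 2 here. With real data the fitted means would come from an initial GLM fit rather than from group averages.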

                  Tom


                  • #10
                    Thanks for your help Tom and Daniel.
