
  • GLM: choosing the right family

    Hey guys,

    I noticed that my data aren't normally distributed, so I want to use a GLM, but I'm not sure which family to use. I saw in another post that the ICs are a good way to compare models. But are there any statistical tests in Stata that point to the right distribution or rule out wrong ones? Can I find my model by trying every possible distribution and just comparing the information criteria, or is there something else I need to watch out for? I know questions about GLMs come up often, but I'm still confused. Note: I have posted a histogram of the dependent variable below.

    Thanks in advance

    Ben

    [Attached image: GLM.png, histogram of the dependent variable]
    Last edited by Benjamin Krüger; 22 Feb 2021, 06:05.

  • #2
    Benjamin:
    if you're implicitly referring to linear regression, normality is a weak requirement, and it applies to the residual distribution only.
    That said, -family(gamma), link(log)- are often used with economic data.
    Kind regards,
    Carlo
    (Stata 19.0)
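    For readers less familiar with what -family(gamma), link(log)- estimates: the model is ln E[y|x] = xb with Var(y|x) proportional to E[y|x]^2, which suits positive, right-skewed outcomes. In Stata this would be glm y x, family(gamma) link(log), where y and x are placeholder variable names. The Python sketch below is only an illustration on simulated (hypothetical) data with one covariate, not Stata's implementation: it fits the model by iteratively reweighted least squares (IRLS). For this family/link pair the IRLS weights all equal 1, so each step is a plain OLS fit of the working response.

```python
import math
import random

def fit_gamma_log(x, y, max_iter=50, tol=1e-10):
    """Gamma GLM with log link and one covariate, ln E[y|x] = b0 + b1*x,
    fitted by IRLS (Fisher scoring)."""
    n = len(y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b0, b1 = math.log(sum(y) / n), 0.0  # start from the log of the mean of y
    for _ in range(max_iter):
        eta = [b0 + b1 * xi for xi in x]   # linear predictor
        mu = [math.exp(e) for e in eta]    # inverse of the log link
        # Working response z = eta + (y - mu) * g'(mu), with g'(mu) = 1/mu.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        # Weights 1 / (V(mu) * g'(mu)**2) = 1 / (mu**2 / mu**2) = 1: unweighted OLS.
        zbar = sum(z) / n
        new_b1 = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z)) / sxx
        new_b0 = zbar - new_b1 * xbar
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1

# Simulated positive, right-skewed outcome (hypothetical data):
# true ln E[y|x] = 1 + 0.5*x, gamma shape 2, so Var(y|x) = mu**2 / 2.
random.seed(42)
x = [random.uniform(0.0, 2.0) for _ in range(5000)]
y = [random.gammavariate(2.0, math.exp(1.0 + 0.5 * xi) / 2.0) for xi in x]

b0_hat, b1_hat = fit_gamma_log(x, y)
print(f"b0 = {b0_hat:.3f}, b1 = {b1_hat:.3f}")
```

    The estimates should land close to the simulated values (1, 0.5). Because IRLS solves the same score equations as maximum likelihood, a converged fit should match the point estimates from any other GLM optimizer up to tolerance.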



    • #3
      Thanks, Carlo. I noticed I had shared the wrong picture. Thanks for the information.

      I will try the gamma distribution.



      • #4
        Benjamin:
        The book at https://www.stata.com/bookstore/heal...s-using-stata/ devotes a chapter to GLMs
        (Chapter 5, "Generalized linear models").
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Sorry, but what are ICs? Do you mean AIC, BIC, etc.?



          • #6
            Nick:
            I assumed Benjamin meant AIC/BIC.
            Kind regards,
            Carlo
            (Stata 19.0)
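    Since ICs came up: AIC and BIC are computed from each model's maximized log likelihood lnL, the number of estimated parameters k, and the sample size n, as AIC = -2 lnL + 2k and BIC = -2 lnL + k ln(n). Among models fitted to the same outcome and sample, lower is better. In Stata, estat ic after each glm fit reports both; as far as I know, the AIC printed in the glm output header is divided by n, so it is not on the same scale as estat ic. The short Python sketch below just applies the two formulas; the log-likelihood values are hypothetical placeholders, not results from Benjamin's data.

```python
import math

def aic(loglik: float, k: int) -> float:
    """Akaike information criterion: -2 lnL + 2k (lower is better)."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian (Schwarz) information criterion: -2 lnL + k ln(n)."""
    return -2.0 * loglik + k * math.log(n)

# Hypothetical (lnL, k) pairs for two fits of the same outcome on the same
# n observations; placeholders for illustration, not real estimates.
n = 500
candidates = {
    "gaussian, identity link": (-3120.4, 4),
    "gamma, log link": (-2985.7, 4),
}

for name, (ll, k) in candidates.items():
    print(f"{name}: AIC={aic(ll, k):.1f}  BIC={bic(ll, k, n):.1f}")

best = min(candidates, key=lambda m: aic(*candidates[m]))
print("lowest AIC:", best)
```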



            • #7
              Originally posted by Carlo Lazzaro
              Benjamin:
              if you're implicitly referring to linear regression, normality is a weak requirement, and it applies to the residual distribution only.
              That said, -family(gamma), link(log)- are often used with economic data.
              I'm not an economist, but one of the quantitative courses in my program introduced me to a test for choosing among the normal, Poisson, gamma, and inverse normal families in GLMs. You can probably make better sense of the algebra than I can, but I think the basic rationale is that you're testing the relationship between the mean and the variance of the data: these families imply a variance proportional to the mean raised to the power 0, 1, 2, and 3, respectively.

              The procedure itself doesn't involve any special commands, but it takes several steps. An outline is available starting on p. 28 of this presentation. (The author was not the person who taught this to me.) It is also described in this paper by Partha Deb and Edward Norton (see p. 497). I believe the technique originated with Park (1966, "Estimation with Heteroscedastic Error Terms," Econometrica; I haven't personally reviewed this). The Deb and Norton article may also outline a separate procedure for selecting the link function.

              The context in these sources is healthcare expenditure data, which have support from 0 to infinity (not literally, but you get the idea; see here for one real-life but rare example of extremely high healthcare costs) and are right-skewed.
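              The core calculation behind the Park-type test described above can be sketched as follows, assuming Var(y|x) is proportional to mu^lambda with mu = E[y|x]. You fit a preliminary GLM (often gamma with log link), take the fitted means mu-hat, and estimate lambda from how the squared raw residuals (y - mu-hat)^2 scale with mu-hat: lambda near 0 points to the normal family, near 1 to Poisson, near 2 to gamma, and near 3 to inverse normal. This Python sketch uses Park's original log-OLS form, regressing ln((y - mu-hat)^2) on ln(mu-hat); as I understand it, the modified version in the health-econometrics literature fits the same relationship with a GLM and a log link instead, so treat this as an illustration of the arithmetic rather than the exact published procedure. To keep it self-contained, the true simulated means stand in for GLM fitted values (an assumption); with real data you would use predictions from glm.

```python
import math
import random

def park_slope(y, mu):
    """OLS slope of ln((y - mu)^2) on ln(mu): an estimate of lambda
    in Var(y | x) proportional to mu**lambda."""
    xs, zs = [], []
    for yi, mi in zip(y, mu):
        r2 = (yi - mi) ** 2
        if r2 > 0 and mi > 0:  # logs need strictly positive values
            xs.append(math.log(mi))
            zs.append(math.log(r2))
    n = len(xs)
    xbar, zbar = sum(xs) / n, sum(zs) / n
    sxz = sum((a - xbar) * (b - zbar) for a, b in zip(xs, zs))
    sxx = sum((a - xbar) ** 2 for a in xs)
    return sxz / sxx

# Power of the mean in the variance function implied by each family.
FAMILIES = {0: "normal", 1: "poisson", 2: "gamma", 3: "inverse normal"}

def suggest_family(lam):
    """Family whose variance power is closest to the estimated lambda."""
    return FAMILIES[min(FAMILIES, key=lambda p: abs(p - lam))]

# Simulated right-skewed "cost" data (hypothetical): gamma with shape 2,
# so Var = mu**2 / 2 and the true lambda is 2.
random.seed(1)
mu = [math.exp(random.uniform(5.0, 9.0)) for _ in range(20000)]
y = [random.gammavariate(2.0, m / 2.0) for m in mu]

lam = park_slope(y, mu)
print(f"lambda-hat = {lam:.2f} -> {suggest_family(lam)}")
```

              Because the data are simulated from a gamma distribution, lambda-hat should come out close to 2. In Stata the same steps need only glm, predict, generate, and regress, which matches the point above that no special commands are involved.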
              Last edited by Weiwen Ng; 22 Feb 2021, 12:58.
              Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

              When presenting code or results, please use code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

