  • Regression with interaction, first expressions missing

    I am currently writing my bachelor thesis with Stata and I do not have much experience with it. I want to run a regression with interactions; however, whenever I enter the command, the first expression of each variable is missing from the output. For example, I run the following command:

    reg fltlnl age##health

    My expressions for age are: 1. young adults, 2. adults, 3. seniors, and my expressions for health are: 1. very good, 2. good, 3. fair, 4. bad, 5. very bad.

    But in the output, the interactions for young adults and very good are missing. I cannot seem to find a solution for this issue anywhere.

    Help would be greatly appreciated!!

  • #2
    Here are some screenshots to hopefully understand my problem a little better.


    • #3
      Nothing is wrong with your model. This is expected behavior. See this Stata YouTube video by Chuck Huber. See also this introductory article on dummy/indicator variables.



      • #4
        Thank you for your answer! In the video I saw something about base variables. However, I still do not understand how I am supposed to get any information on the expressions "young adults" and "very good". Could you please try to explain?



        • #5
          It would require a rather lengthy post to fully explain this. The key insight is that in a linear regression, you get a constant (_cons in Stata output). This is where you can locate the mean outcome value for the groups held out of the "first expressions," that is, young adults and those with very good health. In this case, the constant gives you the mean outcome value for subjects who are both young adults and in very good health. Each main-effect coefficient tells you how much the outcome mean for that group differs from the mean for young adults in very good health, with the other variable held at its base level. Each interaction coefficient is the additional adjustment that applies to one particular combination of age and health.

          For example, the coefficient for 3. Erwachsene (adults) indicates that, relative to young adults in very good health (who have a mean outcome value of 1.27 according to the estimate in the _cons), you subtract .045 to get the mean outcome value for adults in very good health.
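
          As a rough sketch of that arithmetic, lincom can add up the coefficients that make up any group's mean. This assumes age is coded 1 to 3 and health 1 to 5, as described in #1, with level 1 as the base of each. Check the level numbers in your own output, since they may differ. The allbaselevels option is only there so that the held-out categories are printed in the table.

          ```stata
          * Same model; allbaselevels also lists the omitted (base) categories
          regress fltlnl i.age##i.health, allbaselevels

          * Base cell (age 1, health 1): the constant alone
          lincom _cons

          * Age 2, health 1: constant plus the main effect of age 2
          lincom _cons + 2.age

          * Age 2, health 3: both main effects plus their interaction
          lincom _cons + 2.age + 3.health + 2.age#3.health
          ```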

          To show you that these groups are still included in your model, you can get predicted margins for all the groups. This will be a messy graph, but you can see that all groups are on it.
          Code:
          reg fltlnl age##health
          margins age#health
          marginsplot
          I suggest reading the PDF documentation for margins in addition to Richard Williams' excellent Stata Journal article on margins.
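
          If it helps, one way to convince yourself is to compare the margins with the raw group means. Because this model contains only the two factor variables and their full interaction, the predicted margin for each cell should match the observed mean of fltlnl in that cell, assuming every age-by-health combination actually occurs in your data. A minimal sketch:

          ```stata
          regress fltlnl i.age##i.health

          * Predicted mean for every age-by-health cell, base categories included
          margins age#health

          * Observed cell means in the estimation sample, for comparison
          tabulate age health if e(sample), summarize(fltlnl) means
          ```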



          • #6
            Thank you so much for the explanation! I only just got back to my problem and realized that most of the age##health coefficients do not seem to be significant according to their p-values, even though the model itself seems to be significant. Is there any possible explanation for that?


            • #7
              There are a lot of good posts on the problems with evaluating model results based on statistical significance (e.g., here). Clyde Schechter's response in this thread seems particularly relevant to your situation. I would say that you may want to examine whether the overall prediction is "significant" rather than focusing on the individual terms, each of which tests whether the outcome mean for that group differs from the outcome mean for the holdout group. You can instead jointly test whether all the coefficients involved in the interaction are equal to 0, which is essentially the same as "testing whether the interaction is significant."
              Code:
              reg fltlnl i.age##i.health
              testparm i.age#i.health
              You can also test particular contrasts of interest using the contrast command. As always, read up on it in the PDF documentation.
              Code:
              help contrast
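
              As a hedged sketch of what that might look like with the variables from this thread: the first contrast gives the joint test of the interaction (the same hypothesis as testparm above), and the second uses the r. operator to compare each age group with the base age group separately within each health level. Which contrasts are worth testing depends on your research question.

              ```stata
              regress fltlnl i.age##i.health

              * Joint test of the age-by-health interaction
              contrast age#health

              * Each age group versus the base age group, within each level of health
              contrast r.age@health
              ```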
              It's worth noting that your two predictors are not explaining much of the variance in the outcome (R-squared = .06). So the interaction may not be very informative itself. It looks to me like health is the bigger factor of the two you have shown so far.

