  • Regression on different sub-groups

    Hi all,

    Below is a -dataex- excerpt from my dataset. I am studying the effect of alcohol consumption on earnings, using the following variables:

    clear
    input double days_drink_wk float drink_intensity byte educgrp long earnings
    .5 4 1 5000
    .5 .5 3 20000
    .5 .5 3 24000
    1 1 4 101099
    2 12 3 11000
    3 30 3 30000
    3 6 4 100000
    .5 .5 2 24000
    end
    • days_drink_wk: number of days drinking per week
    • drink_intensity: number of drinking units per week
    • educgrp: education level
    • earnings: annual income in $
    I would like to categorise respondents into subgroups by drink_intensity (0: abstainers, 1-7: light drinkers, 8-15: moderate drinkers, ...) and run a regression to find how membership in these groups affects earnings. I also want to include demographic variables such as age, education, and marital status. Please see the example below:
    [Attached image: Screenshot 2022-03-29 192510.png]

    My current approach is to generate a new variable for each group and run an individual regression for each, as below:

    generate abstainers = drink_intensity if inrange(drink_intensity, 0, 0.5)
    generate light_dr = drink_intensity if inrange(drink_intensity, 0.6, 7)
    generate light_moderate_dr = drink_intensity if inrange(drink_intensity, 7.1, 21)
    generate moderate_dr = drink_intensity if inrange(drink_intensity, 21.1, 43)
    generate moderate_heavy_dr = drink_intensity if inrange(drink_intensity, 43.1, 64)
    generate heavy_dr = drink_intensity if inrange(drink_intensity, 64.1, 86)
    generate very_heavy_dr = drink_intensity if (drink_intensity > 86)

    reg earnings abstainers educgrp married
    All of the code runs, but I don't think it is the correct way to do this, because it will generate different coefficients on the demographic variables for each group. So my questions are:

    1. Am I running it correctly, and if not, could you let me know the correct code? Just one or two examples and I could work out the other groups.
    2. Should I take the log of earnings? Earnings are currently in $, but I want to interpret the results as the % change in earnings as people drink more or less. If yes, could you let me know the code?

    Thank you very much!

  • #2
    Anh:
    1) I would create a categorical variable including all the subgroups of drinkers and plug it in the right-hand side of the regression equation (see -fvvarlist- and -label-);
    2)
    Code:
    gen ln_earnings=ln(earnings)
    Kind regards,
    Carlo
    (StataNow 18.5)
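
    A minimal sketch of the suggestion in #2, not a definitive implementation. It assumes the cut points from post #1 (0.5, 7, 21, 43, 64, 86) and uses the -dataex- excerpt from that post. The names drink_cat and drink_cat_lbl are hypothetical, and the category labels are only illustrative. -irecode()- assigns 0 to values up to the first cut point, 1 to values above it and up to the second, and so on, so no value falls into a gap between ranges, and missing values stay missing.

```stata
clear
input double days_drink_wk float drink_intensity byte educgrp long earnings
.5 4 1 5000
.5 .5 3 20000
.5 .5 3 24000
1 1 4 101099
2 12 3 11000
3 30 3 30000
3 6 4 100000
.5 .5 2 24000
end

* One categorical variable instead of seven separate ones.
* Cut points are taken from post #1; drink_cat is a hypothetical name.
generate byte drink_cat = irecode(drink_intensity, 0.5, 7, 21, 43, 64, 86)

* Illustrative value labels so the regression table is readable
label define drink_cat_lbl 0 "Abstainer" 1 "Light" 2 "Light-moderate" ///
    3 "Moderate" 4 "Moderate-heavy" 5 "Heavy" 6 "Very heavy"
label values drink_cat drink_cat_lbl

* Log earnings, as in #2
generate ln_earnings = ln(earnings)

* Abstainers (drink_cat == 0) are the base category by default.
* On the full dataset, controls would be added to the same varlist,
* for example i.educgrp and other demographic variables.
regress ln_earnings i.drink_cat

* Approximate % difference in earnings of light drinkers vs abstainers
display 100 * (exp(_b[1.drink_cat]) - 1)
```

    With a log outcome, each category coefficient b is read relative to the base category, and 100*(exp(b) - 1) gives the percentage difference. Because all groups enter a single regression, the demographic controls get one set of coefficients rather than a different set per group.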



    • #3
      Originally posted by Carlo Lazzaro
      Anh:
      1) I would create a categorical variable including all the subgroups of drinkers and plug it in the right-hand side of the regression equation (see -fvvarlist- and -label-);
      2)
      Code:
      gen ln_earnings=ln(earnings)
      Thank you so much. It works. If I want to run separate regressions for males and females (as in the screenshot above), should I include gender as a factor variable, or should I use the -if- qualifier? Many thanks!!

      Code:
      regress ln_earnings i.drink_intensity i.gender
      or

      Code:
      regress ln_earnings i.drink_intensity if gender == 1



      • #4
        Anh:
        I would add -i.gender- to the right-hand side of my regression equation.
        Then I would use -predict fitted, xb- to obtain the fitted values for males and females.
        Kind regards,
        Carlo
        (StataNow 18.5)
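
        A rough sketch of how the two options in #3 differ, followed by the -predict- step from #4. The data here are synthetic and generated only for illustration; they are not from the thread. The variable drink_cat, the 0/1 coding of gender, and all numeric values are assumptions.

```stata
clear
set seed 12345
set obs 200

* Synthetic illustration data (hypothetical, not from the thread)
generate byte gender = runiform() < 0.5
generate byte drink_cat = floor(4 * runiform())
generate ln_earnings = 10 + 0.1 * drink_cat - 0.2 * gender + rnormal(0, 0.5)

* (a) Pooled model with gender as a factor variable: one set of
*     drinking coefficients plus an intercept shift for gender
regress ln_earnings i.drink_cat i.gender
predict fitted, xb

* Mean fitted value within each gender
tabstat fitted, by(gender) statistics(mean)

* (b) Separate regression for one gender using the -if- qualifier:
*     the drinking coefficients are estimated from that subsample only
regress ln_earnings i.drink_cat if gender == 1

* (c) Fully interacted model: lets the drinking coefficients differ by
*     gender within a single regression
regress ln_earnings i.drink_cat##i.gender
```

        Option (a) constrains the drinking coefficients to be the same for both genders, while (b) estimates them separately. The interacted model in (c) should reproduce the point estimates of the separate regressions (standard errors can differ) and makes it possible to test whether the coefficients differ by gender.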



        • #5
          Thank you so much Carlo. Got it!!!
