  • Multiple linear regression / Dummy variables, interactions and confounding variables

    Hi.
    I apologize if this is not a question I should post here. I am trying to analyze interaction and confounding between variables in order to build a multiple linear regression model using Stata 15.

    Dependent variable: HDL cholesterol (mg / dL) - quantitative variable (colhdl)
    Independent variable: alcohol consumption - qualitative variable / 3 categories (1 “non-drinker”, 2 “moderate drinker” and 3 “risk drinker”).

    1) I do not know whether a variable should be added to the model only when it shows an interaction or appears to be a confounder. What should I do if it is neither, or if only one category of the variable is significant?

    For example, with gender (binary variable; 1: male, 2: female):
    - regress colhdl i.drinker
    - regress colhdl i.drinker if gender == 1
    - regress colhdl i.drinker if gender == 2
    - regress colhdl i.gender##i.drinker

    Stata output:
                        |     Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    --------------------+----------------------------------------------------------------
    female#mod drinker  |  .7557405   1.212174    0.62  (0.533)   -1.621379     3.13286
    female#risk drinker |  3.273935   1.678158    1.95  (0.051)   -.0169984    6.564868
    _cons               |  44.57028    .778798   57.23  (0.000)    43.04303    46.09753
    *p-values in parentheses.

    If it is not significant, should I leave this variable out of the model? Or should I assess whether it is a confounding variable?
    What do I do if only one of the dummies is significant? How do I add it to the model if I have other variables?

    Thank you so much.

  • #2
    Carla:
    welcome to this forum.
    I would go:
    Code:
    regress colhdl i.gender##i.drinker
    That said, significance should not be the yardstick for deciding whether a given regression model is appropriate. It is much more important to give a fair and true view of the data generating process.
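    As one way to look at the interaction as a whole rather than dummy by dummy, a joint test and the predicted means could be run after that model. This is only a sketch: it assumes the variable names from post #1 (colhdl, gender, drinker) and relies on the standard postestimation commands testparm and margins.

    ```stata
    * Fit the model with main effects and the gender-by-drinker interaction
    regress colhdl i.gender##i.drinker

    * Joint F test of all interaction dummies together (H0: no interaction)
    testparm i.gender#i.drinker

    * Predicted mean HDL for every gender-by-drinker combination
    margins gender#drinker
    ```

    The joint test treats the interaction as one term, so a single non-significant dummy does not by itself decide anything.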
    For the future, please share what you typed and what Stata gave you back via CODE delimiters (see the FAQ). Thanks.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Thank you so much. But I still don't understand whether an interaction should stay in the model when one of its dummies is not significant, or whether it should be removed.

      Code:
      regress colhdl i.drinker i.gender b3.age1 i.study bmi b2.tobacco i.drinker##i.age1
      Code:
            Source |       SS           df       MS      Number of obs   =     2,215
      -------------+----------------------------------   F(15, 2199)     =     60.38
             Model |  124483.729        15   8298.9153   Prob > F        =    0.0000
          Residual |  302224.002     2,199  137.437018   R-squared       =    0.2917
      -------------+----------------------------------   Adj R-squared   =    0.2869
             Total |  426707.732     2,214  192.731586   Root MSE        =    11.723

      ------------------------------------------------------------------------------------
                  colhdl |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------------+----------------------------------------------------------------
                 bebedor |
       bebedor moderado  |    4.86706   1.013781     4.80   0.000     2.878992    6.855128
      bebedor de riesgo  |   6.970327   1.186814     5.87   0.000     4.642935     9.29772
                         |
                    sexo |
                  mujer  |   12.13005   .5319249    22.80   0.000     11.08692    13.17318
                         |
                   edad1 |
                      1  |  -.5335322   1.212904    -0.44   0.660    -2.912089    1.845025
                      2  |  -1.581575   .9929852    -1.59   0.111    -3.528862    .3657119
                         |
                estudios |
          secundarios 2  |  -.7658614   .6078963    -1.26   0.208    -1.957972    .4262496
          secundarios 1  |  -1.403784    .836681    -1.68   0.094    -3.044552    .2369838
      primarios y menos  |  -1.400124   .7687614    -1.82   0.069    -2.907699    .1074505
                         |
                     imc |  -.7409274   .0594553   -12.46   0.000    -.8575218   -.6243329
                         |
                  tabaco |
             no fumador  |   2.620133    .582125     4.50   0.000     1.478561    3.761706
              exfumador  |   3.639366   .6639092     5.48   0.000     2.337412    4.941321
                         |
           bebedor#edad1 |
       bebedor moderado #|
                      1  |  -4.227452    1.49112    -2.84   0.005    -7.151603   -1.303302
       bebedor moderado #|
                      2  |  -1.136889   1.316344    -0.86   0.388    -3.718297    1.444518
      bebedor de riesgo #|
                      1  |  -4.827771   2.076663    -2.32   0.020    -8.900197   -.7553452
      bebedor de riesgo #|
                      2  |  -3.359233   1.723989    -1.95   0.051     -6.74005    .0215836
                         |
                   _cons |   63.81949     1.9187    33.26   0.000     60.05683    67.58214
      ------------------------------------------------------------------------------------
      Here the interaction terms for moderate drinkers and risk drinkers in age group 2 are not significant (p = 0.388 and p = 0.051). Should I keep this interaction or not? Is there another test to check whether there is an interaction? Or should I test whether it is a confounding variable?

      Thank you in advance.

      Data base: https://drive.google.com/file/d/1E3j...ew?usp=sharing
      Last edited by Carla RAS; 02 Feb 2020, 05:34.



      • #4
        Carla:
        I would keep it anyhow.
        Non-significant results are as informative as significant ones.
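        If a single test of the whole interaction is wanted, one possibility is sketched below. It assumes the model from post #3 is refitted with the variable names used in that command (drinker, age1, and so on; the posted output shows them under their Spanish names). contrast and margins are standard postestimation commands, and what to conclude from them remains a judgment call.

        ```stata
        * Refit the model exactly as typed in post #3
        regress colhdl i.drinker i.gender b3.age1 i.study bmi b2.tobacco i.drinker##i.age1

        * Joint test of the four drinker-by-age interaction terms
        contrast drinker#age1

        * Difference in HDL between drinking categories within each age group
        margins age1, dydx(drinker)
        ```

        The margins results show how the drinker effect changes across age groups, which is often easier to report than the individual interaction coefficients.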
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Thank you.
