Hi All
I'm testing for interaction effects using two different approaches, and I am assuming both give the same or very similar answers. However, I would like some confirmation that both methods are in fact assessing the same thing.
I have two categorical predictor variables: income, which has 4 categories (Q1 to Q4), and homelessness (yes or no, coded as 1 or 0). The outcome is binary: general health, coded as 0 (good) and 1 (not good).
1. In the first approach, I combine the categories of the two predictors to create a new variable 'homeincome' (thus having 4 x 2 = 8 categories: income Q1 & not homeless, income Q1 & homeless, income Q2 & not homeless, etc.). This is the 'inter-categorical' approach to testing for interactions; combining the categories of the two variables incorporates the interaction:
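* Start from missing so that cases with a missing income or homelessness value are not coded as Q1-NH
mi xeq: gen homeincome = .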
mi xeq: replace homeincome = 0 if incomeq3x==1 & homeever==0
mi xeq: replace homeincome = 1 if incomeq3x==1 & homeever==1
mi xeq: replace homeincome = 2 if incomeq3x==2 & homeever==0
mi xeq: replace homeincome = 3 if incomeq3x==2 & homeever==1
mi xeq: replace homeincome = 4 if incomeq3x==3 & homeever==0
mi xeq: replace homeincome = 5 if incomeq3x==3 & homeever==1
mi xeq: replace homeincome = 6 if incomeq3x==4 & homeever==0
mi xeq: replace homeincome = 7 if incomeq3x==4 & homeever==1
tab homeincome
label var homeincome "Homelessness & family income"
label define homeincome 0 "Q1-NH" 1 "Q1-Home" 2 "Q2-NH" 3 "Q2-home" 4 "Q3-NH" 5 "Q3-home" 6 "Q4-NH" 7 "Q4-home"
label values homeincome homeincome
tab homeincome
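The same eight-category coding can also be written as a single expression. The following is only a sketch: it assumes incomeq3x is coded 1 to 4 and homeever 0/1 as described above, and homeincome2 is a hypothetical name chosen so that the original variable is not overwritten.

```stata
* Sketch: build the combined variable in one step in every imputation
* (homeincome2 is a hypothetical name; it is missing whenever either input is missing)
mi passive: generate homeincome2 = 2*(incomeq3x - 1) + homeever

* Reuse the value label defined above
label values homeincome2 homeincome

* Cross-check against the step-by-step version wherever homeincome2 is defined
mi xeq: assert homeincome2 == homeincome if !missing(homeincome2)
```

Using mi passive registers the new variable as passive, so it stays consistent with the imputed values of incomeq3x and homeever.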
I run the logistic model followed by margins to get estimated probabilities for each category:
Code:
mi est, post or: logistic genhealth i.homeincome
Code:
Multiple-imputation estimates Imputations = 35
Logistic regression Number of obs = 10,232
Average RVI = 0.0227
Largest FMI = 0.1514
DF adjustment: Large sample DF: min = 1,504.96
avg = 2961484.66
max = 1.67e+07
Model F test: Equal FMI F( 7,805393.2) = 17.68
Within VCE type: OIM Prob > F = 0.0000
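------------------------------------------------------------------------------
genhealth | Odds ratio Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
homeincome |
Q1-Home | 3.25 1.23 3.12 0.002 1.55 6.81
Q2-NH | 0.78 0.09 -2.26 0.024 0.63 0.97
Q2-home | 4.13 1.80 3.26 0.001 1.76 9.69
Q3-NH | 0.57 0.07 -4.70 0.000 0.45 0.72
Q3-home | 8.78 4.74 4.02 0.000 3.05 25.27
Q4-NH | 0.45 0.05 -7.61 0.000 0.36 0.55
Q4-home | 1.35 1.03 0.39 0.694 0.30 6.03
|
_cons | 0.11 0.01 -29.22 0.000 0.10 0.13
------------------------------------------------------------------------------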
Note: _cons estimates baseline odds.
Code:
mimrgns (homeincome), predict(pr) cmdmargins
Code:
Multiple-imputation estimates Imputations = 35
Adjusted predictions Number of obs = 10,232
Average RVI = 0.0225
Largest FMI = 0.1502
DF adjustment: Large sample DF: min = 1,529.35
avg = 8.93e+57
Within VCE type: Delta-method max = 7.15e+58
Expression : Pr(genhealth), predict(pr)
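------------------------------------------------------------------------------
| Margin Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
homeincome |
Q1-NH | 0.10 0.01 15.00 0.000 0.09 0.12
Q1-Home | 0.27 0.07 3.70 0.000 0.13 0.41
Q2-NH | 0.08 0.01 13.34 0.000 0.07 0.09
Q2-home | 0.32 0.09 3.43 0.001 0.14 0.50
Q3-NH | 0.06 0.01 11.26 0.000 0.05 0.07
Q3-home | 0.50 0.13 3.74 0.000 0.24 0.76
Q4-NH | 0.05 0.00 14.02 0.000 0.04 0.06
Q4-home | 0.13 0.09 1.52 0.129 -0.04 0.31
------------------------------------------------------------------------------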
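As a rough check on how these margins relate to the odds ratios, note that homeincome is the only predictor in this model, so each margin should be close to the baseline odds (_cons) times the category's odds ratio, converted to a probability. The sketch below uses the rounded values printed above, so small discrepancies are expected; in addition, the margins are pooled across imputations, so the match is approximate rather than exact.

```stata
* Back-of-envelope check: probability = odds / (1 + odds) = invlogit(ln(odds))
* All inputs are the rounded estimates copied from the output above
display "Q1-NH   : " %4.2f invlogit(ln(0.11))
display "Q1-Home : " %4.2f invlogit(ln(0.11*3.25))
display "Q3-home : " %4.2f invlogit(ln(0.11*8.78))
```

The results can be compared with the Q1-NH, Q1-Home and Q3-home rows of the margins table.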
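2. In the second approach, I test for interactions in the more common way, followed by margins to generate probabilities:
Code:
mi est, post or: logistic genhealth i.homeever##i.incomeq3x i.sex
Code: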
Multiple-imputation estimates Imputations = 35
Logistic regression Number of obs = 10,232
Average RVI = 0.0494
Largest FMI = 0.0719
DF adjustment: Large sample DF: min = 6,636.38
avg = 15,076.10
max = 23,384.59
Model F test: Equal FMI F( 8,125529.5) = 16.58
Within VCE type: OIM Prob > F = 0.0000
------------------------------------------------------------------------------------
genhealth | Odds ratio Std. err. t P>|t| [95% conf. interval]
-------------------+----------------------------------------------------------------
1.homeever | 3.27 1.23 3.15 0.002 1.56 6.82
|
incomeq3x |
2 | 0.75 0.08 -2.59 0.010 0.60 0.93
3 | 0.55 0.07 -4.95 0.000 0.44 0.70
4 | 0.43 0.05 -7.88 0.000 0.35 0.54
|
homeever#incomeq3x |
1 2 | 1.65 0.94 0.87 0.384 0.54 5.06
1 3 | 4.54 3.00 2.29 0.022 1.24 16.59
1 4 | 0.91 0.77 -0.11 0.915 0.17 4.80
|
2.sex | 1.20 0.10 2.34 0.019 1.03 1.41
_cons | 0.11 0.01 -26.08 0.000 0.09 0.13
------------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
Code:
mimrgns (homeever##i.incomeq3x), predict(pr) cmdmargins
Code:
Multiple-imputation estimates Imputations = 35
Predictive margins Number of obs = 10,232
Average RVI = 2.0130
Largest FMI = 0.0716
DF adjustment: Large sample DF: min = 6,690.66
avg = 16,838.73
Within VCE type: Delta-method max = 27,973.68
Expression : Pr(genhealth), predict(pr)
------------------------------------------------------------------------------------
| Margin Std. err. t P>|t| [95% conf. interval]
-------------------+----------------------------------------------------------------
homeever |
0 | 0.07 0.00 26.85 0.000 0.06 0.07
1 | 0.27 0.05 5.48 0.000 0.18 0.37
|
incomeq3x |
1 | 0.11 0.01 15.33 0.000 0.09 0.12
2 | 0.08 0.01 13.70 0.000 0.07 0.10
3 | 0.07 0.01 11.82 0.000 0.05 0.08
4 | 0.05 0.00 14.00 0.000 0.04 0.06
|
homeever#incomeq3x |
0 1 | 0.11 0.01 15.01 0.000 0.09 0.12
0 2 | 0.08 0.01 13.34 0.000 0.07 0.09
0 3 | 0.06 0.01 11.29 0.000 0.05 0.07
0 4 | 0.05 0.00 14.05 0.000 0.04 0.06
1 1 | 0.28 0.07 3.79 0.000 0.13 0.42
1 2 | 0.32 0.09 3.46 0.001 0.14 0.51
1 3 | 0.49 0.13 3.70 0.000 0.23 0.75
1 4 | 0.13 0.09 1.52 0.128 -0.04 0.31
------------------------------------------------------------------------------------
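One way to see how the two parameterisations relate is to multiply the odds ratios from the second model: for a homeless respondent in income category k, the odds ratio relative to Q1 & not homeless is the homeever odds ratio times the income-k odds ratio times the interaction odds ratio. The sketch below uses the rounded values printed above. Note that the second model also adjusts for i.sex while the first does not, so the products are not expected to reproduce the first model's odds ratios exactly.

```stata
* Products of rounded odds ratios from the second model
* Approach 1 reports 4.13 for Q2-home
display "Q2-home: " %4.2f 3.27*0.75*1.65
* Approach 1 reports 8.78 for Q3-home
display "Q3-home: " %4.2f 3.27*0.55*4.54
* Approach 1 reports 1.35 for Q4-home
display "Q4-home: " %4.2f 3.27*0.43*0.91
```

For a like-for-like comparison, i.sex could be added to the first model as well (i.e. logistic genhealth i.homeincome i.sex), in which case both models would contain the same eight income-by-homelessness cells plus the same covariate.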
The estimated probabilities of poor general health (genhealth) obtained from the two approaches are almost the same.
For example, the first approach gives 0.10 for income Q1 & not homeless and 0.27 for income Q1 & homeless; the corresponding probabilities from the second approach are 0.11 and 0.28.
It seems the only difference is that the first approach (inter-categorical) reports an estimate for every combination of categories, including those that involve a reference category (e.g. Q1 & homeless), which do not appear among the interaction terms in the second approach. These are nonetheless still obtained when estimating predicted probabilities.
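If an overall test of the interaction is wanted from the second approach, one possibility is a joint test of the three interaction terms. This is a sketch only: it assumes mi test is run directly after the mi estimate command (before mimrgns replaces the stored results), and it refers to the coefficients by their factor-variable names.

```stata
* Sketch: refit the second model, then jointly test the three interaction terms
mi estimate, or: logistic genhealth i.homeever##i.incomeq3x i.sex

* H0: all homeever#incomeq3x coefficients are zero (i.e. odds ratios equal 1)
mi test 1.homeever#2.incomeq3x 1.homeever#3.incomeq3x 1.homeever#4.incomeq3x
```

In the first approach the same hypothesis is not visible as a set of coefficients, because each homeincome odds ratio mixes main effects and interaction.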
Thanks!
/Amal