Dear All,
Hi. I am aware that my title is probably confusing, so I will explain my question in detail.
I would like to analyze how the effect of a continuous treatment, T, depends on a continuous covariate, X. The outcome variable is Y. The most typical method I have learned is to estimate a model with an interaction term, XT:
Code:
reg Y X T XT
I think that this model estimates how the effect of T on Y depends linearly on X. The coefficient on XT indicates how a one-unit change in X changes the effect of a one-unit change in T.
But I am wondering whether this is a reasonable model. I suspect that the effect of T on Y depends on X in a non-linear manner. Specifically, I hypothesize that the effect of T on Y will be significantly larger only when a subject has a value of X above the 75th percentile. So I create a dummy variable, D, for the top quartile of X:
Code:
xtile temp = X, nquantiles(4)
tabulate temp, generate(g)
rename g4 D
Thus D is an indicator for whether a subject has X above the 75th percentile. D captures a categorical effect of X.
However, I am confused about how to build the interaction term. When I interact D with T, should I include X or the main effect of D? That is, which of the following models should I use?
Code:
// model 1
reg Y T D DT
// model 2
reg Y T D DT X
// model 3
reg Y T X DT
The first model assumes that there is no linear effect of X on Y. I do not think that this is desirable, because I still believe that X has a linear effect.
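To make the comparison concrete, the three specifications can be written as equations. This is only a sketch in my own notation: the coefficient labels ($\beta$'s) and the threshold symbol $q_{75}$ (the 75th percentile of X) are labels I am introducing here, not Stata output.

```latex
\begin{align*}
D &= \mathbf{1}[X > q_{75}] \\
\text{Model 1:}\quad Y &= \beta_0 + \beta_T T + \beta_D D + \beta_{DT}\, D T + \varepsilon \\
\text{Model 2:}\quad Y &= \beta_0 + \beta_T T + \beta_D D + \beta_{DT}\, D T + \beta_X X + \varepsilon \\
\text{Model 3:}\quad Y &= \beta_0 + \beta_T T + \beta_X X + \beta_{DT}\, D T + \varepsilon \\
\text{All three:}\quad \frac{\partial\, \mathrm{E}[Y \mid X, T]}{\partial T} &= \beta_T + \beta_{DT}\, D
\end{align*}
```

In each model the effect of T is $\beta_T$ at or below the threshold and $\beta_T + \beta_{DT}$ above it. The models differ only in how X itself is allowed to enter: through D alone (Model 1), through both D and a linear term (Model 2), or through a linear term alone (Model 3).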
The second model seems to be the more typical interaction model. A basic lesson I learned is that when I include an interaction term (DT), I have to include the main effects of both variables. However, I think that what the second model measures is somewhat awkward. It measures the linear effect of X on Y and, by including the main effect of D, it also measures whether X has a categorical effect on Y (D is an indicator for X above the 75th percentile). I feel that this somewhat resembles a regression discontinuity design (RDD), and I do not really want to assume such a discontinuity in the effect of X.
I think that the third model reflects what I want to measure and what I want to assume. It keeps a linear effect of X on Y, and it also measures how the effect of T depends on the 75th-percentile threshold of X (which is D). Unlike the second model, the third model does not assume any discontinuity in the effect of X. Thus I prefer Model 3.
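One way to examine the "no discontinuity" assumption is to write out the jump in the expected outcome as X crosses the threshold while T is held fixed. Again this is only a sketch in my own notation ($\beta$'s and $q_{75}$ are labels I am introducing), and it assumes the models are specified exactly as in the regressions above.

```latex
\begin{align*}
\text{Jump at } q_{75}:\quad
\Delta(T) &= \lim_{x \downarrow q_{75}} \mathrm{E}[Y \mid X = x, T]
           - \lim_{x \uparrow q_{75}} \mathrm{E}[Y \mid X = x, T] \\
\text{Model 2:}\quad \Delta(T) &= \beta_D + \beta_{DT}\, T \\
\text{Model 3:}\quad \Delta(T) &= \beta_{DT}\, T
\end{align*}
```

Read this way, Model 3 is Model 2 with the restriction $\beta_D = 0$. Under that restriction the expected outcome is continuous in X at the threshold only when $T = 0$; for any other value of T it still jumps by $\beta_{DT}\, T$.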
What confuses me is whether Model 3 is legitimate. More precisely, I do not know whether it is legitimate to include an interaction term without the main effect of one of its components (here, D). I am wondering whether Model 3 has some hidden problems.
Thank you very much for your advice.