  • Standardizing an interaction term

    I would like to enter an interaction term into my regression using standardized variables. I am generating a new variable that is a product term of two standardized variables. Do I need to restandardize this product term or leave it as it is?

  • #2
    I recommend you leave it as is.

    The ostensible purpose of standardizing predictors in a regression is to put them on a common scale so that their marginal effects can be compared. In fact, that is just an illusion, and in most situations standardizing variables just obfuscates the results. But I will spare you my lengthy rant on that topic. Going along with the illusion, bear in mind that, unlike the "main" variables, an interaction term does not have a marginal effect of its own, so there is no other effect to which it could be compared anyway. Nothing is gained by standardizing the product term.

    And at least by leaving the product x1#x2 as is, its coefficient gives you a correct estimate of the difference between the (standardized) effects of x1 at different values of x2. If you standardize the product term, then the coefficient becomes completely meaningless and incomprehensible--only the t/z-test and p-value can be salvaged from it (as those will be the same whether you standardize or not).
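
    In Stata terms, that recommendation looks something like this (a minimal sketch with hypothetical variables y, x1, and x2):

    Code:
    * standardize the main variables, then let factor-variable
    * notation build the product term -- and leave it alone
    egen zx1 = std(x1)
    egen zx2 = std(x2)
    regress y c.zx1##c.zx2
    * the coefficient on c.zx1#c.zx2 is the change in the effect of
    * zx1 per one-SD increase in x2; standardizing it would gain nothing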



    • #3
      Is it really true that the t/z-test and p-value will be the same whether you standardize the interaction or not?

      Look at this example I just got:

      Code:
      reg dv iv c.iv#c.mod_var time cont_var_1 cont_var_2 cont_var_3 cont_var_4 cont_var_5 mod_var
      
      dv                |    t        P>|t|
      ------------------+-------------------
      iv                |    -2.83    0.005
      c.iv#c.mod_var    |     4.85    0.000
      time              |     4.95    0.000
      cont_var_1        |     3.32    0.001
      cont_var_2        |     3.48    0.001
      cont_var_3        |    -1.11    0.267
      cont_var_4        |     2.78    0.006
      cont_var_5        |    -2.59    0.010
      mod_var           |    -0.22    0.824
      _cons             |    -4.84    0.000
      --------------------------------------
      
      foreach v of varlist * { 
          egen std_`v' = std(`v')
      }
      
      reg dv std_iv c.std_iv#c.std_mod_var std_time std_cont_var_1 std_cont_var_2 std_cont_var_3 std_cont_var_4 std_cont_var_5 std_mod_var
      
      dv                       |    t       P>|t|
      -------------------------+-----------------
      std_iv                   |    6.29    0.000
      c.std_iv#c.std_mod_var   |    4.85    0.000
      std_time                 |    4.95    0.000
      std_cont_var_1           |    3.32    0.001
      std_cont_var_2           |    3.48    0.001
      std_cont_var_3           |   -1.11    0.267
      std_cont_var_4           |    2.78    0.006
      std_cont_var_5           |   -2.59    0.010
      std_mod_var              |    4.65    0.000
      _cons                    |   15.88    0.000
      -------------------------------------------
      The t-test and P-value remain the same for the interaction itself, but they change for the independent and moderating variables. Indeed, the coefficient signs flipped for both of those variables! Huge potential for misinterpretation of results. (A sketch of why is at the end of this post.)

      Code:
      * egen std() takes an ordinary expression, not factor-variable
      * notation, so generate the product explicitly before standardizing
      gen ivXmod_var = std_iv * std_mod_var
      egen std_ivXmod_var = std(ivXmod_var)
      
      reg dv std_iv std_ivXmod_var std_time std_cont_var_1 std_cont_var_2 std_cont_var_3 std_cont_var_4 std_cont_var_5 std_mod_var
      
      dv               |    t       P>|t|
      -----------------+------------------
      std_iv           |   -2.83    0.005
      std_ivXmod_var   |    4.85    0.000
      std_time         |    4.95    0.000
      std_cont_var_1   |    3.32    0.001
      std_cont_var_2   |    3.48    0.001
      std_cont_var_3   |   -1.11    0.267
      std_cont_var_4   |    2.78    0.006
      std_cont_var_5   |   -2.59    0.010
      std_mod_var      |   -0.22    0.824
      _cons            |   17.45    0.000
      ------------------------------------
      When I restandardized the interaction variable, I got the same t-tests, P-values, and coefficient signs as in the original non-standardized equation.
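
      One way to see why the main-effect tests change (a sketch using the same variables as above): with an interaction in the model, the coefficient on iv is its effect at mod_var = 0, and standardizing shifts where zero is. The average marginal effects, by contrast, should not depend on the rescaling:

      Code:
      * refit both models with ## so that margins knows about the interaction
      regress dv c.iv##c.mod_var time cont_var_1 cont_var_2 cont_var_3 cont_var_4 cont_var_5
      margins, dydx(iv mod_var)
      
      regress dv c.std_iv##c.std_mod_var std_time std_cont_var_1 std_cont_var_2 std_cont_var_3 std_cont_var_4 std_cont_var_5
      margins, dydx(std_iv std_mod_var)
      * the dydx() estimates differ only by each variable's SD scaling;
      * their t statistics and p-values are identical across the two fits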



      • #4
        Well, you've answered your own question. Good for you!

        When there are interaction terms, the interaction of standardized variables is not equivalent to the standardized interaction of those variables--in fact, it is not, itself, standardized at all. As you can see, the latter gives the correct results, but not the former.
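
        A quick way to check that in the data (a sketch, reusing the std_ variables from #3): the product of two standardized variables has mean roughly equal to the correlation of the two original variables, and its SD is generally not 1.

        Code:
        * the product of standardized variables is not itself standardized
        gen z1z2 = std_iv * std_mod_var
        summarize z1z2
        correlate iv mod_var
        * the mean of z1z2 is approximately r(rho) from correlate,
        * and its SD is generally not 1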



        • #5
          reg has a beta option, so I am not sure why you want to standardize yourself first. An additional problem is that the correct standardization may vary across regressions because the cases included may vary.
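
          For example (a sketch using the variables from #3; the product term is generated by hand, since the beta column standardizes whatever regressors are in the model, product term included):

          Code:
          * beta leaves the fit itself untouched -- t and p are those of
          * the raw-scale model -- and adds standardized coefficients
          * computed from the estimation sample
          gen ivXmod = iv * mod_var
          regress dv iv ivXmod mod_var time cont_var_1 cont_var_2 cont_var_3 cont_var_4 cont_var_5, beta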

          I too will spare you my lengthy rant against standardized coefficients. If I were going to present them, I would probably present the non-standardized results too.
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          StataNow Version: 18.5 MP (2 processor)

          EMAIL: [email protected]
          WWW: https://www3.nd.edu/~rwilliam



          • #6
            Just want to clarify something I said in #4.

            Standardizing the interaction term gives the correct t/z statistics and p-values. It does not, however, provide a coefficient that can be interpreted as an interaction effect; in fact, its coefficient is basically meaningless. If you want a meaningful interaction effect for standardized variables, you leave the interaction term unstandardized: then you get a meaningful coefficient, but incorrect t/z statistics and p-values (for the main effects, as the output in #3 shows).
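
            Concretely, with the variables from #3 (a sketch; the controls are omitted here for brevity), you read each quantity from the specification in which it is valid:

            Code:
            * meaningful interaction coefficient; main-effect t/p differ
            * from the raw-scale model
            regress dv std_iv c.std_iv#c.std_mod_var std_mod_var
            * product coefficient not interpretable, but t/p line up
            * with the raw-scale model
            regress dv std_iv std_ivXmod_var std_mod_var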

            So I am now torn between giving you my long rant against standardized variables or my long rant against statistical significance, or both. Probably I'll prefer mercy to justice and give you neither. But this little dilemma points up, in a single problem, yet another reason why both standardization and statistical significance testing are bad ideas that should be avoided.



            • #7
              I do have a question, and it is related to age-standardization. I worked on a project comparing the prevalence of hypertension between two different groups. Per NHIS/NHANES recommendations, I used mean along with the svy commands, like this:
              Code:
              svy, subpop(if dropped==0 & immigrant): mean htn, stdize(agecat) stdweight(std_wgt)
              for the unadjusted and for the adjusted:
              Code:
              svy, subpop(if immigrant==1 & dropped==0): mean htn married emp_stat edu_cat notcov usupl, stdize(agecat) stdweight(std_wgt) over(immigrant stay)
              This gave me the output I was looking for; however, the reviewers want me to examine an interaction term, and this code does not allow for the examination of interactions. So I tried to see whether I could use dstdize instead, like this:
              Code:
              dstdize OUTCOME std_wgt agecat if dropped==0, by(immigrant)
              However, a few things are problematic here: dstdize does not allow svy, and the std_wgt variable has to contain non-negative integers, whereas mine currently holds decimals.

              I need help figuring out what to do about examining an interaction, e.g. between immigrant status and sex, and what code to use here, if anyone knows a way around the issues above.
              Last edited by Ruth-Alma Turkson-Ocran; 22 Nov 2019, 14:52.
