Standardizing vs normalizing independent variables

Raja Hasan

Join Date: Feb 2020

Posts: 59
#1

20 Jul 2021, 22:46

Hi, I need help with the following:

In my dataset, I have two independent variables, A and B. A is the difference between 65 and age (so it ranges from -35 to +40) and B is the ratio of executives' salary to total assets. To create a composite score from these two variables, I need to add A and B. Since the two variables are on different scales, we can either standardize or normalize them and then add them to form a composite score/index/measure. My confusion is: which one should I do? Should I standardize or normalize?
I have a feeling that I should normalize because A already has a zero whose meaning in the raw data and in the standardized data (if we standardize) will be different. But if I normalize, how will I explain the result economically?

Could you please help me? Are there any other options for me except standardizing/normalizing and then adding the two?
Nick Cox

Join Date: Mar 2014

Posts: 35698
#2

20 Jul 2021, 23:53

What is your definition of normalize? I take (value MINUS mean) / SD to be the most common meaning for standardize, but given that, I have no idea what you mean by normalize. In any case, I note that such composite scores, while puzzlingly popular in some fields, usually create more problems than they solve.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#3

21 Jul 2021, 03:33

Raja:
as an aside to Nick's helpful advice, wouldn't it be feasible to create an interaction between the executives' salary / total assets ratio and the difference in age (divided into classes; I know that categorizing a continuous predictor sounds methodologically sinister, but in this case it might be worth giving it a shot)?
Finally, judging by the upper range of the difference in age, it would seem that your sample includes executives aged 105. Is that true?

Kind regards,
Carlo
(Stata 19.0)
Richard Williams

Join Date: Apr 2014

Posts: 4992
#4

21 Jul 2021, 08:12

You sometimes create scales out of items measured in different ways. One way you might do that is by doing a z-score transformation for each variable (like Nick describes) and then adding them.

Having said that, it seems really weird to do that. While your measures may be correlated, they don't seem to create a scale.

I also share Carlo's concerns. An interaction may make more sense. I find it hard to believe there are 105-year-old executives, and if there are, I suspect they are extreme outliers.

I suggest that you define what you mean by standardize and normalize, and why you want to do this. If you explain what your ultimate goal is, perhaps we can suggest better ways to reach it.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam
Raja Hasan

Join Date: Feb 2020

Posts: 59
#5

21 Jul 2021, 09:24

Nick, Carlo and Richard thank you very much for your helpful comments.
I agree with you that 105 years is an extreme observation; most of the values are between -10 and 28. So, I think I can trim just a few observations.
By standardization, I mean the Z-score, and as I mentioned before Z-score does not fit well here as the age difference has zeros.
By normalization (Max-min), I mean (variable - r(min)) / (r(max) - r(min)). It is scaling the variable between 0 and 1.
I can't interact with the ratio and age difference because they are on different scales and higher values from age difference will bias the result.
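
For concreteness, the two rescalings defined above can be sketched as follows. This is only an illustration in Python (the thread itself concerns Stata), and the sample values in `age_diff` are made up, not taken from the actual data. It also shows that a raw value of zero does not map to zero under either transformation.

```python
from statistics import mean, stdev

def z_score(values):
    """Standardize: (x - mean) / SD, using the sample SD."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]

def min_max(values):
    """Min-max normalize: (x - min) / (max - min), mapping onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

# Hypothetical age differences (65 minus age); illustrative values only.
age_diff = [-10, 0, 5, 20, 28]

z = z_score(age_diff)
n = min_max(age_diff)

for raw, zi, ni in zip(age_diff, z, n):
    print(f"raw={raw:>4}  z={zi:+.3f}  minmax={ni:.3f}")
```

In the z-score column, zero corresponds to the sample mean of the raw variable; in the min-max column, zero corresponds to the sample minimum.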

So, basically, I want to normalize the two variables so that no specific dimension will dominate the statistics and then want to add the two to form a composite score.
Here, the higher age difference means that executives are younger and so they are highly motivated for the long-term perspective of the firm.
And, higher pay ratio means that the executives have the ability to influence the key decisions in the firm. In the literature executives who have a higher ability generally receive higher compensation.
So, by adding these two, we add executives' motivation and ability to influence key strategic decisions. Together, the score indicates the governance mechanism from the bottom.
The higher the score, the stronger the governance.

A paper published in a top accounting journal standardized the two variables and then added them to form a composite score, but I think standardization is not completely correct here, as the age difference has zeros and the standardized variable will also have zeros. The meaning of the two zeros is not the same.

The composite score will be the independent variable and my dependent variable will be log of the standard deviation of the market to book ratio.

Though I want to normalize the two variables, I am not sure which is the correct choice here. Should I standardize or normalize, and why should I prefer one over the other?
Could you please guide me on what other ways (if any or if needed) I can create the independent variable I want?
Richard Williams

Join Date: Apr 2014

Posts: 4992
#6

21 Jul 2021, 10:00

Did the paper you are referring to use the same variables you are? If yes, why not do the same thing that they did? If not, are you sure that whatever they did is appropriate for your variables?

I'm not totally clear on what you have against zeroes. When you do a z-score transformation 0 = average score on the original variable. But your point about different scaling seems to be an argument against doing this in the first place.

However you transform the variables, if you add them together that constrains their effects to be equal. Rather than force them to be equal, I would probably enter them into the model as separate variables and then test whether their effects are equal.

Tests of equality can sometimes be interesting, especially when variables are measured in the same way (e.g. do husband's attitudes and wife's attitudes have the same effect on a couple's decisions?) But it seems odd to do it here. Maybe it would make sense if I had read the article or understood the underlying theory. In any event, I don't think I would force the effects of the two variables to be equal, but I might test whether they were if that seemed to be a reasonable hypothesis.

My (admittedly uneducated) guess, though, is that I would just have these as different variables in my model and not try to create a scale out of them.
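
The point that adding the variables constrains their effects to be equal can be illustrated with a small sketch. It is written in Python rather than Stata, uses made-up data in which the two effects differ (2 and 0.5), and the helper names `ols_two`, `ols_one` and `ssr` are hypothetical. Regressing on the sum can only return one shared slope, so it fits worse than entering the variables separately.

```python
from statistics import mean

def ols_two(y, x1, x2):
    """OLS of y on a constant, x1 and x2 via the centered normal equations."""
    my, m1, m2 = mean(y), mean(x1), mean(x2)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2

def ols_one(y, x):
    """OLS of y on a constant and a single predictor x."""
    my, mx = mean(y), mean(x)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b * mx, b

def ssr(y, fitted):
    """Sum of squared residuals."""
    return sum((a - f) ** 2 for a, f in zip(y, fitted))

# Hypothetical data: y depends on x1 and x2 with unequal effects (2 and 0.5).
x1 = [0, 1, 2, 3, 4, 5]
x2 = [1, 0, 2, 1, 3, 0]
y = [1 + 2 * a + 0.5 * b for a, b in zip(x1, x2)]

# Separate predictors: each variable gets its own coefficient.
b0, b1, b2 = ols_two(y, x1, x2)
ssr_separate = ssr(y, [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)])

# Composite predictor: regressing on x1 + x2 forces one shared coefficient.
composite = [a + b for a, b in zip(x1, x2)]
c0, c1 = ols_one(y, composite)
ssr_composite = ssr(y, [c0 + c1 * s for s in composite])

print(f"separate:  b1={b1:.3f}, b2={b2:.3f}, SSR={ssr_separate:.3f}")
print(f"composite: b={c1:.3f}, SSR={ssr_composite:.3f}")
```

With the variables entered separately, equality of the two coefficients becomes a testable hypothesis instead of an assumption built into the model.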

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam
Raja Hasan

Join Date: Feb 2020

Posts: 59
#7

21 Jul 2021, 10:20

Richard,

Let me give you an example.
Executive 1: age difference 20, pay ratio 0.030 (high motivation but less ability)
Executive 2: age difference 20, pay ratio 0.070 (high motivation and high ability)
Executive 3: age difference 5, pay ratio 0.030 (less motivation and less ability)
Executive 4: age difference 5, pay ratio 0.070 (less motivation but high ability)
Executive 5: age difference 20, pay ratio 0.070 (high motivation and high ability)

From the above example, after scaling, if I add the two, then executive 3 is likely to exhibit weak governance (low composite score), executive 5 is likely to exhibit stronger governance (high composite score), and for the others we might see a medium level of governance.
My goal is to show the governance and so I have to form a composite score here. I can do a separate test for each variable but that is not for all the tables. I ultimately need to show the impact of strong governance.
Another idea is to interact the two variables after scaling. Would it be better than just adding after scaling?

Last edited by Raja Hasan; 21 Jul 2021, 10:38.
Richard Williams

Join Date: Apr 2014

Posts: 4992
#8

21 Jul 2021, 11:31

Once you start adding variables together, though, two different individuals might get the same score but in very different ways. Do you really want to argue that the only thing that matters is the total score, and not how they get it?
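
As a quick illustration of that point, here is a Python sketch that applies min-max scaling to the five example executives from post #7 and adds the two scaled variables. The scaling choice and the helper name `min_max` are assumptions made for the illustration.

```python
def min_max(values):
    """Min-max normalize: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

# The five example executives from post #7: (age difference, pay ratio).
executives = [(20, 0.030), (20, 0.070), (5, 0.030), (5, 0.070), (20, 0.070)]

age_scaled = min_max([age for age, _ in executives])
pay_scaled = min_max([pay for _, pay in executives])

# Composite score: sum of the two scaled variables.
scores = [a + p for a, p in zip(age_scaled, pay_scaled)]

for i, (pair, score) in enumerate(zip(executives, scores), start=1):
    print(f"Executive {i}: age diff={pair[0]:>2}, pay ratio={pair[1]:.3f}, score={score:.1f}")
```

Executives 1 and 4 end up with the same composite score even though one is "high motivation, less ability" and the other is the reverse.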

To me an interaction term or terms makes far more sense.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam
Richard Williams

Join Date: Apr 2014

Posts: 4992
#9

21 Jul 2021, 11:39

Adding them together also implies equal effects. Is that plausible or necessary? Why is it so important to get a single coefficient that captures the effects of both variables? Especially when it is not clear what you would have to do to make the two variables addable.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam
Raja Hasan

Join Date: Feb 2020

Posts: 59
#10

21 Jul 2021, 12:57

Richard, Thank you very much.
I completely agree with you. That's an awesome question you asked. I now understand why adding is more problematic.
So, could you please tell me whether you are saying to interact after or before normalizing?
Also, could you please tell me whether I should normalize or standardize in this case?
Maarten Buis

Join Date: Mar 2014

Posts: 3456
#11

21 Jul 2021, 13:02

Alternatively, you could estimate a sheaf coefficient: that way you get your single effect without having to force the coefficients to be equal (which is what you are implicitly doing when you add variables): https://www.stata-journal.com/articl...article=st0261 . You also don't need to standardize the variables, as the weights of each variable are estimated, so they automatically accommodate different scales. You can estimate your model with sheafcoef, which is available from SSC. Also see this presentation: http://maartenbuis.nl/presentations/...indicators.pdf
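
A rough sketch of the idea, in Python rather than Stata: this is not the sheafcoef implementation, only an illustration under the assumption that the latent variable is scaled to have a standard deviation of 1, so that the sheaf coefficient equals the SD of the block's combined contribution to the linear predictor. The data and the helper name `ols_two` are hypothetical.

```python
from statistics import mean, stdev

def ols_two(y, x1, x2):
    """OLS of y on a constant, x1 and x2 via the centered normal equations."""
    my, m1, m2 = mean(y), mean(x1), mean(x2)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2

# Hypothetical data, for illustration only.
x1 = [0, 1, 2, 3, 4, 5]
x2 = [1, 0, 2, 1, 3, 0]
y = [1.2, 2.9, 6.1, 7.4, 10.6, 10.9]

b0, b1, b2 = ols_two(y, x1, x2)

# Combined contribution of the block {x1, x2} to the linear predictor.
block = [b1 * a + b2 * b for a, b in zip(x1, x2)]

# Assumption: the latent variable has SD 1, so its effect (the sheaf
# coefficient) is the SD of the block's contribution.
sheaf = stdev(block)

# Latent variable and the estimated weights of x1 and x2 on it.
latent = [v / sheaf for v in block]
w1, w2 = b1 / sheaf, b2 / sheaf

print(f"sheaf coefficient: {sheaf:.3f}")
print(f"weights on latent variable: x1={w1:.3f}, x2={w2:.3f}")
```

The weights are estimated from the data rather than fixed at one-to-one, which is why no prior rescaling of the variables is needed.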

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Richard Williams

Join Date: Apr 2014

Posts: 4992
#12

21 Jul 2021, 13:19

I'm not totally sure why you want to do any transformation of the variables. The original metrics are usually easier to understand, e.g. I know what a year is; a standardized year is something I don't know unless you explain it to me.

Sometimes you standardize because you want to compare the effects of variables measured in different ways. But, that standardization may vary across samples and populations, e.g. if that 105 year-old hadn't happened to be in your sample your standardization could have been very different. If you are going to standardize, you probably want to standardize in a way that makes results comparable across samples and populations.
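
To see how sample-dependent a z-score can be, here is a small Python sketch with made-up values. The same raw value of 20 is standardized once in a sample without an extreme observation and once with a single extreme observation added (a hypothetical value of 40, the top of the range mentioned in post #1).

```python
from statistics import mean, stdev

def z_of(value, sample):
    """z-score of `value` relative to the mean and sample SD of `sample`."""
    return (value - mean(sample)) / stdev(sample)

# Hypothetical age differences; illustrative values only.
base_sample = [-10, -5, 0, 5, 10, 15, 20]
with_outlier = base_sample + [40]  # one extreme observation added

z_without = z_of(20, base_sample)
z_with = z_of(20, with_outlier)

print(f"z-score of 20 without the extreme case: {z_without:.3f}")
print(f"z-score of 20 with the extreme case:    {z_with:.3f}")
```

The raw value is the same in both cases; only the sample it is compared against has changed.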

These handouts offer some thoughts on standardization:

https://www3.nd.edu/~rwilliam/stats2/l71.pdf (See especially the summary on p. 6)

https://www3.nd.edu/~rwilliam/xsoc73994/L04.pdf

Again, I am not familiar with the article you are referring to. Maybe it has great reasons or great methods I have never thought about before. If you think it does, then go over it again carefully. But, the fact that it is not obvious to you how to clone what they did makes me wonder how clear they are or whether what they did really applies in your case.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam
Raja Hasan

Join Date: Feb 2020

Posts: 59
#13

21 Jul 2021, 14:24

Maarten,

Thank you very much. I will invest time to understand sheafcoef. I am not at all familiar with that and so I am not sure how that would apply to my examples.

Richard,
Thank you for sharing the files. Yes, there are challenges in standardization, and that's why I was not sure about using it even though the authors of a published paper did. Here, standardization = (x - mean(x))/std(x), so the standardized variable has a mean of zero and a standard deviation of 1.

I understand your point about how standardization can create some other problems. But interacting the age difference and compensation ratio could also be affected by the large values of the age difference. That's what is concerning to me.
Another way we can do this is that we can interact the age difference with the compensation ratio and then we can either take log of the interaction term or scale the interaction term between 0 and 1 to reduce the large value effect of the age difference.
But then if I use the log of the interaction term and also the log of the dependent variable, how will I explain the economic significance? This is what concerns me a little.

Last edited by Raja Hasan; 21 Jul 2021, 14:45.
Richard Williams

Join Date: Apr 2014

Posts: 4992
#14

21 Jul 2021, 16:01

"I understand your point about how standardization can create some other problems. But interacting the age difference and compensation ratio could also be affected by the large values of the age difference."

This will be a concern whether you standardize or not. Rescaling variables does not mean you do not have extreme values; they will just be in a different metric. Z-score transformations are typically done as an aid to interpretation, not to get rid of outliers.
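
A minimal Python sketch of why rescaling leaves extreme values extreme: both the z-score and min-max transformations are linear, so an outlier sits just as far from the rest of the data, relative to the range, as it did before. The values and the helper `top_gap_share` are hypothetical.

```python
from statistics import mean, stdev

def z_score(values):
    """Standardize: (x - mean) / SD, using the sample SD."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]

def min_max(values):
    """Min-max normalize: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

def top_gap_share(values):
    """Gap between the largest and second-largest value, as a share of the range."""
    s = sorted(values)
    return (s[-1] - s[-2]) / (s[-1] - s[0])

# Hypothetical values with one extreme observation (40).
raw = [-10, -5, 0, 5, 10, 15, 20, 40]

shares = {
    "raw": top_gap_share(raw),
    "z-score": top_gap_share(z_score(raw)),
    "min-max": top_gap_share(min_max(raw)),
}

for name, share in shares.items():
    print(f"{name:>8}: top gap is {share:.3f} of the range")
```

The share is the same in all three metrics, so the extreme observation has not moved any closer to the rest of the data.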

If you are concerned about outliers, you might take logs of variables, use spline functions, or add squared terms. For some ideas, see

https://www3.nd.edu/~rwilliam/stats2/l61.pdf

Or, just say that your population of interest does not include executives who are 105 years old.

Again, if the original article seems really great and similar to what you want to do, then try to clone what it does. But if not that great or not that similar, feel free to go a different path.

What is this article, anyway? If I could at least quickly skim it I might appreciate what the issues are.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam
Raja Hasan

Join Date: Feb 2020

Posts: 59
#15

21 Jul 2021, 16:28

Dr. Richard,

Here is the link to the article
https://papers.ssrn.com/sol3/papers....act_id=2666117