Standardized variables

Danielle leblanc

Join Date: May 2017

Posts: 26
#1

04 May 2020, 03:34

Hi,

I have panel data and have declared the panel structure accordingly:
xtset id q_date, q
If I want to standardize a variable by taking (v1 - mean of v1)/(standard deviation of v1), can I do it as:

Code:

egen zv1=std(v1)

Or should I standardize v1 for each company (id)? In other words, do we standardize at the aggregate level across all companies, or for each company alone? If it is the latter, how can I modify the code?

Thanks
Nick Cox

Join Date: Mar 2014

Posts: 35698
#2

04 May 2020, 03:51

It depends on your goals. I can imagine situations in which standardization is irrelevant, as is possible with your code, situations in which standardization by year is a good idea, and yet others. A way to standardize by company is (for example)

Code:

bysort id : egen mean = mean(v1)
by id : egen sd = sd(v1)
gen wanted = (v1 - mean) / sd

as bizarrely (to me) std() does not support by:.
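
To make the difference between the two approaches concrete, here is a minimal sketch on made-up data. The toy values and the variable names z_all, mean_v1, sd_v1 and z_by_id are assumptions for illustration only; id and v1 follow the names used in #1.

```stata
* Toy panel: two companies, three observations each (made-up values)
clear
input id v1
1  2
1  4
1  6
2 10
2 20
2 30
end

* Standardize over the whole dataset, as in #1
egen z_all = std(v1)

* Standardize within each company, as in #2
bysort id : egen mean_v1 = mean(v1)
by id : egen sd_v1 = sd(v1)
generate z_by_id = (v1 - mean_v1) / sd_v1

list, sepby(id)
```

In this toy example the within-company version gives both companies the same standardized values, even though their levels differ by a factor of five, whereas z_all keeps the between-company difference. Which of the two is appropriate depends on the goal of the analysis.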
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#3

05 May 2020, 10:51

If you search for standardization and panel data in the listserv, you will find some discussions of it.

As Nick correctly points out, the answer depends a lot on what you intend to do. In many cases, standardization is not needed and only makes things harder to interpret. It seems to me that standardization by panel would make things exceedingly hard to understand since you'll be subtracting different amounts from each panel and rescaling the variances for each panel. Instead of assuming a constant relation between your right hand side variables and your dependent variable, you would then be assuming a constant relation between these re-scaled variables where the rescaling varies by panel and the dependent variable. It is almost certainly easier and better to use xtreg instead.
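
One possible reading of the xtreg suggestion is a fixed-effects model, which removes each panel's mean without rescaling its variance. The sketch below assumes that reading. The outcome y, the time variable t and all data values are hypothetical, and the fe option is my choice, not something stated in the post.

```stata
* Made-up panel: y is a hypothetical outcome, t a hypothetical time index
clear
input id t y v1
1 1  1.0  2
1 2  2.1  4
1 3  2.9  6
2 1  5.2 10
2 2  9.8 20
2 3 15.1 30
end

xtset id t

* Within (fixed-effects) estimator: panel means are removed,
* but v1 stays in its original units
xtreg y v1, fe
```

The coefficient on v1 is then read in the original units of v1, which is the interpretability advantage described above.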

Mahinda Sene

Join Date: Oct 2020

Posts: 9
#4

26 May 2021, 08:19

I want to compute the standardized mean value of 4 environmental indicators (AA, BB, CC and DD).

I want this value (mean environmental value = (sd AA + sd BB + sd CC + sd DD)/4) for each country in each year.
For example, the mean environmental value for the USA in 2010.
Can somebody tell me how to use the egen command for this purpose? Please see my data below (I have a number of countries, but I have included only a few here to explain my question).

Country Name	environment indicator	2010	2011	2012
UK	AA	-1.32573	-1.34429	-1.26859
UK	BB	-1.1179	-1.15356	-0.98963
UK	CC	-0.22618	-0.36924	-0.38932
UK	DD	-1.03593	-1.09778	-0.96978
USA	AA	-1.26529	-1.26538	-1.26955
USA	BB	-1.12054	-1.12548	-1.07625
USA	CC	-0.66082	-0.59353	-0.85653
USA	DD	-0.59469	-0.5459	-0.51529
Australia	AA	0.26126	0.318356	0.364368
Australia	BB	-0.33356	-0.34626	-0.3786
Australia	CC	-0.67317	-0.6633	-0.58567
Australia	DD	0.314489	0.166391	0.124486
Burkina Faso	AA	-0.35061	-0.36786	-0.47889
Burkina Faso	BB	-0.56885	-0.55905	-0.62399
Burkina Faso	CC	-0.11764	-0.55538	-0.57313
Burkina Faso	DD	-0.1564	-0.17954	-0.11886
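
A minimal sketch of one way to do this with egen, assuming the posted values are already standardized and the goal is their simple average within each country and year. The variable names country, indicator, y2010 to y2012 and mean_env are hypothetical (Stata variable names cannot start with a digit, so the year columns need a prefix), and only two of the posted countries are entered to keep the example short.

```stata
* Two countries from the posted table; variable names are assumptions
clear
input str12 country str2 indicator y2010 y2011 y2012
"UK"  "AA" -1.32573 -1.34429 -1.26859
"UK"  "BB" -1.1179  -1.15356 -0.98963
"UK"  "CC" -0.22618 -0.36924 -0.38932
"UK"  "DD" -1.03593 -1.09778 -0.96978
"USA" "AA" -1.26529 -1.26538 -1.26955
"USA" "BB" -1.12054 -1.12548 -1.07625
"USA" "CC" -0.66082 -0.59353 -0.85653
"USA" "DD" -0.59469 -0.5459  -0.51529
end

* Long layout: one row per country, indicator and year
reshape long y, i(country indicator) j(year)

* Mean of the four indicator values within each country-year
bysort country year : egen mean_env = mean(y)

list country year mean_env if indicator == "AA", sepby(country)
```

If the raw indicators still need standardizing first, that step would have to come before the averaging, and how to do it (over all countries, or by year) is the same judgment call discussed earlier in the thread.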


Maryam Azimi

Join Date: Mar 2021

Posts: 2
#5

22 Nov 2023, 02:02

Hi everyone,
I am using data from PISA 2018 in Stata 16 and am trying to standardize a variable. The first block of code below was suggested by a friend, but it gives "invalid syntax"; the second is my own, and it gives "total not found".

foreach i in st183q02ha {
qui sum `i' [weight=w_fstuwt]
gen sumwt=`r(sum-w)'
gen wtmean=`r(mean)'
egen double CSS=total(w_fstuwt*(`i'-wtmean)^2)
gen double variance=CSS/sumwt
gen double std_`i'=(`i'-wtmean)/sqrt(variance)

drop sumwt wtmean
}

foreach i in st183q02ha {
egen num = total (`i' * w_fstuwt)
gen sumwt=total w_fstuwt
gen wtmean= num/sumwt
egen double CSS=total (w_fstuwt*(`i'-wtmean)^2)
gen double variance=CSS/sumwt
gen double std_`i'=(`i'-wtmean)/sqrt(variance)

drop sumwt wtmean CSS
}

can anyone help me to identify the problem or suggest how I can carry out the standardization?
Nick Cox

Join Date: Mar 2014

Posts: 35698
#6

22 Nov 2023, 03:54

The immediate problems evident in #5 are that

In your first block of code the saved result after summarize is r(sum_w) -- that is, the punctuation is underscore _ not hyphen or minus sign.

In your second block of code Stata is thrown by the space after total.

That said, a loop over one variable is pointless. Nor need you try to re-create what summarize already calculated for you.

Code:

su st183q02ha [weight=w_fstuwt]
gen double wanted = (st183q02ha - r(mean)) / r(sd)

appears to be what you want. Naturally choose a variable name suitable for your purposes.

Detail: There is a small loss of precision in using a macro persona `r(foo)' rather than a saved result r(foo).

The thread already contains comments that standardization is not always needed or helpful anyway!
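
For comparison, here is a minimal sketch of both routes on made-up data. The variables x and w and their values are hypothetical stand-ins for st183q02ha and w_fstuwt, and aweight is spelled out on the assumption that it is what summarize uses when only [weight=] is given.

```stata
* Toy data: x is the variable to standardize, w the weight (made up)
clear
input x w
1 1
2 1
3 2
end

* Route 1: let summarize do the weighted calculation, as in #6
summarize x [aweight = w]
generate double z_su = (x - r(mean)) / r(sd)

* Route 2: the manual calculation from #5 with the syntax fixed
* (egen, not gen, for total(); no space between total and the parenthesis)
egen double num   = total(x * w)
egen double sumwt = total(w)
generate double wtmean = num / sumwt
egen double css   = total(w * (x - wtmean)^2)
generate double z_manual = (x - wtmean) / sqrt(css / sumwt)

list
```

As far as I understand, r(sd) with analytic weights uses an n - 1 style denominator, while the manual route divides by the sum of weights, so the two results differ by a constant factor that shrinks as the number of observations grows. With a dataset the size of PISA the difference should be negligible.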
Maryam Azimi

Join Date: Mar 2021

Posts: 2
#7

22 Nov 2023, 05:01

Thank you, Nick. Your comment was really helpful.