  • Standardized variables

    Hi,

    I have panel data and defined my variables accordingly:
    xtset id q_date,q
    If I want to standardize a variable as (v1 - mean of v1) / (standard deviation of v1), can I do it as:

    Code:
    egen zv1=std(v1)
    Or should I standardize v1 separately for each company (id)? In other words, do we standardize at the aggregate level across all companies, or within each company on its own? If it is the latter, how can I modify the code?

    Thanks


  • #2
    It depends on your goals. I can imagine situations in which standardization is irrelevant, as is possible with your code, situations in which standardization by year is a good idea, and yet others. A way to standardize by company is (for example)

    Code:
    bysort id : egen mean = mean(v1) 
    by id : egen sd  = sd(v1)
    gen wanted = (v1 - mean) / sd
    as bizarrely (to me) std() does not support by:.
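To make the difference between the two approaches concrete, here is a minimal sketch on a made-up six-row panel (the data and the names m, s, z_within and z_overall are purely illustrative, not from the original question). It computes the by-company z-score as above and the overall z-score from egen's std() side by side:

```stata
clear
input id t v1
1 1 10
1 2 12
1 3 14
2 1 100
2 2 300
2 3 200
end

* within-company mean and SD, then the by-company z-score
bysort id: egen double m = mean(v1)
by id: egen double s = sd(v1)
gen double z_within = (v1 - m) / s

* overall z-score across all companies, for comparison
egen double z_overall = std(v1)

list id t v1 z_within z_overall, sepby(id)
```

In this toy example each company has its own mean and SD, so z_within has the same spread in both companies, whereas z_overall mostly reflects the fact that company 2 has much larger values than company 1.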



    • #3
      If you search for standardization and panel data in the listserv, you will find some discussions of it.

      As Nick correctly points out, the answer depends a lot on what you intend to do. In many cases, standardization is not needed and only makes things harder to interpret. It seems to me that standardization by panel would make things exceedingly hard to understand, since you would be subtracting a different mean from each panel and rescaling the variance of each panel separately. Instead of assuming a constant relation between your right-hand-side variables and your dependent variable, you would be assuming a constant relation between the dependent variable and these rescaled variables, where the rescaling differs from panel to panel. It is almost certainly easier and better to use xtreg instead.
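As a rough illustration of the xtreg route, here is a sketch on simulated data. Everything in it (the variable names, the true slope of 2, the panel dimensions) is an assumption made up for the example. The fixed-effects estimator removes panel-specific means itself, so no manual per-panel standardization is needed and the coefficient stays in the original units of x:

```stata
clear
set seed 12345
set obs 50
gen id = _n
gen u = rnormal()                 // hypothetical panel-specific effect
expand 8                          // 8 time periods per panel
bysort id: gen t = _n
gen x = rnormal() + u             // regressor correlated with the panel effect
gen y = 1 + 2*x + u + rnormal()   // true slope on x is 2 by construction

xtset id t
xtreg y x, fe                     // within estimator handles panel means
```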



      • #4
        I want to measure the standardized mean value of four environmental indicators (AA, BB, CC and DD).

        For each country and each year, I want this value: mean environmental value = (sd AA + sd BB + sd CC + sd DD) / 4.
        For example, the mean environmental value for the USA in 2010.
        Can somebody tell me how to use the egen command for this purpose? My data are shown below (I have many countries, but have included only a few here to explain my question).

        Country       Indicator       2010       2011       2012
        UK            AA          -1.32573   -1.34429   -1.26859
        UK            BB           -1.1179   -1.15356   -0.98963
        UK            CC          -0.22618   -0.36924   -0.38932
        UK            DD          -1.03593   -1.09778   -0.96978
        USA           AA          -1.26529   -1.26538   -1.26955
        USA           BB          -1.12054   -1.12548   -1.07625
        USA           CC          -0.66082   -0.59353   -0.85653
        USA           DD          -0.59469    -0.5459   -0.51529
        Australia     AA           0.26126   0.318356   0.364368
        Australia     BB          -0.33356   -0.34626    -0.3786
        Australia     CC          -0.67317    -0.6633   -0.58567
        Australia     DD          0.314489   0.166391   0.124486
        Burkina Faso  AA          -0.35061   -0.36786   -0.47889
        Burkina Faso  BB          -0.56885   -0.55905   -0.62399
        Burkina Faso  CC          -0.11764   -0.55538   -0.57313
        Burkina Faso  DD           -0.1564   -0.17954   -0.11886
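One possible way to get a per-country, per-year mean with egen is sketched here, assuming the values shown are already the standardized indicator values and that the year columns are stored as y2010, y2011 and y2012 (Stata variable names cannot start with a digit, so those names, like country, indicator and envmean, are assumptions). Only the UK and USA rows are typed in to keep the example short:

```stata
clear
input str12 country str2 indicator y2010 y2011 y2012
"UK"  "AA" -1.32573 -1.34429 -1.26859
"UK"  "BB" -1.1179  -1.15356 -0.98963
"UK"  "CC" -0.22618 -0.36924 -0.38932
"UK"  "DD" -1.03593 -1.09778 -0.96978
"USA" "AA" -1.26529 -1.26538 -1.26955
"USA" "BB" -1.12054 -1.12548 -1.07625
"USA" "CC" -0.66082 -0.59353 -0.85653
"USA" "DD" -0.59469 -0.5459  -0.51529
end

* go from one column per year to one row per country, indicator and year
reshape long y, i(country indicator) j(year)

* mean of the four indicator values within each country-year
bysort country year: egen envmean = mean(y)

list country year indicator y envmean, sepby(country year)
```

If the indicators still needed standardizing first, that step would have to come before the mean is taken, and which observations to standardize over is exactly the judgment call discussed earlier in the thread.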



        • #5
          Hi everyone,
          I am using data from PISA 2018 with Stata version 16, and I am trying to standardize a variable. I have tried the two blocks of code below. The first was suggested by a friend, but it gives "invalid syntax". The second is my own, and it gives "total not found".

          foreach i in st183q02ha {
          qui sum `i' [weight=w_fstuwt]
          gen sumwt=`r(sum-w)'
          gen wtmean=`r(mean)'
          egen double CSS=total(w_fstuwt*(`i'-wtmean)^2)
          gen double variance=CSS/sumwt
          gen double std_`i'=(`i'-wtmean)/sqrt(variance)

          drop sumwt wtmean
          }


          foreach i in st183q02ha {
          egen num = total (`i' * w_fstuwt)
          gen sumwt=total w_fstuwt
          gen wtmean= num/sumwt
          egen double CSS=total (w_fstuwt*(`i'-wtmean)^2)
          gen double variance=CSS/sumwt
          gen double std_`i'=(`i'-wtmean)/sqrt(variance)

          drop sumwt wtmean CSS
          }


          Can anyone help me identify the problem or suggest how I can carry out the standardization?



          • #6
            The immediate problems evident in #5 are that

            In your first block of code the saved result after summarize is r(sum_w) -- that is, the punctuation is underscore _ not hyphen or minus sign.

            In your second block of code, Stata is thrown by the space after total.

            That said, a loop over one variable is pointless. Nor need you try to re-create what summarize already calculated for you.

            Code:
            su st183q02ha [weight=w_fstuwt]
            
            gen double wanted = (st183q02ha  - r(mean)) / r(sd)
            appears to be what you want. Naturally choose a variable name suitable for your purposes.

            Detail: There is a small loss of precision in using a macro persona `r(foo)' rather than a saved result r(foo).
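For anyone who wants to try this without the PISA file, here is a small sketch on the auto dataset that ships with Stata; mpg and weight merely stand in for st183q02ha and w_fstuwt, and z is an arbitrary name. It spells out aweight because that is what summarize assumes when only weight is given. The final summarize is a sanity check that the weighted mean and SD of the new variable come out at roughly 0 and 1:

```stata
sysuse auto, clear

* weighted mean and SD are left behind in r(mean) and r(sd)
su mpg [aweight=weight]

* use the saved results directly, not their macro personas
gen double z = (mpg - r(mean)) / r(sd)

* difference between the saved result and its macro persona, if any
display %21.18f r(mean) - `r(mean)'

* check: weighted mean should be about 0 and weighted SD about 1
su z [aweight=weight]
```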

            The thread already contains comments that standardization is not always needed or helpful anyway!



            • #7
              Thank you, Nick. Your comment was really helpful.

