
  • Standardized Z-scores with panel data

    I have panel data, and am trying to construct a zscore variable for each panel. We used the egen newvar = std(oldvar) command, but this only gave us zscores based on the mean and standard deviation of the entire dataset.

    Instead, I would like to have zscores calculated on the means and standard deviations of each panel. We assumed that, since we tsset the data as panel data, it would automatically do this, but this has not been the case.

    We were able to use the "by" option to sum zscores under each panel - is there a similar option that allows you to take the zscores by each panel?

    For more details, we are looking at different geographic areas over time. Each geographic area is a panel. We are trying to calculate the zscore of a flooding variable in each area so that we can see how much above or below a certain norm flooding levels were for each specific area.

    Thanks

  • #2
    Code:
    bysort panelvar : egen zscore=std(oldvar)
    cheers,
    Jeph



    • #3
      Originally posted by Jeph Herrin
      Code:
      bysort panelvar : egen zscore=std(oldvar)
      cheers,
      Jeph
      Unfortunately, this still does not work. I get the same error message that I have been receiving:

      egen ... std() may not be combined with by

      When I look into it, the help says to use an if qualifier instead. This does let me make a zscore for a particular panel, but it means I will end up with 1,000+ columns (one zscore variable for each panel) and will have to type the command for each panel.



      • #4
        Here's the "dumb" way. Combining -levelsof- with a -foreach- loop can always substitute for the by prefix.
        Code:
        sysuse auto,clear
        levelsof foreign, local(vlist)
        foreach n of numlist `vlist'    {
            qui su weight if foreign==`n'
            gen zscore`n'=(weight-`r(mean)')/`r(sd)' if foreign==`n'
        }
        Last edited by Aspen Chen; 15 Sep 2014, 14:54. Reason: Note: missed out on the condition for the -summarize- command



        • #5


          Unfortunately, this still does not work.
          Apologies. This should work:

          Code:
          bys panelvar : egen mean=mean(oldvar)
          bys panelvar : egen sd = sd(oldvar)
          gen zscore=mean/sd
          cheers,
          Jeph



          • #6
            Originally posted by Aspen Chen
            Here's the "dumb" way. Combining -levelsof- with a -foreach- loop can always substitute for the by prefix.
            Code:
            sysuse auto,clear
            levelsof foreign, local(vlist)
            foreach n of numlist `vlist' {
            qui su weight if foreign==`n'
            gen zscore`n'=(weight-`r(mean)')/`r(sd)' if foreign==`n'
            }

            I attempted to use this syntax:

            Code:
            foreach n of numlist 'Geocode' {
                qui su weight
                gen zscore `n'=(weight-`r(mean)')/`r(sd)' if MODISData==`n'
            }
            but I received the following error:

            invalid numlist
            r(121);



            • #7
              What I'm currently resorting to:

              Code:
              . egen z2modis3 = std(MODISData) if Geocode == 11415000
              (411272 missing values generated)
              
              . replace z2modis1 = z2modis3 if missing(z2modis1)
              (404 real changes made)
              
              . drop z2modis3
              Here z2modis1 is the consolidated z-score column, and z2modis3 is a temporary variable holding the z-scores for one specific geocode.

              I create a new zscore column for a specific geocode, then tell stata to replace missing values in my z2modis1 column with those values (so it's all consolidated into one column), before dropping the z2modis3 variable and starting it again for the next geocode.

              This works, but with 1,000 geocodes to go through it's highly inefficient. I can't help but feel that there's got to be a better way to do this.



              • #8
                Andrew: What does your `Geocode' local look like? You can try to -display- it. In order to make it work, you would need it in the form of "10000 10001 10002". Another potential problem in your case is that the local might exceed the length limit. You can check out the values with -help limits-.

                BTW, I made a mistake. Here's the revision.
                Code:
                sysuse auto,clear
                levelsof foreign, local(vlist)
                gen zscore=.
                foreach n of numlist `vlist' {
                    qui su weight if foreign==`n'
                    replace zscore=(weight-`r(mean)')/`r(sd)' if foreign==`n'
                }
                If figuring out what's wrong with `Geocode' is too much trouble, go with Jeph's code.

                Jeph: Good code, but I believe the zscore is calculated as (yi-mean)/sd. That would make your last line:
                Code:
                gen zscore=(oldvar-mean)/sd
                Last edited by Aspen Chen; 15 Sep 2014, 15:26.



                • #9
                  I suspect you were on the right track with your foreach loop but didn't properly initialize the numlist for it to use. Try this (assumes your panel variable is named Geocode and the variable to standardize is named weight--substitute the appropriate names if that's not correct):

                  Code:
                  levelsof Geocode, local(locations)
                  
                  gen z_weight = .
                  foreach g of numlist `locations' {
                      quietly summarize weight if Geocode == `g'
                      replace z_weight = (weight-`r(mean)')/`r(sd)' if Geocode == `g'
                  }
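
One way to check that a loop of this kind and the by-group approach from earlier in the thread agree is to run both on the auto dataset and compare. This is only a sketch: m_w, sd_w, z_by and z_loop are illustrative names, and foreign stands in for the panel variable.

```stata
sysuse auto, clear

* by-group version: group mean and SD first, then standardize
bysort foreign: egen m_w = mean(weight)
bysort foreign: egen sd_w = sd(weight)
gen z_by = (weight - m_w)/sd_w

* loop version: one -summarize- per level of the grouping variable
levelsof foreign, local(groups)
gen z_loop = .
foreach g of local groups {
    quietly summarize weight if foreign == `g'
    replace z_loop = (weight - r(mean))/r(sd) if foreign == `g'
}

* the two columns should agree up to float rounding
assert reldif(z_by, z_loop) < 1e-5
```

If the assert passes silently, the two methods give the same z-scores. With 1,000+ panels the by-group version avoids a separate pass over the data for each panel.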



                  • #10
                    Originally posted by Jeph Herrin



                    Apologies. This should work:

                    Code:
                    bys panelvar : egen mean=mean(oldvar)
                    bys panelvar : egen sd = sd(oldvar)
                    gen zscore=mean/sd
                    cheers,
                    Jeph

                    Thanks much! This worked - the only thing I had to change was to make the last line the following:

                    Code:
                    gen zmodis=(MODISData - meanmodis)/sdmodis
                    since the formula for zscore takes the difference between the mean and the actual value.

                    Thank you very much!
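
For anyone landing here later, the full sequence with the variable names used in this thread would look roughly like the sketch below. The six observations and the second Geocode value are made up purely so the example runs on its own.

```stata
clear
* toy data: two areas, three periods each (values are invented)
input long Geocode float MODISData
11415000 2
11415000 4
11415000 6
11415001 10
11415001 20
11415001 30
end

* panel-specific mean and standard deviation
bysort Geocode: egen meanmodis = mean(MODISData)
bysort Geocode: egen sdmodis = sd(MODISData)

* z-score relative to each area's own mean and SD
gen zmodis = (MODISData - meanmodis)/sdmodis
list, sepby(Geocode)
```

In this toy data each area gets z-scores of -1, 0 and 1, even though the raw levels differ by a factor of five between the two areas.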



                    • #11
                      Andrew, I see in your response "Geocode" is a variable name. The numlist would not take a variable name. In order to make the -foreach- loop work, you would need to convert the unique values of Geocode into a macro using -levelsof-.
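
A small sketch of what that looks like, using the auto dataset with foreign standing in for Geocode (vlist is just an illustrative macro name):

```stata
sysuse auto, clear

* store the distinct values of foreign in the local macro vlist
levelsof foreign, local(vlist)

* inspect the macro: this is the list the loop will iterate over
display "`vlist'"

* the macro, not the variable name, goes after -of numlist-
foreach n of numlist `vlist' {
    display "processing group `n'"
}
```

Here the macro holds "0 1", which is a valid numlist. Note also that a local macro is referenced with a left quote and an apostrophe (`vlist'), not with two apostrophes.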
