  • Normalize data to a reference year

    Dear all,
    I have a panel dataset by firm and year; the first year is 1981. I want to normalize a variable, xvar, by its 1981 value. For example, if xvar = 10 in 1982 and its 1981 value is 5, then the normalized value is xvar_norm = 10/5 = 2.

    Not all firms have a 1981 value of xvar; that is, there are gaps in the years for some firms.

    My question is how to construct the normalized series relative to the 1981 value.
    My pretend data are below; the actual data have 100,000 firm-year records.

    Code:
    clear
    input int(firmid year) double xvar
    1000 1981 .94022655 
    1000 1982 1.288908 
    1000 1983 1.3905064 
    1000 1984 1.6384722 
    1002 1982 2.3624953 
    1002 1983 2.5735009 
    1002 1984 3.2092887 
    1002 1985 3.1472486 
    1003 1981 2.8487526 
    1003 1982 2.8611921 
    1003 1983 2.8942158 
    1003 1984 2.7964259 
    1003 1985 3.0314453 
    end
    Thanks,

    Rochelle


  • #2
    In theory this is easy, but data irregularities can make it hard. In your example, firm 1002 has no observation for year 1981. So how do you want to handle that? Leave xvar_norm missing for that firm? Or use 1982 as the reference year?

    And what would you do if the value of xvar itself were missing, or, for that matter, 0, in 1981 (or the reference year)?
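
    A minimal sketch of how one might count the affected firms before deciding. It assumes the example data from #1 are in memory; the local macro refyear and the variable names ok_ref and firm_tag are illustrative, not from the thread.

    ```stata
    * Assumption: the example data from #1 are in memory.
    * refyear, ok_ref and firm_tag are illustrative names.
    local refyear 1981

    * Per firm: number of reference-year observations with a usable
    * (nonmissing, nonzero) value of xvar.
    egen ok_ref = total(year == `refyear' & !missing(xvar) & xvar != 0), by(firmid)

    * Tag one observation per firm so that each firm is counted once.
    egen byte firm_tag = tag(firmid)
    tabulate ok_ref if firm_tag

    * Firms with no usable reference-year value.
    list firmid if firm_tag & ok_ref == 0
    ```

    With the example data, firm 1002 is the only firm listed, because it has no 1981 observation.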



    • #3
      Discussed in http://www.stata-journal.com/sjpdf.h...iclenum=dm0055 especially Section 5.

      dm0055 is therefore an otherwise unpredictable search term for related discussions in this forum.
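
      One common way to do this in Stata is sketched below. It is not necessarily the code given in the article. It assumes the example data from #1 are in memory and that each firm has at most one observation per year; xvar_1981 and xvar_norm are illustrative names.

      ```stata
      * Assumption: the example data from #1 are in memory,
      * with at most one observation per firm and year.
      * Copy each firm's 1981 value to all of that firm's observations.
      * cond() returns missing outside 1981 and egen's mean() ignores missings,
      * so a firm with no 1981 observation gets a missing reference value.
      egen double xvar_1981 = mean(cond(year == 1981, xvar, .)), by(firmid)

      * Ratio to the 1981 value; missing wherever the reference is missing or zero.
      generate double xvar_norm = xvar / xvar_1981

      list firmid year xvar xvar_norm, sepby(firmid)
      ```

      Here xvar_norm is 1 in 1981 for firms 1000 and 1003 and missing throughout for firm 1002.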



      • #4
        Thanks, Clyde! Thanks, Nick!


        @Nick, your Stata Journal article gave a good explanation. I tried Section 5 on my data; as expected, companies with no 1981 data get missing values.

        @Clyde, the issue you raised is valid. I see two ways to deal with it. One is to follow Nick's http://www.stata-journal.com/sjpdf.h...iclenum=dm0055, especially Section 5, and accept missing values for firms with no 1981 data.

        The other applies if I want nonmissing values for firms with no data for 1981. Then

        either:
        I redefine the base year for each firm as its first year of data, then create the reference variable from that year

        or

        I compute the mean of xvar over all firms with nonmissing values in 1981, then use that mean to scale xvar.
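
        A rough sketch of the two alternatives above, assuming the example data from #1 are in memory with at most one observation per firm and year. The names baseyear, xvar_base, xvar_norm1 and xvar_norm2 are illustrative.

        ```stata
        * Assumption: the example data from #1 are in memory.
        * Alternative 1: each firm's base year is its first year
        * with a nonmissing value of xvar.
        egen int baseyear = min(cond(!missing(xvar), year, .)), by(firmid)
        egen double xvar_base = mean(cond(year == baseyear, xvar, .)), by(firmid)
        generate double xvar_norm1 = xvar / xvar_base

        * Alternative 2: scale every firm by the mean of xvar
        * across all firms observed in 1981.
        summarize xvar if year == 1981, meanonly
        generate double xvar_norm2 = xvar / r(mean)

        list firmid year baseyear xvar_norm1 xvar_norm2, sepby(firmid)
        ```

        The two results are not interchangeable. Under alternative 1 every firm equals 1 in its own base year, so firms are indexed to different years. Under alternative 2 all firms share one divisor, so differences in level between firms remain in the normalized series.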



        • #5
          There can't be a universal answer here. I would think about what is upstream and downstream of this aim.

          Where does the impetus come from? How are these results to be used? Who is asking for them, and who will receive them?

          Perhaps it's just a bad idea to use a base year with so many missing values, but would any other year be better? Perhaps you can extrapolate back to 1981 if you have missing values, but even then everything pivots for some cases on an estimated value and you have to justify that fact and how you did the extrapolation.
