Normalize data to a reference year

Rochelle Zhang

Join Date: Apr 2025

Posts: 0
#1

28 Feb 2018, 11:29

Dear all,
I have a panel dataset by firm and year; the first year is 1981. I have a variable xvar that I want to normalize by its 1981 value. For example, if xvar = 10 in 1982 and its 1981 value is 5, then after normalization xvar_norm = 10/5 = 2.

Not all firms have a 1981 value of xvar; that is, there are gaps in the years for some firms.

My question is how to construct the normalized series relative to the 1981 value.
My example data are below; the actual data have 100,000 firm-year records.

Code:

clear
input int(firmid year) double xvar
1000 1981 .94022655
1000 1982 1.288908
1000 1983 1.3905064
1000 1984 1.6384722
1002 1982 2.3624953
1002 1983 2.5735009
1002 1984 3.2092887
1002 1985 3.1472486
1003 1981 2.8487526
1003 1982 2.8611921
1003 1983 2.8942158
1003 1984 2.7964259
1003 1985 3.0314453
end

Thanks,

Rochelle
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#2

28 Feb 2018, 11:40

In theory this is easy, but data irregularities can make it hard. In your example, firm 1002 has no observation for year 1981. So how do you want to handle that? Leave xvar_norm missing for that firm? Or use 1982 as the reference year?

And what would you do if the value of xvar itself were missing, or, for that matter, 0, in 1981 (or the reference year)?
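Before choosing a rule, it may help to see how many firms are affected. The following is a minimal sketch using the example data from #1; the variable names base_ok and firm_tag are made up for illustration and are not from the original posts.

```stata
* Example data from post #1
clear
input int(firmid year) double xvar
1000 1981 .94022655
1000 1982 1.288908
1000 1983 1.3905064
1000 1984 1.6384722
1002 1982 2.3624953
1002 1983 2.5735009
1002 1984 3.2092887
1002 1985 3.1472486
1003 1981 2.8487526
1003 1982 2.8611921
1003 1983 2.8942158
1003 1984 2.7964259
1003 1985 3.0314453
end

* base_ok (hypothetical name) is 1 for every observation of a firm that has
* a 1981 observation with xvar neither missing nor zero, and 0 otherwise
bysort firmid: egen byte base_ok = max(year == 1981 & !missing(xvar) & xvar != 0)

* Tag one observation per firm so that firms, not firm-years, are counted
egen byte firm_tag = tag(firmid)
tabulate base_ok if firm_tag

* List the firms without a usable 1981 value
list firmid if firm_tag & !base_ok
```

With the real data, the tabulation shows how many firms would end up with a missing normalized series under a strict 1981 base.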
Nick Cox

Join Date: Mar 2014

Posts: 35432
#3

28 Feb 2018, 11:42

Discussed in http://www.stata-journal.com/sjpdf.h...iclenum=dm0055 especially Section 5.

dm0055 is therefore an otherwise unpredictable search term for related discussions in this forum.
Rochelle Zhang

Join Date: Apr 2025

Posts: 0
#4

28 Feb 2018, 12:31

Thanks, Clyde! Thanks, Nick!

@Nick, your Stata Journal article gave a good explanation. I tried Section 5 on my data; basically, companies with no 1981 data get missing values generated.
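For readers without the article to hand, here is one common way to spread a reference-year value across each panel. It is a sketch in the spirit of that discussion, not necessarily the article's exact code; base1981 is a made-up variable name.

```stata
* Example data from post #1
clear
input int(firmid year) double xvar
1000 1981 .94022655
1000 1982 1.288908
1000 1983 1.3905064
1000 1984 1.6384722
1002 1982 2.3624953
1002 1983 2.5735009
1002 1984 3.2092887
1002 1985 3.1472486
1003 1981 2.8487526
1003 1982 2.8611921
1003 1983 2.8942158
1003 1984 2.7964259
1003 1985 3.0314453
end

* cond() keeps xvar only in 1981 and returns missing otherwise, so the
* by-firm mean is just the 1981 value, or missing if the firm has none
bysort firmid: egen double base1981 = mean(cond(year == 1981, xvar, .))

* Division by a missing (or zero) base yields missing in Stata
generate double xvar_norm = xvar / base1981
```

Firm 1002 has no 1981 observation, so base1981 and hence xvar_norm are missing for all of its years.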

@Clyde, the issue you raised is valid. I see two ways to deal with it. The first is to follow Section 5 of Nick's article (http://www.stata-journal.com/sjpdf.h...iclenum=dm0055) and accept missing values for firms with no 1981 data.

The second, if I want non-missing values for firms with no 1981 data, is either:

to redefine the base year for each firm as its first observed year, then create the reference variable from that year's value,

or

to compute the mean of xvar over all firms with a non-missing 1981 value, then use that mean to scale xvar.
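The two alternatives could be sketched as follows, again with the example data from #1. The names firstyear, base_first, xvar_norm_first and xvar_norm_mean are made up for illustration, and "first year" is taken here to mean the first year in which xvar is non-missing, which is an assumption.

```stata
* Example data from post #1
clear
input int(firmid year) double xvar
1000 1981 .94022655
1000 1982 1.288908
1000 1983 1.3905064
1000 1984 1.6384722
1002 1982 2.3624953
1002 1983 2.5735009
1002 1984 3.2092887
1002 1985 3.1472486
1003 1981 2.8487526
1003 1982 2.8611921
1003 1983 2.8942158
1003 1984 2.7964259
1003 1985 3.0314453
end

* Alternative 1: firm-specific base year = first year with non-missing xvar
bysort firmid: egen int firstyear = min(cond(!missing(xvar), year, .))
bysort firmid: egen double base_first = mean(cond(year == firstyear, xvar, .))
generate double xvar_norm_first = xvar / base_first

* Alternative 2: scale every firm by the cross-firm mean of xvar in 1981
summarize xvar if year == 1981, meanonly
generate double xvar_norm_mean = xvar / r(mean)
```

Note that the two series mean different things: the first equals 1 in each firm's own base year, which differs across firms, while the second is relative to an average firm in 1981 and need not equal 1 for any firm.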
Nick Cox

Join Date: Mar 2014

Posts: 35432
#5

28 Feb 2018, 12:40

There can't be a universal answer here. I would think about what is upstream and downstream of this aim.

Where does the impetus come from? How are these results to be used? Even, who is asking and who is receiving?

Perhaps it's just a bad idea to use a base year with so many missing values, but would any other year be better? Perhaps you can extrapolate back to 1981 if you have missing values, but even then everything pivots for some cases on an estimated value and you have to justify that fact and how you did the extrapolation.