Treating missing values due to lagged variable creation

Noor Hend

Join Date: Mar 2019

Posts: 11
#1


26 Mar 2019, 04:00

Hello!

I am currently writing my master's thesis on the effect of implementing GSCMP on financial performance. To evaluate the effect on financial performance one year later, I want to lag my independent variable by one year. Because I only want to include observations with at least two consecutive years of data, I used the command shown below. My initial dataset contains 8,508 observations over a 10-year period (2006-2015). How should I treat the 1,436 missing values this generated when running my regressions? Should I delete them, or what is the usual procedure when creating lagged variables?

. xtset ID year
panel variable: ID (unbalanced)
time variable: year, 2006 to 2015, but with gaps
delta: 1 unit

. by ID: gen L1 = GSCMP[_n-1] if year==year[_n-1]+1
(1436 missing values generated)
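To see where these missing values come from, here is a minimal sketch in plain Python (not Stata) of what the command above does: within each ID, sorted by year, the lag takes the previous row's GSCMP only when that row is exactly one year earlier; otherwise the lag is missing (None). A firm's first year and every year that follows a gap therefore get a missing lag. The function name `lag1` and the toy data are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

def lag1(rows):
    """Return (ID, year, GSCMP, lag) tuples with a one-year lag of GSCMP.

    Mirrors: by ID: gen L1 = GSCMP[_n-1] if year==year[_n-1]+1
    The lag is None (missing) for a firm's first year and after any gap.
    """
    rows = sorted(rows, key=itemgetter(0, 1))  # sort by ID, then year, as xtset does
    out = []
    for _, firm_rows in groupby(rows, key=itemgetter(0)):
        prev = None  # previous row within the same firm
        for firm_id, year, gscmp in firm_rows:
            if prev is not None and year == prev[1] + 1:
                lag = prev[2]  # the previous calendar year exists: take its value
            else:
                lag = None  # first year of the firm, or a gap in the years
            out.append((firm_id, year, gscmp, lag))
            prev = (firm_id, year, gscmp)
    return out

# Hypothetical toy panel: firm 1 is complete, firm 2 has a gap (no 2008),
# firm 3 is observed for one year only
panel = [
    (1, 2006, 0.2), (1, 2007, 0.4), (1, 2008, 0.5),
    (2, 2006, 0.1), (2, 2007, 0.3), (2, 2009, 0.6),
    (3, 2010, 0.7),
]

lagged = lag1(panel)
for row in lagged:
    print(row)
missing = sum(1 for r in lagged if r[3] is None)
print(f"({missing} missing values generated)")
```

In Stata itself, after -xtset ID year-, the lag operator L.GSCMP follows the same rule (missing when the previous calendar year is absent), so it can be used directly in a model without creating a separate variable.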

Thank you in advance!
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17673
#2

26 Mar 2019, 04:17

Noor:
welcome to this forum.
The missing values are created by the lag (-L1-) machinery and do not require any special treatment: Stata's estimation commands automatically exclude observations with a missing value in any of the model variables.
That said, I would recommend discussing with your supervisor the choice of including only -panelid-s with 2 or more consecutive waves of data, as missing values might be informative (i.e., not missing at random).
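Estimation commands such as -regress- and -xtreg- work on complete cases: any observation with a missing outcome or regressor is simply left out of the estimation sample, which is why no manual deletion is needed. The plain-Python sketch below (not Stata code) illustrates that casewise deletion with a hand-rolled one-regressor OLS; the names `roa` and `gscmp_lag` and the data are hypothetical.

```python
def complete_cases(y, x):
    """Keep (y, x) pairs where neither value is missing (None): casewise deletion."""
    return [(yi, xi) for yi, xi in zip(y, x) if yi is not None and xi is not None]

def ols(pairs):
    """Slope and intercept of a one-regressor OLS fit on (y, x) pairs."""
    n = len(pairs)
    mean_y = sum(yi for yi, _ in pairs) / n
    mean_x = sum(xi for _, xi in pairs) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for yi, xi in pairs)
    sxx = sum((xi - mean_x) ** 2 for _, xi in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: roa is next-year performance, gscmp_lag the lagged regressor.
# The 1st and 4th rows have a missing lag (a firm's first year, or a gap).
roa = [9.9, 2.0, 3.0, 7.7, 4.0, 5.0]
gscmp_lag = [None, 0.5, 1.0, None, 1.5, 2.0]

used = complete_cases(roa, gscmp_lag)
slope, intercept = ols(used)
print(f"Number of obs = {len(used)}")  # only 4 of the 6 rows enter the fit
print(f"slope = {slope}, intercept = {intercept}")
```

The outcome values 9.9 and 7.7 sit on rows with a missing lag, so they never influence the estimates; the fit uses the four complete rows only.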

Kind regards,
Carlo
(Stata 19.0)
Noor Hend

Join Date: Mar 2019

Posts: 11
#3

26 Mar 2019, 04:35

Thank you for your fast reply, Carlo! My reasoning behind only including IDs with 2 or more consecutive years of observations was to be able to evaluate the effect of GSCMP on next year's financial performance by lagging GSCMP. I'm relatively new to Stata: do you think it is not necessary to restrict my dataset to companies with at least two consecutive years of observations?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17673
#4

26 Mar 2019, 05:24

Noor:
the main issue here is not Stata, but the fact that you seemingly have an unbalanced panel dataset (by the way, Stata can handle both balanced and unbalanced panel datasets without any problem, so you do not need to worry about that).
You should rule out that firms that provided data for, say, one year only differ from those that provided data for 2 or more years; if they did, missingness would probably be informative and you would end up with a dataset that is a non-random sample of the original one.
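One simple first check of this point is to compare firms observed for a single year with firms observed for two or more years on some characteristic available for both groups. The plain-Python sketch below (not Stata code) does that for a hypothetical firm-size variable `size` on toy data; a raw difference in group means is only a descriptive first look, not a formal test of whether missingness is informative.

```python
from collections import defaultdict
from statistics import mean

def compare_by_panel_length(rows):
    """Compare a firm characteristic between single-year and multi-year firms.

    rows: (ID, year, size) tuples; `size` is a hypothetical firm characteristic.
    Each firm is first reduced to its own mean, so every firm counts once.
    Returns (mean for single-year firms, mean for multi-year firms);
    a group with no firms yields None.
    """
    by_firm = defaultdict(list)
    for firm_id, _year, size in rows:
        by_firm[firm_id].append(size)
    single = [mean(v) for v in by_firm.values() if len(v) == 1]
    multi = [mean(v) for v in by_firm.values() if len(v) >= 2]
    return (mean(single) if single else None, mean(multi) if multi else None)

# Hypothetical toy panel: firms 1 and 2 report several years, firms 3 and 4 only one
panel = [
    (1, 2006, 100.0), (1, 2007, 120.0),
    (2, 2008, 200.0), (2, 2009, 220.0), (2, 2010, 240.0),
    (3, 2012, 30.0),
    (4, 2015, 50.0),
]

single_mean, multi_mean = compare_by_panel_length(panel)
print(f"single-year firms: {single_mean}, multi-year firms: {multi_mean}")
```

In this toy example the single-year firms are much smaller on average, which is the kind of pattern that would make a sample restricted to multi-year firms non-random.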

Kind regards,
Carlo
(Stata 19.0)
Noor Hend

Join Date: Mar 2019

Posts: 11
#5

26 Mar 2019, 06:07

I do indeed have an unbalanced panel dataset, to avoid certain biases. Thank you for your comment! I will take this into account.