Dear Statalisters,
I have an unbalanced panel dataset on class action suits, shown below. Some companies have more than one record per year, but those records have different dates (see company_id = C), so they are not true duplicates: each one refers to a different event. A company can have multiple records because it can receive more than one class action lawsuit within a year. Variables such as Var1 are a score for each class action suit, so they vary by company and date, but I also include variables such as Assets that are the same for a given company within a year.
So my main question is how to treat this dataset.
Of course, if I declare the dataset as a panel (xtset company_id year), I get an error, because company_id and year do not uniquely identify the observations. If I instead type "xtset company_id", leaving out the time variable, would that solve my problem?
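To make this concrete, here is a minimal sketch of the two declarations I am comparing. It assumes the variable names from the listing below; firm is a hypothetical name for a numeric identifier, since xtset does not accept a string panel variable.

```stata
* firm is a hypothetical numeric id; egen group() accepts string or numeric input
egen firm = group(company_id)

* Show how many company-year combinations occur more than once
duplicates report firm year

* With a time variable, xtset stops with "repeated time values within panel", r(451)
capture noisily xtset firm year

* Without a time variable the declaration is accepted, but time-series
* operators such as L. and F. cannot be used afterwards
xtset firm
```

As far as I understand, estimators that do not rely on the ordering of observations in time, such as xtreg, fe, only need the panel identifier, so the second form may be enough depending on the model.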
Further, I do not know if it helps, but the maximum number of observations for the same company within a year is 5. Maybe creating a new set of variables, one per suit up to this number, could be the solution?
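A rough sketch of what I have in mind for that second idea, under some assumptions: event_date is stored as a string in month/day/year form, and edate and suit_no are hypothetical names I made up. Any other suit-level variables would have to be added to the reshape list in the same way as Var1.

```stata
* Assumption: event_date is a string in month/day/year form; edate is a hypothetical name
generate edate = date(event_date, "MDY")
format edate %td

* Number the suits within each company-year in date order (suit_no is hypothetical)
bysort company_id year (edate): generate suit_no = _n

* Reshape to one row per company-year, with one set of variables per suit
preserve
keep company_id year suit_no edate Var1 Assets
rename Var1 Var1_
rename edate edate_
reshape wide Var1_ edate_, i(company_id year) j(suit_no)

* company_id and year should now uniquely identify the rows
isid company_id year
restore
```

Companies with fewer suits in a given year would get missing values in the higher-numbered variables (Var1_2, Var1_3, and so on), and Assets carries over unchanged because it is constant within a company-year.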
Dropping the repeated company-year observations is not an option, as my dataset is already quite limited.
company_id  year  event_date  Var1  Assets  etc.
A           2010  1/10/2010   55    5600
A           2012  5/30/2012   40    5550
A           2013  8/22/2013   38    5650
B           2011  7/12/2011   99    335
C           2009  3/11/2009   29    4300
C           2014  2/08/2014   35    6000
C           2014  6/24/2014   35    6000
I would really appreciate your help and thoughts.