  • After collapsing there are still repeated time values in the panel

    Hello All,

    I'm working with repeated cross-section data and I need to turn it into a panel, so I collapsed the data to the state-year level. But for some states the same year appears more than once: 2009 for Alaska, for example, appears several times, and this happens for Alaska in every year of my data. Some other states show the same pattern. Since I need a balanced panel, I need to remove these duplicated observations.

    Can anyone suggest code to drop the repeated observations so that I can clean this data into a balanced panel?

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte statefip int year float(ln_weekearn ismarried)
    1 2009  4.929238  .5577256
    1 2010  4.904304  .5405608
    1 2011  4.845454  .5332485
    1 2012   4.75386  .5500775
    1 2013 4.6480546 .54819113
    1 2014  4.595987  .5432784
    1 2015  4.530446 .53653026
    1 2016  4.544947 .51959366
    1 2017  4.514979   .520384
    1 2018  4.450804 .52466446
    1 2019 4.3853836   .522446
    1 2020 4.3492937 .52406776
    1 2021 4.3409953  .5006733
    2 2009  5.078728   .566497
    2 2009  5.078728   .566497
    2 2009  5.078728   .566497
    2 2009  5.078728   .566497
    2 2010  5.086554 .55871594
    2 2010  5.086554 .55871594
    2 2010  5.086554 .55871594
    2 2010  5.086554 .55871594
    2 2011  5.023876 .54680026
    2 2011  5.023876 .54680026
    2 2011  5.023876 .54680026
    2 2011  5.023876 .54680026
    2 2012  4.898951  .5462924
    2 2012  4.898951  .5462924
    2 2012  4.898951  .5462924
    2 2012  4.898951  .5462924
    2 2013  4.799348 .55224735
    2 2013  4.799348 .55224735
    2 2013  4.799348 .55224735
    2 2013  4.799348 .55224735
    2 2014 4.7361693 .55205816
    2 2014 4.7361693 .55205816
    2 2014 4.7361693 .55205816
    2 2014 4.7361693 .55205816
    2 2015 4.6885076 .54512215
    end
    label values statefip statefip_lbl
    label def statefip_lbl 1 "Alabama", modify
    label def statefip_lbl 2 "Alaska", modify
    label def statefip_lbl 4 "Arizona", modify
    label def statefip_lbl 5 "Arkansas", modify
    label def statefip_lbl 6 "California", modify
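
    A quick way to see the extent of the duplication, using only the statefip and year variables shown above, would be something like:

    Code:
    * report how many surplus copies of each statefip-year cell exist
    duplicates report statefip year

    * flag and inspect the duplicated state-year observations
    duplicates tag statefip year, gen(dup)
    list statefip year ln_weekearn if dup > 0, sepby(statefip year)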

  • #2
    What collapse command did you issue?

    • #3


      I used the code below. I'm quite puzzled about how it turned out like that, since I have done this before and never got collapsed data like this.

      Code:
      ** collapsing
      
      collapse ln_weekearn male ismarried wasmarried age black asian hispanic policy  [aw=earnwt], by(statefip year)
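
      Since -collapse- with by(statefip year) returns one observation per state-year cell, a quick check along these lines (just a sketch using statefip and year) should flag any leftover duplicates right away:

      Code:
      * confirm that statefip and year uniquely identify observations after collapsing
      isid statefip year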

      • #4
        Sorry, but I don't have an explanation of why the data in #1 could be the result of the code in #3.

        • #5
          Mr. Cox,

          I made some mistakes while cleaning, and after correcting them I got the correct format. Much obliged for your time and patience. Honestly, I just figured out what I did wrong.

          • #6
            Thanks for the closure!

            • #7
              Sorry to be late to the party. A couple of thoughts on this.

              In the example data, all of the observations that have the same state and year also agree exactly on the other variables. That is, these observations are complete duplicates. So the easiest way to get rid of them is not with -collapse- but with just -duplicates drop-.
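
              For example, something along these lines (a sketch using the variable names from #1) drops the complete duplicates and then confirms that the result can be declared as a panel:

              Code:
              * drop observations that are exact copies of another observation on all variables
              duplicates drop

              * declare the state-year panel structure
              xtset statefip year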

              However, the question still lingers: why are those duplicates there? While there are easy ways to banish them, their presence is a sign that something went wrong in the data management that led to the data set shown. And when one thing goes wrong, there may be other problems lurking, as yet undetected. In situations like this, I recommend undertaking a complete review of the data management that created the data set to identify how those duplicate observations got there in the first place. Scrutinize the code around that and look for other errors, or for code that was based on erroneous assumptions about the data. If you find those other errors, fix them now, before proceeding with other analyses. You shouldn't be in a rush to get wrong results: make sure the data is right before you analyze. If the data set was provided from an external source, so you cannot review how it was created, I recommend contacting the source, informing them of the problem, and asking them to review their work.
