  • After collapsing there are still repeated time values in the panel

    Hello All,

    I'm working with repeated cross-section data and I need to turn it into a panel, so I collapsed the data to the state-year level. But for some states the same year appears more than once: 2009 for Alaska, for example, appears several times, and this happens for Alaska in every year of my data. Some other states show the same pattern. Since I need a balanced panel, I need to remove these duplicated observations.

    Can anyone suggest code to drop the repeated observations so that I can clean this data into a balanced panel?

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte statefip int year float(ln_weekearn ismarried)
    1 2009  4.929238  .5577256
    1 2010  4.904304  .5405608
    1 2011  4.845454  .5332485
    1 2012   4.75386  .5500775
    1 2013 4.6480546 .54819113
    1 2014  4.595987  .5432784
    1 2015  4.530446 .53653026
    1 2016  4.544947 .51959366
    1 2017  4.514979   .520384
    1 2018  4.450804 .52466446
    1 2019 4.3853836   .522446
    1 2020 4.3492937 .52406776
    1 2021 4.3409953  .5006733
    2 2009  5.078728   .566497
    2 2009  5.078728   .566497
    2 2009  5.078728   .566497
    2 2009  5.078728   .566497
    2 2010  5.086554 .55871594
    2 2010  5.086554 .55871594
    2 2010  5.086554 .55871594
    2 2010  5.086554 .55871594
    2 2011  5.023876 .54680026
    2 2011  5.023876 .54680026
    2 2011  5.023876 .54680026
    2 2011  5.023876 .54680026
    2 2012  4.898951  .5462924
    2 2012  4.898951  .5462924
    2 2012  4.898951  .5462924
    2 2012  4.898951  .5462924
    2 2013  4.799348 .55224735
    2 2013  4.799348 .55224735
    2 2013  4.799348 .55224735
    2 2013  4.799348 .55224735
    2 2014 4.7361693 .55205816
    2 2014 4.7361693 .55205816
    2 2014 4.7361693 .55205816
    2 2014 4.7361693 .55205816
    2 2015 4.6885076 .54512215
    end
    label values statefip statefip_lbl
    label def statefip_lbl 1 "Alabama", modify
    label def statefip_lbl 2 "Alaska", modify
    label def statefip_lbl 4 "Arizona", modify
    label def statefip_lbl 5 "Arkansas", modify
    label def statefip_lbl 6 "California", modify
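
    A quick way to see the extent of the duplication, using only the statefip and year variables shown above, would be something like:

    Code:
    * report how many surplus copies of each statefip-year cell exist
    duplicates report statefip year

    * flag and inspect the duplicated state-year observations
    duplicates tag statefip year, gen(dup)
    list statefip year ln_weekearn if dup > 0, sepby(statefip year)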

  • #2
    What collapse command did you issue?

    • #3


      I used the code below. I'm quite puzzled about how it turned out like that, since I have done this before and never got collapsed data like this.

      Code:
      ** collapsing
      
      collapse ln_weekearn male ismarried wasmarried age black asian hispanic policy  [aw=earnwt], by(statefip year)
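
      Since -collapse- with by(statefip year) returns one observation per state-year cell, a quick check along these lines (just a sketch using statefip and year) should flag any leftover duplicates right away:

      Code:
      * confirm that statefip and year uniquely identify observations after collapsing
      isid statefip year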

      • #4
        Sorry, but I don't have an explanation of why the data in #1 could be the result of the code in #3.

        • #5
          Mr. Cox,

          I made some mistakes while cleaning, and after correcting them I got the correct format. Much obliged for your time and patience. Honestly, I just figured out what I did wrong.

          • #6
            Thanks for the closure!

            • #7
              Sorry to be late to the party. A couple of thoughts on this.

              In the example data, all of the observations that have the same state and year also agree exactly on the other variables. That is, these observations are complete duplicates. So the easiest way to get rid of them is not with -collapse- but with just -duplicates drop-.
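
              For example, something along these lines (a sketch using the variable names from #1) drops the complete duplicates and then confirms that the result can be declared as a panel:

              Code:
              * drop observations that are exact copies of another observation on all variables
              duplicates drop

              * declare the state-year panel structure
              xtset statefip year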

              However, the question still lingers: why are those duplicates there? While there are easy ways to banish them, their presence is a sign that something went wrong in the data management that led to the data set shown. And when one thing goes wrong, there may be other problems lurking, as yet undetected. In situations like this, I recommend undertaking a complete review of the data management that created the data set to identify how those duplicate observations got there in the first place. Scrutinize the code around that and look for other errors, or for code that was based on erroneous assumptions about the data. If you find those other errors, fix them now, before proceeding with other analyses. You shouldn't be in a rush to get wrong results: make sure the data is right before you analyze. If the data set was provided from an external source, so you cannot review how it was created, I recommend contacting the source, informing them of the problem, and asking them to review their work.
