  • Adding rows for additional panel years

    Hello everyone,

    I am using Stata 18 and have the following data (only a snapshot is shown below):

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long cmiecode int year str2 pc11_state_id str3 pc11_district_id
    113 2013 "27" "518"
    113 2013 "27" "518"
    113 2013 "27" "518"
    113 2013 "27" "518"
    113 2013 "27" "518"
    171 2013 "07" "095"
    362 2013 "30" "586"
    362 2013 "30" "586"
    600 2013 "09" "164"
    600 2013 "09" "164"
    600 2013 "09" "164"
    600 2013 "09" "164"
    600 2013 "09" "164"
    600 2013 "09" "164"
    600 2013 "09" "164"
    600 2013 "09" "164"
    600 2013 "09" "164"
    725 2013 "28" "536"
    780 2000 "19" "341"
    780 2000 "19" "341"
    end

    I will need to match it to another dataset over the period 2013-2021, using pc11_state_id and pc11_district_id.
    I need to replicate rows for each cmiecode such that the same rows recur from the year they appear in the above table.
    So if a row has the year 2013, it will need to reappear for every year from 2013 to 2021.
    This may seem pointless for this file, but the file it will be matched to has values that vary from year to year for the same state and district.
    Please help if possible.

    Thank you
    Last edited by Sneha Thayyil; 29 Mar 2024, 15:00.

  • #2
    "I need to replicate rows for each cmiecode such that the same rows recur from the year they appear in the above table."
    I don't think you do.

    I'm a little confused by your data example, because it appears to contain many observations that are exact duplicates of each other. May I assume that the data set actually contains additional variables not shown, and that those might differ among observations that appear identical as shown?

    If you run
    Code:
    joinby pc11_state_id pc11_district_id using other_data_set
    Stata will create the additional observations needed and match every observation in the first data set with every observation in the other data set that agrees with it on pc11_state_id and pc11_district_id. The only possible snag is if your other_data_set also contains a variable named year, exactly the same as in this data set. In that case, to ensure that the information gets carried across properly, you need to either -drop year- in the first data set or -rename- it, perhaps to year_1, if it is necessary to distinguish the years for information from the two data sets.
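
    A minimal sketch of that sequence, assuming the first data set is saved under the hypothetical name first_data_set and that other_data_set does contain its own variable named year (neither is confirmed in this thread):
    Code:
    * Sketch only: first_data_set is a hypothetical file name
    use first_data_set, clear
    * Keep the first-appearance year under a new name so it cannot clash with year in the using data
    rename year year_1
    * Pair every observation with every matching observation within each state-district
    joinby pc11_state_id pc11_district_id using other_data_set
    * Assumption: other_data_set has a variable named year; keep only years from first appearance onward
    keep if year >= year_1
    By default -joinby- drops observations that have no match in the other data set. Adding the unmatched(master) option would instead keep the unmatched observations from the first data set.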

    If this does not work correctly for you, please post back with example data from both data sets, choosing examples that produce incorrect results. Then describe in what way the results you are getting differ from what you want.



    • #3
      Dear Clyde,
      Thank you very much for your response. Yes, you were right about the duplicates. They were created both by the nature of the data (which had multiple locations corresponding to the same company) and by the thoughtless error of using merge m:m.
      Your reply pointed me towards the correct steps, which were a combination of using merge 1:m or m:1 wherever unique IDs could be created, and joinby where it was not possible to do so. The issue is now sorted.
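
      For readers facing the same choice, here is one way it could be sketched in a do-file. It is not the exact code used here: first_data_set is a hypothetical file name, and the key variables are assumed to be the two ID variables from the example above. -isid- checks whether the key uniquely identifies observations in the other data set; if it does, merge m:1 is safe, otherwise the sketch falls back on -joinby-.
      Code:
      * Sketch only: first_data_set is a hypothetical file name
      use first_data_set, clear
      * Avoid a clash if other_data_set also has a variable named year
      rename year year_1
      * Test whether state-district is a unique key in the other data set
      preserve
      use other_data_set, clear
      capture isid pc11_state_id pc11_district_id
      local unique = (_rc == 0)
      restore
      if `unique' {
          * Key is unique in the using data: many-to-one merge
          merge m:1 pc11_state_id pc11_district_id using other_data_set
      }
      else {
          * Key repeats in both data sets: form all pairwise combinations
          joinby pc11_state_id pc11_district_id using other_data_set
      }
      Unlike merge m:m, both branches have well-defined results: merge m:1 stops with an error if the key turns out not to be unique in the using data, and -joinby- explicitly forms every pairing within each key group.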

      Thank you once again
      Last edited by Sneha Thayyil; 13 Apr 2024, 00:44.
