  • How do I keep only the matched observations after a data set merge (Stata 15)

    Hi,

    I used a one-to-many merge to combine two data sets. Once that is done, how can I keep only the matched observations?

    merge 1:m hhcode using "xyz.dta"
    (label province already defined)
    (label region already defined)

    Result                           # of obs.
    -----------------------------------------
    not matched                       379,662
        from master                       142  (_merge==1)
        from using                    379,520  (_merge==2)

    matched                           120,347  (_merge==3)
    -----------------------------------------


    Best,
    Shehryar

  • #2
    Code:
    keep if _merge == 3
    Also see
    Code:
    help merge
    Please read the FAQ for this forum.
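
    As a possible alternative, -merge- can drop the unmatched observations itself through its keep() option. Below is a minimal sketch on tiny made-up data; the tempfile names and the variables province and member are hypothetical stand-ins for the real master file and "xyz.dta", and only hhcode comes from the original post.

    ```stata
    * Sketch only: made-up data standing in for the master file and "xyz.dta"
    clear
    input hhcode province
    1 10
    2 20
    3 30
    end
    tempfile master
    save `master'

    clear
    input hhcode member
    1 1
    1 2
    2 1
    4 1
    end
    tempfile using_data
    save `using_data'

    use `master', clear
    * keep(match) retains only observations that would have _merge == 3;
    * nogenerate skips creating the _merge variable altogether
    merge 1:m hhcode using `using_data', keep(match) nogenerate
    list
    ```

    With keep(match), a separate -keep if _merge == 3- is not needed. Leave out nogenerate if you still want to inspect _merge afterwards.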


    • #3
      Thank you so much!

      Cheers,
      Shehryar


      • #4
        Hi all,

        I am working with panel data covering two time periods, and I am struggling to keep only the matched observations (i.e., only those observations that are present in both time periods) in my data set. What code can I run to do that?

        Regards,
        Pokothoane


        • #5
          When asking for help with writing code, you should always show example data, as the code will depend on things like variable names, data types, layout, etc. The helpful way to do that is with the -dataex- command. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

          For now, assuming your panel data is in long layout, your panel identifier is called panel_id, your time variable is called time, and there are only two time periods, it's:

          Code:
          by panel_id (time), sort: keep if _N == 2
          However, since you posted this on a thread that refers to keeping matched observations after -merge-, I wonder if perhaps your "panel data" is in wide layout as a result of having -merge-d (rather than -append-ed) two data sets. If so, the advice given in #2 will do the job. But better still is to -reshape long- and then do what I have suggested here. The reason is that all your panel data analysis will ultimately require you to go to long layout, and it's usually best to do that sooner rather than later.
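
          A minimal sketch of the reshape-then-keep route, on made-up data. The variable names y1 and y2, and the rule that a missing y means the unit was not observed in that period, are assumptions for illustration only; the actual data may need a different rule (for example, one based on _merge).

          ```stata
          * Sketch only: hypothetical wide data, y1/y2 = outcome in periods 1 and 2
          clear
          input panel_id y1 y2
          1 5 6
          2 7 .
          3 . 9
          4 1 2
          end

          * Go to long layout: one row per panel_id and time
          reshape long y, i(panel_id) j(time)

          * Assumption: a missing y means the unit was not observed in that period
          drop if missing(y)

          * Keep only units observed in both periods
          by panel_id (time), sort: keep if _N == 2
          list, sepby(panel_id)
          ```

          The -drop if missing(y)- step matters here because -reshape long- creates a row for every panel_id and time combination, so without it every unit would appear to have two observations.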


          • #6
            Thank you very much. This is exactly the code that I wanted.

            Next time I will be sure to explain my data and the variables used, as you mentioned.

            Many thanks.
