Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging data: adding duplicate observations to match other dataset with cases of multiple observations pr year.

    Hello Satalist,

    I need to merge two datasets. I am matching them with two variables 1. 'country_id' and 2. 'year'. The first dataset has one observation pr. year for each country_id. The other has cases of multiple observations pr year for some countries.
    The dataset with multiple observations records regimes , where there sometimes are more than one pr year, while the other just has general country data.

    Here is an example:

    input long(country_id year) str29 country_name3
    3 1905 "Mexico"
    3 1906 "Mexico"
    3 1907 "Mexico"
    3 1908 "Mexico"
    3 1909 "Mexico"
    3 1910 "Mexico"
    3 1911 "Mexico"
    3 1911 "Mexico"
    3 1911 "Mexico"

    3 1912 "Mexico"
    3 1913 "Mexico"
    3 1913 "Mexico"
    3 1914 "Mexico"
    3 1914 "Mexico"

    The other dataset just has the one observation pr year.
    I would like to know if it is at all possible to create matching multiple observations in the other dataset, with the same values on all variables as the original observation pr. year? I would very much like to avoid having to delete my duplicates.

    I hope my question is clear. Thanks in advance.

  • #2
    that is what a one-to-many merge does,
    Code:
    clear
    input float(country_id year bnp)
    3 1905 100
    3 1906 200
    3 1907 300
    end
    tempfile first
    save `first'
    
    clear
    input float(country_id year)
    3 1905
    3 1906
    3 1907
    3 1907
    end
    tempfile second
    save `second'
    
    use `first', clear
    merge 1:m country_id year using `second'
    
    list, ab(10) clean
    
           country_id   year   bnp        _merge  
      1.            3   1905   100   Matched (3)  
      2.            3   1906   200   Matched (3)  
      3.            3   1907   300   Matched (3)  
      4.            3   1907   300   Matched (3)

    Comment


    • #3
      Thanks for the response. However I am still getting 'variables country_id year do not uniquely identify observations in the master data'. Am I doing something wrong?

      Comment


      • #4
        try changing 1:m to m:1. if it still fails, load the dataset that supposedly only have one observation per country per year and show the results of -duplicates report country_id year-.
        Last edited by Øyvind Snilsberg; 28 Apr 2022, 07:51.

        Comment


        • #5
          Changing 1:m to m:1 did the trick - it works now. Thank you!

          Comment

          Working...
          X