Merging Datasets with 2 Variables

Ali Bouba

Join Date: Mar 2024

Posts: 3
#1

Merging Datasets with 2 Variables

26 Mar 2024, 19:56

Hello All,

I am having troubles solving this issue:
I have a dataset1 that has the variable hhid(no duplicates) that is a household id and a bunch of other variables

I have another dataset2 where hhid can have duplicates but also have a variable pid that uniquely defines members of the household (pid does not exist in dataset1)

I want to merge both datasets such as:
When hhid is duplicated in dataset2, I make a copy of the existing line in dataset1 for each unique pid that exists for that hhid on dataset2, insert the same pids from dataset2 into those lines then merge on pid.

When hhid is not duplicated in dataset2, just merge on hhid
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 29796
#2

26 Mar 2024, 21:19

If I understand you correctly you are describing exactly what

Code:

use dataset1, clear merge 1:m hhid using dataset2

does.

If that is not what you meant, then post back showing example data from both data sets. Make sure the examples include some observations that should pair up in the two data sets. And then show what you want the final result to look like.

To show the data examples, please use the -dataex- command. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
1 like
Comment

Ali Bouba

Join Date: Mar 2024
Posts: 3

27 Mar 2024, 18:16

Thanks Clyde,

The first dataset looks like this:

Code:

hhid14_9 ksr3type
"001060000" "B"
"001060004" "B"
"001080000" "B"

The second dataset looks like this:

Code:

hhid14_9 pid14
"001060000"  9
"001060000" 10
"001060000" 11
"001060004"  3
"001060004"  4
"001080003"  3
"001080003"  4
"001080003"  5
"001080003"  6
"001080003"  7

The merged dataset should look like this:

Code:

hhid14_9 ksr3type pid14
"001060000" "B" 9
"001060000" "B" 10
"001060000" "B" 11
"001060004" "B" 3
"001060004" "B" 4
"001080000" "B" .

I hope this makes more sense.

Last edited by Ali Bouba; 27 Mar 2024, 18:18.

Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 29796

27 Mar 2024, 19:11

Code:

. * Example generated by -dataex-. For more info, type help dataex
. clear

. input str11 hhid14_9 str3 ksr3type

        hhid14_9   ksr3type
  1. `""001060000""' `""B""'
  2. `""001060004""' `""B""'
  3. `""001080000""' `""B""'
  4. end

. tempfile dataset1

. save `dataset1'
file C:\Users\clyde\AppData\Local\Temp\ST_2a54_000001.tmp saved as .dta format

.
. * Example generated by -dataex-. For more info, type help dataex
. clear

. input str11 hhid14_9 byte pid14

        hhid14_9     pid14
  1. `""001060000""'  9
  2. `""001060000""' 10
  3. `""001060000""' 11
  4. `""001060004""'  3
  5. `""001060004""'  4
  6. `""001080003""'  3
  7. `""001080003""'  4
  8. `""001080003""'  5
  9. `""001080003""'  6
 10. `""001080003""'  7
 11. end

. tempfile dataset2

. save `dataset2'
file C:\Users\clyde\AppData\Local\Temp\ST_2a54_000002.tmp saved as .dta format

.
. use `dataset1', clear

. merge 1:m hhid14_9 using `dataset2', keep(master match)

    Result                      Number of obs
    -----------------------------------------
    Not matched                             1
        from master                         1  (_merge==1)
        from using                          0  (_merge==2)

    Matched                                 5  (_merge==3)
    -----------------------------------------

. sort hhid14_9 pid14

.
. list, noobs clean

       hhid14_9   ksr3type   pid14            _merge  
    "001060000"        "B"       9       Matched (3)  
    "001060000"        "B"      10       Matched (3)  
    "001060000"        "B"      11       Matched (3)  
    "001060004"        "B"       3       Matched (3)  
    "001060004"        "B"       4       Matched (3)  
    "001080000"        "B"       .   Master only (1)

In the future, when showing data examples, please use the -dataex- command to do so, as I have done here. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

Comment

Ali Bouba

Join Date: Mar 2024

Posts: 3
#5

27 Mar 2024, 20:27

Thank you very much sir.
Comment

Announcement