How to Merge Data without Overwriting variables

Owusu Sarpong

Join Date: Mar 2015

Posts: 6
#1

How to Merge Data without Overwriting variables

26 Mar 2015, 10:42

Dear all,

I would like to merge 2 datasets without overwriting the variables that are found in both dataset. I'm using STATA 13. So I created an example to resemble such dataset. In dataset 1, I have id, x1 x2 x3 and x4; and in dataset 2, I have id y1 y2 y3 x1 x2 x3 x4 x5 x6 and x7. I did my code as follows:

Code:

use data1.dta merge 1:1 id using data2.dta

As a result, I only see x1 x2 x3 and x4 once and not twice in the merge dataset (i.e they've been overwritten). However, unlike id which is identical in both dataset, the variables (x1 to x4) have different values in both dataset. For instance, values before a policy change (in data1), and values after policy change (in data2).

Any possible suggestion, how to deal with this situation is warmly welcome. Thank you.
Tags: None
Rich Goldstein

Join Date: Mar 2014

Posts: 4439
#2

26 Mar 2015, 10:53

what you say implies that you already know which variables have the same names; so why not just rename in the master data?; e.g.:

Code:

rename x* x*_1

if your problem is actually more complicate, then please clarify

also, note that you are not overwriting the data in "1" with the data in "2" (you can do so with the replace option, but you did not use that); the data in the master are retained unchanged
Comment
Owusu Sarpong

Join Date: Mar 2015

Posts: 6
#3

26 Mar 2015, 12:15

Thank you Rich for the suggestion, and correction for the right word - 'retaining' instead of saying 'overwriting'. Yes, I had in mind the 'renaming' of the variables. However, I thought there may be another approach.
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10089
#4

27 Mar 2015, 01:19

Owusu: I think that in your case, it is better to append your datasets instead of merging. Having opened data1.dta, type

Code:

append using "data2.dta"

Once you append, create an indicator variable where you assign a value of "0" to all observations corresponding to the first dataset, and a value of "1" to all observations corresponding to the second dataset (The numbering is arbitrary, so in case you have 3 datasets corresponding to 3 treatments, you can label them "1, 2, and 3"). Assuming you just have 2 datasets and the observations for dataset 1 in the appended dataset run from row 1-50, you create the variable as follows (I call this variable "treatment")

Code:

gen treatment =1 replace treatment==0 if _n<51

Now you can analyze the effect of the policy change ("your treatment"), making use of the indicator variable.

Last edited by Andrew Musau; 27 Mar 2015, 01:21.
Comment

Announcement

How to Merge Data without Overwriting variables

Comment

Comment

Comment