Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to Merge Data without Overwriting variables

    Dear all,

    I would like to merge 2 datasets without overwriting the variables that are found in both dataset. I'm using STATA 13. So I created an example to resemble such dataset. In dataset 1, I have id, x1 x2 x3 and x4; and in dataset 2, I have id y1 y2 y3 x1 x2 x3 x4 x5 x6 and x7. I did my code as follows:

    Code:
    use data1.dta
    merge 1:1 id using data2.dta
    As a result, I only see x1 x2 x3 and x4 once and not twice in the merge dataset (i.e they've been overwritten). However, unlike id which is identical in both dataset, the variables (x1 to x4) have different values in both dataset. For instance, values before a policy change (in data1), and values after policy change (in data2).

    Any possible suggestion, how to deal with this situation is warmly welcome. Thank you.

  • #2
    what you say implies that you already know which variables have the same names; so why not just rename in the master data?; e.g.:
    Code:
    rename x* x*_1
    if your problem is actually more complicate, then please clarify

    also, note that you are not overwriting the data in "1" with the data in "2" (you can do so with the replace option, but you did not use that); the data in the master are retained unchanged

    Comment


    • #3
      Thank you Rich for the suggestion, and correction for the right word - 'retaining' instead of saying 'overwriting'. Yes, I had in mind the 'renaming' of the variables. However, I thought there may be another approach.

      Comment


      • #4
        Owusu: I think that in your case, it is better to append your datasets instead of merging. Having opened data1.dta, type

        Code:
         append using "data2.dta"
        Once you append, create an indicator variable where you assign a value of "0" to all observations corresponding to the first dataset, and a value of "1" to all observations corresponding to the second dataset (The numbering is arbitrary, so in case you have 3 datasets corresponding to 3 treatments, you can label them "1, 2, and 3"). Assuming you just have 2 datasets and the observations for dataset 1 in the appended dataset run from row 1-50, you create the variable as follows (I call this variable "treatment")


        Code:
        gen treatment =1
        replace  treatment==0 if _n<51
        Now you can analyze the effect of the policy change ("your treatment"), making use of the indicator variable.
        Last edited by Andrew Musau; 27 Mar 2015, 01:21.

        Comment

        Working...
        X