Merge different variables with same variable name

Alexandra Mossenberg

Join Date: Jan 2016

Posts: 9
#1

Merge different variables with same variable name

26 Jan 2016, 06:17

Halo everybody,

I'm trying to merge varios datasets using merge command. Datasets contain company-informations for year x. Unfortunately despite beeing sets of different years the names of variable are the same among datasets. E.g. EBIT, Margin_x etc.

What I'm loooking for is a way add the new data as a new variable without the overwriting the existing one, while ignoring the same fact, that both have the same name. Renaming of the variables manually is unfortunately not an option. Example:

Dataset a (1999):
company | EBIT
A | 123
B | 122
C | .

Dataset b (2000):
company | EBIT
A | 453
C | 980
D | 486

Goal:
company | EBIT_1 | EBIT_2
A | 123 | 453
B | 122 | .
C | . | 980
D | . | 486

As you can see some values may be missing. In the merge's help file there is a way to update the file but I couldn't find a way to simply out the data in a new variable.

Plese help
Tags: merge

1 like
Rich Goldstein

Join Date: Mar 2014

Posts: 4439
#2

26 Jan 2016, 06:24

you appear to want your final data set in wide format - why? in general, this is not a good idea

however, if you really want it that way I would do in three steps: (1) add a variable giving the year to each data set; (2) append the files (not merge them); (3) use the reshape command to change the structure; of course, if you actually want the data in long form, stop after step 2
2 likes
Comment
Masood Sadat

Join Date: Aug 2014

Posts: 38
#3

25 Aug 2018, 08:10

This solution does not work for me!

I have two datasets of respondents of two members of households. Most of the variables in each dataset are unique, however some variables have exact same name (e.g. start time, end time, father name). I want to merge them in a way that the duplicate variable names take a suffix (e.g. start_x and start_y), but the rest of the variables names that are unique stay the same.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#4

25 Aug 2018, 08:34

If you read the Forum FAQ, you will learn, among other things, that saying that something "does not work" is pretty useless. There are thousands of ways in which code can fail to do what you expect of it (some relating to the code, and some to your expectations), and it is impossible to guess which problem you are facing.

The description you give in your second paragraph contains nothing that would be a barrier to the solution in #2 working with your data. So if you want help, I suggest you post back showing the exact code that you used to try to implement the solution in #2, and examples of the data in the two data sets you are working with. And, if it is not very obvious, also explain why the results you get are not what you want.

To show the example data, be sure to use the -dataex- command. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use -dataex-.
1 like
Comment
Masood Sadat

Join Date: Aug 2014

Posts: 38
#5

25 Aug 2018, 14:43

Thanks Clyde for the note. I am going to explain my situation using a sample data.

I have two datasets data1 and data2. There are three type of variables in each dataset: 1) an ID variable that matches in both dataset, 2) an income variable in both dataset but with different content which should be kept, and 3) the rest of variables.

My question is whether we can merge the datasets with ID variable, and have variables with similar names kept by assigning suffix to their names. Here is a reproducible example:

Code:

clear input float income str4 id str1 sex 2200 "hh1" "m" 3000 "hh2" "f" 3900 "hh3" "m" 3500 "hh4" "m" 2900 "hh5" "f" end save "~/data1.dta", replace clear input float income str4 id float q1 4000 "hh1" 1 3400 "hh2" 2 4100 "hh3" 1 2100 "hh4" 1 2700 "hh5" 2 end save "~/data2.dta", replace merge 1:1 id using "~/data1.dta"

After merging, the variable income from the using dataset is not kept as there is a variable with the same name in the master dataset. What I want is to have keep both variables e.g. income_x and income_y (something that r does easily).

Thanks.
Comment
Carole J. Wilson

Join Date: Jan 2015

Posts: 932
#6

25 Aug 2018, 14:59

You can quickly rename all vars in data2 (and data1 if you prefer):

Code:

use "~/data2.dta", clear *rename all vars in data2 foreach var of varlist _all { rename `var' `var'_2 } *change id_2 back to id rename id_2 id merge 1:1 id using "~/data1.dta"

Stata/MP 14.1 (64-bit x86-64)
Revision 19 May 2016
Win 8.1
1 like
Comment

William Lisowski

Join Date: Dec 2014
Posts: 10150

25 Aug 2018, 15:25

Following in Carole's path, you can use macro extended list functions and the describe command to limit the renaming to the variables that are duplicated.

Code:

quietly describe, varlist
local var2 `r(varlist)'
quietly describe using "~/data1.dta", varlist
local var1 `r(varlist)'

local same : list var2 &amp; var1
local idlist id
local same : list same - idlist
display "same name in both datasets: `same'"
rename (`same') (=_2)

merge 1:1 id using "~/data1.dta"
rename (`same') (=_1)
list, clean

Code:

. list, clean

       income_2    id   q1   income_1   sex        _merge  
  1.       4000   hh1    1       2200     m   matched (3)  
  2.       3400   hh2    2       3000     f   matched (3)  
  3.       4100   hh3    1       3900     m   matched (3)  
  4.       2100   hh4    1       3500     m   matched (3)  
  5.       2700   hh5    2       2900     f   matched (3)

Comment

Masood Sadat

Join Date: Aug 2014

Posts: 38
#8

25 Aug 2018, 20:36

Thanks William, this is exactly what I was after. There is a typo if you want to fix in the local same : list var2 & var1
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#9

25 Aug 2018, 23:18

Thank you for pointing that out for the benefit of those who may find this topic in the future, The forum software hung on my first attempt to post, and when I let it recover what I had tried to post, it recovered my ampersand & as the html-escaped sequence we see, without my having noticed the substitution.
Comment
Ishwor Adhikari

Join Date: Feb 2016

Posts: 22
#10

07 Mar 2020, 04:05

I have somehow opposite problem to these. Survey was done in 2003 and collected 500 sample. Another survey was done in 2010 and collected 600 sample for the same household and some new. Now I have two sample files for 2003 and 2010. However, the name for each variable is different even if they capture same thing. For example, household number in 2003 is named as 'HH' while it was coded as 'v00_hno' for 2010. ID and all other variables are like similar, not following any specific pattern. In such scenario how can I append, to prepare workable panel data. How can I let Stata know these two different variable name is same while appending?

My data set for 2003 is like this

input int WWW byte HH int V00_SERL
3 1 10
3 6 63
3 11 191
3 12 127
3 14 148
3 17 180
3 18 201
4 1 3
4 4 13
4 5 16
4 7 23
4 8 26
4 10 33

My data set for 2010 is like

input int xhpsu byte(xhnum v00_hno)
3 1 1
3 6 6
3 11 11
3 12 12
3 14 14
3 17 17
3 18 11
3 19 19
3 21 21
17 1 1
17 2 2
17 3 3
17 4 4
17 7 7
17 8 8
17 10 10

How can I get something like

input int xhpsu byte(xhnum v00_hno)
3 1 1
3 1 10
3 6 6
3 6 63
3 11 11
3 11 191
3 12 12
3 12 127
3 14 14
3 14 148
3 17 17
3 17 180
3 18 11
3 18 201
3 19 19
3 21 21
4 1 3
4 4 13
4 5 16
4 7 23
4 8 26
4 10 33
17 1 1
17 2 2
17 3 3
17 4 4
17 7 7
17 8 8
17 10 10

Last edited by Ishwor Adhikari; 07 Mar 2020, 04:34.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#11

07 Mar 2020, 10:11

Code:

clear* input int xhpsu byte(xhnum v00_hno) 3 1 1 3 6 6 3 11 11 3 12 12 3 14 14 3 17 17 3 18 11 3 19 19 3 21 21 17 1 1 17 2 2 17 3 3 17 4 4 17 7 7 17 8 8 17 10 10 end tempfile 2010data save `2010data' clear input int WWW byte HH int V00_SERL 3 1 10 3 6 63 3 11 191 3 12 127 3 14 148 3 17 180 3 18 201 4 1 3 4 4 13 4 5 16 4 7 23 4 8 26 4 10 33 end tempfile 2003data save `2003data' use `2003data', clear unab 2003varnames: _all use `2010data', clear unab 2010varnames: _all rename (`2010varnames') (`2003varnames') append using `2003data'

NOTE: This assumes that the variables that mean the same thing are in the same order in the two data sets (as is the case in your example.) If they are not, then it won't be this easy.

Added: Oh, on re-reading more carefully, I see that the variables are not in the same order in both data sets. The household id HH appears second in 2003 but its corresponding v00_hno is third in the 2010 data. So the above code produces data salad. Sorry.

So, in this case it's more complicated. And since the changes in the names is unsystematic, it's going to be tedious. You have stated you want to use the 2010 variable names. So you need to compose a local macro that gives a crosswalk between the 2003 names and the corresponding 2010 names. The simplest approach to this is to just have that crosswalk list the 2010 names in the order that those variables appear in the 2003 data. So it would look like this:

Last edited by Clyde Schechter; 07 Mar 2020, 10:50.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#12

07 Mar 2020, 10:50

Code:

local crosswalk xhpsu v00_hno xhnum use `2003data', clear rename (_all) (`crosswalk') gen wave = 2003 append using `2010data' replace wave = 2010 if missing(wave)

I imagine you have more than just three variables. It will be tedious and error-prone to create the crosswalk for all of the variables, but in the absence of any systematic relationship between old and new variable name, there is no alternative.
Comment
Ishwor Adhikari

Join Date: Feb 2016

Posts: 22
#13

02 Apr 2020, 07:39

Thank you so much Clyde Schechter. Yeah given there is no systematic relationship between, even not the order, it seems creating crosswalk seems better option.
Comment

Announcement

Merge different variables with same variable name

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment