Alternatives to m:m merge?

jmorales

Join Date: Jan 2015

Posts: 7
#1

Alternatives to m:m merge?

29 Jan 2015, 13:16

Statalist,

I'm trying to do a merge between two datasets that have different characteristics for the same company. One dataset has the company id and the production by product, and the other dataset has the company id and the share of its total operations by city. I want to create a resulting dataset that prorates the production by product to each city according to the city shares. I thought that this would be a good chance to use a m:m merge, but it's giving me bad results. What other approach would you use to deal with this situation?

Thanks,

Jose
Tags: None
daniel klein

Join Date: Mar 2014

Posts: 3824
#2

29 Jan 2015, 13:29

I have difficulties to follow the verbal description. Do you have a small example dataset, illustrating what you now have and what you are trying to achieve. You might want to reshape one or both of the datasets, before you apply merge, but this is a vague guess.

Best
Daniel
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#3

29 Jan 2015, 13:32

Using an m:m merge is almost always a really bad idea. It is almost guaranteed to end badly; the best one can hope for is that it is obvious that it has ended badly and you don't actually end up taking the results seriously!

You don't describe your data well enough to give you better advice. As I understand what you've written, you have two data sets, each keyed by company. So I don't get why you don't do a -merge 1:1 company_id...-. Maybe each data set has multiple observations per company_id. In that case, perhaps those observations are in turn identified by something distinctive that will indicate which observation in the other data set should match with it. Maybe a year variable or a city variable? -merge 1:1 company_id year...- or -merge 1:1 company_id city...-

Or maybe what you really want is to pair up each observation for a given company in the first data set with every observation for the same company in the other data set. In that case you want a different command: -joinby company_id...-
Comment

jmorales

Join Date: Jan 2015
Posts: 7

29 Jan 2015, 13:39

Thanks for your soon reply Daniel. In a dummy setting, I have these two datasets:

1) Company to Product dataset:

Company_ID	Product_ID	Value
1	Apples	600
1	Oranges	300

2) Company to City dataset:

Company_ID	City_ID	Share
1	NY	0.3
1	BOS	0.33
1	LA	0.36

3) From these two datasets, I want to prorate production of each product using the distribution by city of each company's operations.

Company_ID	Product_ID	City_ID	Value	Share	Prorated_Value
1	Apples	NY	600	0.3	180
1	Apples	BOS	600	0.33	198
1	Apples	LA	600	0.36	216
1	Oranges	NY	300	0.3	90
1	Oranges	BOS	300	0.33	99
1	Oranges	LA	300	0.36	108

I thought a many to many merge could work but it hasn't. I'll try the reshape approach.

Jose

Comment

daniel klein

Join Date: Mar 2014

Posts: 3824
#5

29 Jan 2015, 13:54

Clyde has pointed to joinby.

Best
Daniel
Comment
jmorales

Join Date: Jan 2015

Posts: 7
#6

29 Jan 2015, 14:06

Worked perfectly! Thanks guys.

Jose
Comment

Announcement

Alternatives to m:m merge?

Comment

Comment

Comment

Comment

Comment