Should I really use merge m:m ?

Zsolt Marai

Join Date: Dec 2021
Posts: 21

Should I really use merge m:m ?

16 Dec 2021, 05:24

Hi,

I know that using merge m:m is clearly not advised but I feel like I don't have any choice.

This is my master dataset:

Code:

input long idhous int(year age)
41 1999 34
41 2000 35
41 2001 36
41 2002 37
41 2003 38
41 2004 39
41 2005 40
41 2006 41
41 2007 42
41 2008 43
41 2009 44
41 2010 45
41 2011 46
41 2012 47
41 1999 31
41 2000 32
41 2001 33
41 2002 34
41 2003 35
41 2004 36
41 2005 37
41 2006 38
41 2007 39
41 2008 40
41 2009 41
41 2010 42
41 2011 43
41 2012 44
41 1999  8
41 2000  9
41 2001 10
41 2002 11
41 2003 12
41 2004 13
41 2005 14
41 2006 15
41 2007 16
41 2008 17
41 2009 18
41 2010 19
41 2011 20
41 2012 21
41 1999  6
41 2000  7
41 2001  8
41 2002  9
41 2003 10
41 2004 11
41 2005 12
41 2006 13
41 2007 14
41 2008 15
41 2009 16
41 2010 17
41 2011 18
41 2012 19
41 1999  3
41 2000  4
41 2001  5
41 2002  6
41 2003  7
41 2004  8
41 2005  9
41 2006 10
41 2007 11
41 2008 12
41 2009 13
41 2010 14
41 2011 15
41 2012 16
51 1999 38
51 2000 39
51 2001 40
51 2002 41
51 2003 42
51 2004 43
51 2005 44
51 2006 45
51 2007 46
51 2008 47
51 2009 48
51 2010 49
51 2011 50
51 2012 51
51 2013 52
51 2014 53
51 2015 54
51 2016 55
51 2017 56
51 2018 57
51 2019 58
51 2005 30
51 2006 31
51 2008 16
51 2009 17
51 2010 18
51 2011 19
51 2012 20
51 2013 21
52 2018 26

This is my using dataset :

Code:

input long idhous int(year ofs_)
 41 1999 6636
 41 2000 6636
 41 2001 6636
 41 2002 6636
 41 2003 6636
 41 2004 6636
 41 2005 6636
 41 2006 6636
 41 2007 6636
 41 2008 6636
 41 2009 6636
 41 2010 6636
 41 2011 6636
 41 2012 6636
 51 1999 6643
 51 2000 6643
 51 2001 6643
 51 2002 6643
 51 2003 6643
 51 2004 6643
 51 2005 6643
 51 2006 6643
 51 2007 6643
 51 2008 6643
 51 2009 6643
 51 2010 6643
 51 2011 6643
 51 2012 6643
 51 2013 6643
 51 2014 6643
 51 2015 6643
 51 2016 6643
 51 2017 6643
 51 2018 6643
 51 2019 6643
 52 2018 6636
 52 2019 6636
 61 1999 5591
 61 2000 5591
111 1999 6002
111 2000 6002
111 2001 6002
111 2002 6002
131 1999 6621
131 2000 6621
131 2001 6621
131 2002 6621
131 2003 6621
131 2004 6621
131 2005 6621
131 2006 6621
131 2007 5708
131 2010 5708
131 2011 5708
131 2012 5708
131 2013 5708
131 2014 5708
131 2015 5708
131 2016 5708
131 2017 5708
131 2018 5708
131 2019 5708
141 1999 5409
141 2000 5409
141 2001 5409
211 1999 6244
211 2001 6244
211 2009 5635
211 2010 5635
211 2011 5635
211 2012 5635
211 2013 5581
211 2014 5581
211 2015 5581
211 2018 5581
211 2019 5606
251 1999 5744
261 1999 6290
261 2000 6290
261 2001 6290
261 2002 6290
261 2007 6290
261 2008 6290
261 2009 6290
261 2010 6290
261 2013 6290
271 1999 5586
271 2000 5586
271 2001 5586
271 2002 5586
271 2003 5586
271 2004 5586
271 2005 5586
271 2006 5586
271 2007 5586
271 2008 5586
271 2009 5586
271 2010 5586
271 2011 5586
271 2012 5586

Is there a way to do differently?

Thank you

Tags: None

Fei Wang

Join Date: Oct 2021

Posts: 726
#2

16 Dec 2021, 05:38

Although you didn't explain the dataset, "idhous" seems to be household ID and there are multiple members in a household so household ID-year pairs are not unique. In such cases, you must have IDs for each family members. Then, household ID, household member ID, and year together are used for 1:1 merge.
Comment
Felix Bittmann

Join Date: Aug 2018

Posts: 617
#3

16 Dec 2021, 05:42

Maybe a m:1 ID year will work from what I see from the example data. Otherwise, joinby is often forgotten but can be helpful, depending on what you want to achieve.

Best wishes

(Stata 16.1 MP)
Comment

Zsolt Marai

Join Date: Dec 2021
Posts: 21

16 Dec 2021, 07:38

Thanks both of you for your answers.

I already thought about the advice in #2 but I don't think I can do it because I have IDs for household members in the master file but not in the using file.

In other words, I can clearly identify the members in one file but not in the other. So, generating an id variable for household members would create misleading observations.

Code:

input long(idpers idhous) int year
4101 41 1999
4101 41 2000
4101 41 2001
4101 41 2002
4101 41 2003
4101 41 2004
4101 41 2005
4101 41 2006
4101 41 2007
4101 41 2008
4101 41 2009
4101 41 2010
4101 41 2011
4101 41 2012
4102 41 1999
4102 41 2000
4102 41 2001
4102 41 2002
4102 41 2003
4102 41 2004
4102 41 2005
4102 41 2006
4102 41 2007
4102 41 2008
4102 41 2009
4102 41 2010
4102 41 2011
4102 41 2012
4103 41 1999
4103 41 2000
4103 41 2001
4103 41 2002
4103 41 2003
4103 41 2004
4103 41 2005
4103 41 2006
4103 41 2007
4103 41 2008
4103 41 2009
4103 41 2010
4103 41 2011
4103 41 2012
4104 41 1999
4104 41 2000
4104 41 2001
4104 41 2002
4104 41 2003
4104 41 2004
4104 41 2005
4104 41 2006
4104 41 2007
4104 41 2008
4104 41 2009
4104 41 2010
4104 41 2011
4104 41 2012
4105 41 1999
4105 41 2000
4105 41 2001
4105 41 2002
4105 41 2003
4105 41 2004
4105 41 2005
4105 41 2006
4105 41 2007
4105 41 2008
4105 41 2009
4105 41 2010
4105 41 2011
4105 41 2012
5101 51 1999
5101 51 2000
5101 51 2001
5101 51 2002
5101 51 2003
5101 51 2004
5101 51 2005
5101 51 2006
5101 51 2007
5101 51 2008
5101 51 2009
5101 51 2010
5101 51 2011
5101 51 2012
5101 51 2013
5101 51 2014
5101 51 2015
5101 51 2016
5101 51 2017
5101 51 2018
5101 51 2019
5102 51 2005
5102 51 2006
5103 51 2008
5103 51 2009
5103 51 2010
5103 51 2011
5103 51 2012
5103 51 2013
5103 52 2018

Ideally, I would like to have the variable idpers in the using data file as well but the only common variable seems to be idhous.

Am I wrong, or is there a way to merge the two files without using merge m:m ?

Thank you.

Comment

Zsolt Marai

Join Date: Dec 2021

Posts: 21
#5

16 Dec 2021, 07:46

Actually, the answer given in #3 by Felix worked.

Since I'm quite new on Stata, I forgot that it is possible to put more than one variables as common variables in both dataset.

Thank you again for your help.
Comment
Fei Wang

Join Date: Oct 2021

Posts: 726
#6

16 Dec 2021, 08:19

I thought the using data was person level, but just realized it's household level. So Felix was right that it's an m:1 merge based on the combination of idhous and year.
Comment

Announcement

Should I really use merge m:m ?

Comment

Comment

Comment

Comment

Comment