Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Should I really use merge m:m ?

    Hi,

    I know that using merge m:m is clearly not advised but I feel like I don't have any choice.

    This is my master dataset:

    Code:
    input long idhous int(year age)
    41 1999 34
    41 2000 35
    41 2001 36
    41 2002 37
    41 2003 38
    41 2004 39
    41 2005 40
    41 2006 41
    41 2007 42
    41 2008 43
    41 2009 44
    41 2010 45
    41 2011 46
    41 2012 47
    41 1999 31
    41 2000 32
    41 2001 33
    41 2002 34
    41 2003 35
    41 2004 36
    41 2005 37
    41 2006 38
    41 2007 39
    41 2008 40
    41 2009 41
    41 2010 42
    41 2011 43
    41 2012 44
    41 1999  8
    41 2000  9
    41 2001 10
    41 2002 11
    41 2003 12
    41 2004 13
    41 2005 14
    41 2006 15
    41 2007 16
    41 2008 17
    41 2009 18
    41 2010 19
    41 2011 20
    41 2012 21
    41 1999  6
    41 2000  7
    41 2001  8
    41 2002  9
    41 2003 10
    41 2004 11
    41 2005 12
    41 2006 13
    41 2007 14
    41 2008 15
    41 2009 16
    41 2010 17
    41 2011 18
    41 2012 19
    41 1999  3
    41 2000  4
    41 2001  5
    41 2002  6
    41 2003  7
    41 2004  8
    41 2005  9
    41 2006 10
    41 2007 11
    41 2008 12
    41 2009 13
    41 2010 14
    41 2011 15
    41 2012 16
    51 1999 38
    51 2000 39
    51 2001 40
    51 2002 41
    51 2003 42
    51 2004 43
    51 2005 44
    51 2006 45
    51 2007 46
    51 2008 47
    51 2009 48
    51 2010 49
    51 2011 50
    51 2012 51
    51 2013 52
    51 2014 53
    51 2015 54
    51 2016 55
    51 2017 56
    51 2018 57
    51 2019 58
    51 2005 30
    51 2006 31
    51 2008 16
    51 2009 17
    51 2010 18
    51 2011 19
    51 2012 20
    51 2013 21
    52 2018 26

    This is my using dataset :
    Code:
    input long idhous int(year ofs_)
     41 1999 6636
     41 2000 6636
     41 2001 6636
     41 2002 6636
     41 2003 6636
     41 2004 6636
     41 2005 6636
     41 2006 6636
     41 2007 6636
     41 2008 6636
     41 2009 6636
     41 2010 6636
     41 2011 6636
     41 2012 6636
     51 1999 6643
     51 2000 6643
     51 2001 6643
     51 2002 6643
     51 2003 6643
     51 2004 6643
     51 2005 6643
     51 2006 6643
     51 2007 6643
     51 2008 6643
     51 2009 6643
     51 2010 6643
     51 2011 6643
     51 2012 6643
     51 2013 6643
     51 2014 6643
     51 2015 6643
     51 2016 6643
     51 2017 6643
     51 2018 6643
     51 2019 6643
     52 2018 6636
     52 2019 6636
     61 1999 5591
     61 2000 5591
    111 1999 6002
    111 2000 6002
    111 2001 6002
    111 2002 6002
    131 1999 6621
    131 2000 6621
    131 2001 6621
    131 2002 6621
    131 2003 6621
    131 2004 6621
    131 2005 6621
    131 2006 6621
    131 2007 5708
    131 2010 5708
    131 2011 5708
    131 2012 5708
    131 2013 5708
    131 2014 5708
    131 2015 5708
    131 2016 5708
    131 2017 5708
    131 2018 5708
    131 2019 5708
    141 1999 5409
    141 2000 5409
    141 2001 5409
    211 1999 6244
    211 2001 6244
    211 2009 5635
    211 2010 5635
    211 2011 5635
    211 2012 5635
    211 2013 5581
    211 2014 5581
    211 2015 5581
    211 2018 5581
    211 2019 5606
    251 1999 5744
    261 1999 6290
    261 2000 6290
    261 2001 6290
    261 2002 6290
    261 2007 6290
    261 2008 6290
    261 2009 6290
    261 2010 6290
    261 2013 6290
    271 1999 5586
    271 2000 5586
    271 2001 5586
    271 2002 5586
    271 2003 5586
    271 2004 5586
    271 2005 5586
    271 2006 5586
    271 2007 5586
    271 2008 5586
    271 2009 5586
    271 2010 5586
    271 2011 5586
    271 2012 5586
    Is there a way to do differently?

    Thank you

  • #2
    Although you didn't explain the dataset, "idhous" seems to be household ID and there are multiple members in a household so household ID-year pairs are not unique. In such cases, you must have IDs for each family members. Then, household ID, household member ID, and year together are used for 1:1 merge.

    Comment


    • #3
      Maybe a m:1 ID year will work from what I see from the example data. Otherwise, joinby is often forgotten but can be helpful, depending on what you want to achieve.
      Best wishes

      (Stata 16.1 MP)

      Comment


      • #4
        Thanks both of you for your answers.

        I already thought about the advice in #2 but I don't think I can do it because I have IDs for household members in the master file but not in the using file.

        In other words, I can clearly identify the members in one file but not in the other. So, generating an id variable for household members would create misleading observations.

        Code:
        input long(idpers idhous) int year
        4101 41 1999
        4101 41 2000
        4101 41 2001
        4101 41 2002
        4101 41 2003
        4101 41 2004
        4101 41 2005
        4101 41 2006
        4101 41 2007
        4101 41 2008
        4101 41 2009
        4101 41 2010
        4101 41 2011
        4101 41 2012
        4102 41 1999
        4102 41 2000
        4102 41 2001
        4102 41 2002
        4102 41 2003
        4102 41 2004
        4102 41 2005
        4102 41 2006
        4102 41 2007
        4102 41 2008
        4102 41 2009
        4102 41 2010
        4102 41 2011
        4102 41 2012
        4103 41 1999
        4103 41 2000
        4103 41 2001
        4103 41 2002
        4103 41 2003
        4103 41 2004
        4103 41 2005
        4103 41 2006
        4103 41 2007
        4103 41 2008
        4103 41 2009
        4103 41 2010
        4103 41 2011
        4103 41 2012
        4104 41 1999
        4104 41 2000
        4104 41 2001
        4104 41 2002
        4104 41 2003
        4104 41 2004
        4104 41 2005
        4104 41 2006
        4104 41 2007
        4104 41 2008
        4104 41 2009
        4104 41 2010
        4104 41 2011
        4104 41 2012
        4105 41 1999
        4105 41 2000
        4105 41 2001
        4105 41 2002
        4105 41 2003
        4105 41 2004
        4105 41 2005
        4105 41 2006
        4105 41 2007
        4105 41 2008
        4105 41 2009
        4105 41 2010
        4105 41 2011
        4105 41 2012
        5101 51 1999
        5101 51 2000
        5101 51 2001
        5101 51 2002
        5101 51 2003
        5101 51 2004
        5101 51 2005
        5101 51 2006
        5101 51 2007
        5101 51 2008
        5101 51 2009
        5101 51 2010
        5101 51 2011
        5101 51 2012
        5101 51 2013
        5101 51 2014
        5101 51 2015
        5101 51 2016
        5101 51 2017
        5101 51 2018
        5101 51 2019
        5102 51 2005
        5102 51 2006
        5103 51 2008
        5103 51 2009
        5103 51 2010
        5103 51 2011
        5103 51 2012
        5103 51 2013
        5103 52 2018
        Ideally, I would like to have the variable idpers in the using data file as well but the only common variable seems to be idhous.

        Am I wrong, or is there a way to merge the two files without using merge m:m ?

        Thank you.

        Comment


        • #5
          Actually, the answer given in #3 by Felix worked.

          Since I'm quite new on Stata, I forgot that it is possible to put more than one variables as common variables in both dataset.

          Thank you again for your help.

          Comment


          • #6
            I thought the using data was person level, but just realized it's household level. So Felix was right that it's an m:1 merge based on the combination of idhous and year.

            Comment

            Working...
            X