
  • Merging datasets

    Hi, I am trying to merge two datasets. The datasets are individual surveys recording the responses of individuals to certain questions. The two datasets have different observations, and I am trying to merge them based on a unique ID number. I was trying to create the ID number from the country and birth year of the individual. I am struggling to do this in a way that lets me match the individuals correctly when merging both datasets. Any help? Thanks

  • #2
    If country and year of birth uniquely identify individuals, you can use those variables to merge the datasets. If that fails, help us help you by showing an example of the datasets you want to merge using -dataex-.



    • #3
      [Screenshot: 2014datas.png]

      This is the 2014 data

      [Screenshot: 2016datas.png]

      This is the 2016 data
      As you can see, some individuals from country AT, for example, did not complete the survey in 2016. For instance, the individual with idno 3 completed the survey in 2014 but not in 2016. I want to merge these datasets once I have created a unique ID that matches the individuals exactly across both datasets, if that makes sense. Thanks



      • #4
        Please post the output that is produced in the Stata Results window when you type:
        Code:
        use 2014DATAstata, clear
        dataex
        use 2016DATAstata, clear
        dataex



        • #5
          Hi,
          I got this for 2014:
          [Screenshot: 20142.png]

          Then for 2016:
          [Screenshot: 20162.png]



          • #6
            Actually, can you share the datasets by uploading them as attachments?



            • #7
              Unfortunately, I cannot. Is there any way I could describe or help explain the dataset? I know it would be easier for you to have them but I can't. Sorry for the inconvenience



              • #8
                No worries. To produce a data example, open your dataset and type dataex, copy the lines that correspond to the ones I've highlighted, and paste them in your post.

                [Screenshot: Skjermbilde 2022-02-23 kl. 21.46.22.png]



                • #9
                  Sorry, could you go over what I need to type? So far, I opened my dataset and then typed dataex in the Command window. It then produced something similar to what I just posted.



                  • #10
                    You need a new plan. It is not possible to create an ID variable from country and birth year. Look at the first screenshot you posted. In observations 2, 6, and 8 we have three different people all from country AT and born in the same year, 1948. The combination of country and birth year simply is not enough information to distinguish different individuals in your data. There is already a variable called id in the screenshots. It appears to distinguish individuals--is there a reason you cannot use this as your identifier variable?
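
                    One way to check this directly is sketched below. It assumes the variables are named country, yrbrn and idno (the names that appear in the data examples posted later in #11) and borrows the file name used in #4; dup is a hypothetical helper variable. -duplicates report- shows how many observations share each combination, and -isid- exits with an error when the combination does not uniquely identify observations.

                    ```stata
                    * Sketch only: file and variable names are assumptions taken from elsewhere in this thread
                    use 2014DATAstata, clear

                    * How many observations share the same country and birth year?
                    duplicates report country yrbrn

                    * -isid- exits with an error if the variables do not uniquely identify observations;
                    * -capture noisily- shows the message without stopping a do-file
                    capture noisily isid country yrbrn

                    * Flag the observations that share a combination (dup is a hypothetical name) and list them
                    duplicates tag country yrbrn, generate(dup)
                    sort country yrbrn idno
                    list idno country yrbrn if dup > 0, sepby(yrbrn)
                    ```

                    If -isid- reports an error here, the pair cannot serve as a key for matching individuals.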

                    Although you do not say so, I am guessing that the two data sets represent the same, or largely the same, survey administered at two different time periods. It appears from the screenshots that they contain largely, if not entirely, the same variables. If this is correct, it would not make sense to merge these data sets in any case. Instead you will need to -append- them.

                    Next, you need to refer to the survey documentation that should have come with your data sets to determine if this is even panel data at all. That is, you need to determine whether the same people were surveyed on both occasions, or if these are simply two different samples of people. The former is panel data, the latter is not: it is serial cross-sections. It is perfectly legitimate to work with serial cross-sectional data, although the legitimate analyses are different and there are some research questions that are easily answered in panel data that cannot be answered in serial cross-sections (and, to a lesser extent, vice versa.) While you are looking at the survey documentation, if you learn that it is truly panel data, it is highly likely that the documentation will tell you which variable(s) identify individuals and are consistent across the two data sets. If there is no such information there, you should contact the people who supplied the data to you and ask them.

                    Turning now to the proper use of -dataex-, first you open the data set. Then, in the simplest case, you just type dataex in the command window. Stata will respond with a little over 100 lines of output in the Results window. Because your screen probably cannot display all of that output at once, the beginning of the output will have scrolled off the top of the screen. Scroll up until you find the line that says "copy starting from the next line". (See Oyvind Snilsberg's helpful screenshot in #8.) Because the output does not all fit on one screen, you will not, at that point, see the other line shown in that screen shot that says "copy up to and including the previous line." But you will have seen it before you scrolled up. Now select all of the lines between (but not including) "copy starting from the next line" and "copy up to and including the previous line." Copy the selected material to the clipboard, and then paste it directly into your post. Do this for both data sets.
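
                    As a rough illustration of those steps, and of how to get output short enough to fit on one screen, something like the following may help. The file name is the one used in #4, and the variable list is only an assumption based on the names that appear in the data examples in #11.

                    ```stata
                    * Sketch only: file and variable names are assumptions taken from elsewhere in this thread
                    use 2014DATAstata, clear

                    * Simplest case: all variables, first 100 observations
                    dataex

                    * A smaller example: selected variables, first 20 observations only
                    dataex idno country yrbrn agea in 1/20
                    ```

                    Restricting the variables and observations keeps both delimiter lines visible at once, which makes the copy step easier.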



                    • #11
                      They do include a variable idno, which is the interviewer respondent number. However, because they're not the same in both datasets I know it would match the individuals correctly. That then led me to choose yr born and country.

                      You're correct when saying the two datasets represent the same thing but at two different periods. One for 2014 and the other in 2016. Again, they contain the same variables. So then would I need to append the datasets?
                      Code:
                      * Example generated by -dataex-. For more info, type help dataex
                      clear
                      input byte essround float edition long idno byte(stflife happy brncntr) int livecnta byte(facntr mocntr hhmmb gndr) int(agea yrbrn) byte(marsts eisced emplrel hinctnta) long country float ID
                      7 2.2   1 10  7 1 6666 1 1 1 1 51 1964  6  3 1  3 1   1
                      7 2.2   2  7  5 1 6666 1 1 2 1 67 1948  4  1 1  3 1   2
                      7 2.2   3 10  8 1 6666 1 1 1 2 89 1926  5  1 6  2 1   3
                      7 2.2   4  8  9 1 6666 1 1 3 1 32 1983  6  3 1  4 1   4
                      7 2.2   5 10 10 1 6666 1 1 6 2 56 1959 66  3 1  8 1   5
                      7 2.2   6 10 10 1 6666 1 1 2 2 67 1948 66  3 1  5 1   6
                      7 2.2   7  7  8 1 6666 1 1 2 1 66 1949 66  3 1  4 1   7
                      7 2.2  13  8  8 2 1978 2 2 1 2 67 1948  4  5 1  3 1   8
                      7 2.2  14  3  4 2 1989 2 2 2 2 34 1981  6  7 1  3 1   9
                      7 2.2  21  8  7 1 6666 1 1 2 2 66 1949 66  3 2  2 1  10
                      7 2.2  22  7  7 1 6666 1 1 4 1 61 1954 66  3 1  5 1  11
                      7 2.2  23 10 10 1 6666 1 1 2 2 55 1960 66  3 1  7 1  12
                      7 2.2  24 10 10 1 6666 1 1 5 2 79 1936  5  2 2  4 1  13
                      7 2.2  25  9  9 1 6666 1 1 4 2 38 1977  6  3 1  9 1  14
                      7 2.2  26  7  7 1 6666 1 1 1 1 35 1980  6  3 1  3 1  15
                      7 2.2  33  9  9 1 6666 1 1 5 2 40 1975 66  4 1 10 1  16
                      7 2.2  34 10  9 1 6666 1 1 5 1 42 1972 66  3 1  4 1  17
                      7 2.2  35 10 10 1 6666 1 1 4 2 36 1979 66  3 1  7 1  18
                      7 2.2  36 10  9 1 6666 1 1 3 2 52 1963 66  3 1  7 1  19
                      7 2.2  37 10 10 1 6666 1 1 5 1 27 1988  6  5 1  4 1  20
                      7 2.2  38  9  9 1 6666 1 1 4 2 34 1981 66  3 1  6 1  21
                      7 2.2  39  8  9 1 6666 1 1 2 2 63 1951 66  1 6  9 1  22
                      7 2.2  40 10  7 1 6666 1 1 2 2 64 1951  5  1 1 10 1  23
                      7 2.2  45  7  5 1 6666 1 1 2 1 50 1965  1  3 1  5 1  24
                      7 2.2  46  9  7 1 6666 1 1 1 2 46 1969  4  5 1  4 1  25
                      7 2.2  47 10  8 1 6666 1 1 5 1 52 1963 66  3 2  2 1  26
                      7 2.2  48 10  9 1 6666 1 1 5 2 18 1997  6  2 6 77 1  27
                      7 2.2  49 10  9 1 6666 1 1 5 2 16 1999  6  2 6 77 1  28
                      7 2.2  50  8  8 1 6666 1 1 3 1 52 1963 66  3 1  7 1  29
                      7 2.2  51  9  8 1 6666 1 1 3 1 32 1983 66  3 1  4 1  30
                      7 2.2  52  5  8 1 6666 1 1 4 1 72 1943 66  2 1  4 1  31
                      7 2.2  57  5  7 1 6666 1 2 2 1 69 1946  4  3 1  5 1  32
                      7 2.2  58  8  8 1 6666 2 2 1 2 39 1976  6  7 2 10 1  33
                      7 2.2  59  9  9 1 6666 1 1 2 2 57 1958  5  7 1  5 1  34
                      7 2.2  65  8  7 1 6666 1 1 2 2 54 1961  4  3 1  3 1  35
                      7 2.2  66 10 10 1 6666 1 1 2 2 41 1974  6  3 1  2 1  36
                      7 2.2  67  9  9 1 6666 1 2 3 1 38 1977  6  3 1  6 1  37
                      7 2.2  68  8  8 1 6666 1 1 1 2 35 1980  6  5 2 77 1  38
                      7 2.2  77  8  8 1 6666 1 1 4 2 26 1989  6  5 1  7 1  39
                      7 2.2  78 88 10 1 6666 2 2 4 2 36 1979 66  2 3 77 1  40
                      7 2.2  79  7  7 1 6666 1 1 4 1 33 1982 66  3 1  6 1  41
                      7 2.2  80  9  9 1 6666 1 1 2 2 57 1958  4  6 1  8 1  42
                      7 2.2  81  7  8 1 6666 1 1 1 1 90 1925  5  2 6  1 1  43
                      7 2.2  82  8  8 1 6666 1 1 2 1 70 1945 66  3 1  5 1  44
                      7 2.2  89  8  8 1 6666 1 1 5 2 39 1975 66  3 1  4 1  45
                      7 2.2  90  5  3 1 6666 1 1 2 1 85 1929 66  1 6  2 1  46
                      7 2.2  91  8  8 1 6666 1 1 3 1 59 1955 66  2 2  4 1  47
                      7 2.2  92  8  8 1 6666 1 1 4 2 40 1974 66  3 1 77 1  48
                      7 2.2  93 10  8 1 6666 1 1 4 2 19 1996  6  3 1  6 1  49
                      7 2.2  94 10 10 1 6666 1 1 2 2 45 1969 66  6 2  8 1  50
                      7 2.2  95  9  8 1 6666 1 1 2 2 39 1975 66  7 1  8 1  51
                      7 2.2  96  8  9 1 6666 1 1 4 1 37 1977 66  3 1  5 1  52
                      7 2.2 101  8  7 1 6666 1 1 4 1 43 1972 66  3 1  8 1  53
                      7 2.2 102  8  8 1 6666 1 1 5 2 35 1980 66  4 1  6 1  54
                      7 2.2 103  8  9 1 6666 1 1 2 2 57 1958 66  4 1  9 1  55
                      7 2.2 104  6  5 1 6666 1 1 1 1 78 1937  5 55 2  2 1  56
                      7 2.2 105  1  3 1 6666 1 1 4 1 44 1971 66  3 2  4 1  57
                      7 2.2 106 10 10 1 6666 1 1 4 2 42 1973  6  7 1  5 1  58
                      7 2.2 107  3  3 1 6666 1 2 3 2 40 1975  6  3 1  3 1  59
                      7 2.2 108  8  8 1 6666 1 1 5 2 17 1998  6  2 6 88 1  60
                      7 2.2 109  8  8 1 6666 1 1 3 2 39 1976 66  7 1  8 1  61
                      7 2.2 113  8  8 1 6666 1 1 5 1 40 1975 66  3 1  5 1  62
                      7 2.2 114  8  8 1 6666 1 1 6 1 45 1970 66  5 1  4 1  63
                      7 2.2 115  9  8 1 6666 1 1 3 1 32 1983 66  3 3  8 1  64
                      7 2.2 116  8  8 1 6666 1 1 1 2 80 1935  5  1 2  1 1  65
                      7 2.2 117  9  8 1 6666 1 1 2 2 64 1951 66  2 2  4 1  66
                      7 2.2 118  9  9 1 6666 1 1 3 1 53 1962 66  3 3 77 1  67
                      7 2.2 119  7  7 1 6666 1 1 4 1 39 1976 66  3 1  5 1  68
                      7 2.2 125  4  3 1 6666 1 2 1 2 28 1987  4  2 1  1 1  69
                      7 2.2 126  8  7 1 6666 1 1 2 2 47 1968  4  3 2 77 1  70
                      7 2.2 127 10 10 1 6666 1 1 5 2 41 1974 66  6 1 77 1  71
                      7 2.2 137  8  9 1 6666 1 1 3 1 50 1965 66  5 1  5 1  72
                      7 2.2 138  8  8 1 6666 1 1 6 2 91 1924  5  1 2  1 1  73
                      7 2.2 139  3  6 1 6666 1 1 1 1 71 1944  6  1 2  2 1  74
                      7 2.2 140 10 10 1 6666 1 1 6 1 41 1974 66  3 1  7 1  75
                      7 2.2 141  8  8 1 6666 1 1 2 1 78 1937 66  3 2  1 1  76
                      7 2.2 142  7  7 1 6666 1 1 2 1 69 1946 66  3 1  3 1  77
                      7 2.2 143 10  8 1 6666 1 1 4 2 44 1971 66  5 1  2 1  78
                      7 2.2 149  4  5 1 6666 1 1 5 2 53 1962  5  3 1  2 1  79
                      7 2.2 150  6  6 1 6666 1 1 2 2 78 1937 66  1 6  2 1  80
                      7 2.2 151  1  1 1 6666 1 1 4 1 50 1965 66  3 1  5 1  81
                      7 2.2 152  7  7 1 6666 1 1 2 2 64 1951 66  2 6  2 1  82
                      7 2.2 153  9  8 1 6666 1 1 5 1 46 1969 66  3 1  5 1  83
                      7 2.2 154  8  9 1 6666 1 1 4 2 24 1991  6  4 1  6 1  84
                      7 2.2 161  8  7 1 6666 1 1 3 1 41 1974  6  7 2  8 1  85
                      7 2.2 162  7  8 1 6666 1 1 2 2 43 1972  6  6 1  8 1  86
                      7 2.2 163  7  7 1 6666 1 1 1 1 55 1960  6  2 1  3 1  87
                      7 2.2 164  4  5 1 6666 1 1 1 2 74 1941  5  3 1  2 1  88
                      7 2.2 165  7  7 1 6666 1 1 2 1 68 1947  4  3 1  5 1  89
                      7 2.2 166  7  6 1 6666 1 1 2 2 46 1969  5  2 1  1 1  90
                      7 2.2 167  8  8 1 6666 2 1 1 2 71 1944  5  2 1  1 1  91
                      7 2.2 173  7  8 1 6666 1 1 3 2 72 1943 66  3 1  5 1  92
                      7 2.2 174  9  8 1 6666 1 1 3 1 52 1963 66  4 1  7 1  93
                      7 2.2 175  9  9 1 6666 1 1 2 2 45 1969  6  5 1  4 1  94
                      7 2.2 176 10 10 1 6666 1 1 2 1 89 1925 66  7 1 77 1  95
                      7 2.2 177  7  8 1 6666 1 1 4 1 40 1975 66  3 1  5 1  96
                      7 2.2 178  9  9 1 6666 1 1 3 2 33 1981  6  6 1  3 1  97
                      7 2.2 179  7  8 1 6666 1 1 5 1 49 1966 66  7 1  8 1  98
                      7 2.2 185 10 10 1 6666 1 1 3 1 43 1972 66  3 1  7 1  99
                      7 2.2 186  8  7 1 6666 1 1 2 2 60 1955 66  3 1 77 1 100
                      end
                      label values country country
                      label def country 1 "AT", modify

                      Then for the 2016 data:
                      Code:
                      * Example generated by -dataex-. For more info, type help dataex
                      clear
                      input byte essround float edition long idno byte(stflife happy brncntr) int livecnta byte(facntr mocntr hhmmb gndr) int(agea yrbrn) byte(marsts eisced emplrel hinctnta) long country float ID
                      8 2.2   1  5  5 2 2010 2 2  1 2  34 1982  6 7 1 77 1   1
                      8 2.2   2  5  5 2 1994 2 2  2 1  52 1964 66 4 2  5 1   2
                      8 2.2   4  9  8 1 6666 1 1  1 2  68 1948  6 3 1  2 1   3
                      8 2.2   6  7  8 1 6666 1 1  1 1  54 1962  4 3 1  4 1   4
                      8 2.2  10 10  5 2 2006 2 2  5 2  20 1996  1 3 1  2 1   5
                      8 2.2  11  7  9 1 6666 1 1  2 2  65 1951  4 4 2 10 1   6
                      8 2.2  12  5  7 2 1995 2 2  4 2  52 1964 66 2 2  2 1   7
                      8 2.2  13  9  9 1 6666 1 1  4 2  44 1972 66 5 1  8 1   8
                      8 2.2  14 10  9 1 6666 1 1  1 2  22 1994  6 3 1  3 1   9
                      8 2.2  15  7  8 1 6666 1 1  3 2  41 1975  4 3 1  4 1  10
                      8 2.2  16  8  8 1 6666 1 1  1 2  57 1959  4 2 1  1 1  11
                      8 2.2  17  7  7 1 6666 1 1  2 1  61 1955 66 3 1  3 1  12
                      8 2.2  18  6  5 1 6666 1 1  1 1  50 1966  4 3 1  5 1  13
                      8 2.2  19  8  8 1 6666 1 1  3 2  31 1985  6 3 1 88 1  14
                      8 2.2  21  3  5 1 6666 1 1  2 2  58 1958  4 1 1  3 1  15
                      8 2.2  22  8  7 1 6666 1 1  1 1  28 1988  1 3 1  5 1  16
                      8 2.2  23  9  9 1 6666 1 1  3 1  58 1958 66 5 1  7 1  17
                      8 2.2  24  3  2 1 6666 1 1  1 1  51 1965  4 3 1  6 1  18
                      8 2.2  28  7  6 2 1975 2 2  2 2  65 1951 66 3 1  5 1  19
                      8 2.2  29  4  8 1 6666 1 1  2 2  61 1955  4 3 1  2 1  20
                      8 2.2  30  8  7 1 6666 1 1 88 2 999 8888  4 3 1  8 1  21
                      8 2.2  32  8  8 1 6666 1 1  4 1  47 1969 66 7 1  6 1  22
                      8 2.2  34  6  5 1 6666 1 1  2 2  40 1976  6 3 1  5 1  23
                      8 2.2  39  6  8 1 6666 1 1  3 1  45 1971 66 3 1  6 1  24
                      8 2.2  42  5  5 2 1988 2 2  5 2  28 1988 66 2 1  5 1  25
                      8 2.2  43 10 10 1 6666 1 1  1 2  46 1970  4 3 1  3 1  26
                      8 2.2  45  7  8 1 6666 1 1  2 2  24 1992  6 5 1  4 1  27
                      8 2.2  48  8  8 1 6666 1 1  1 1  80 1936  5 3 1  4 1  28
                      8 2.2  49  8  8 1 6666 1 1  4 2  46 1970 66 6 1  8 1  29
                      8 2.2  50 10  8 1 6666 1 1  2 2  55 1961  6 3 1  2 1  30
                      8 2.2  52  8  7 1 6666 1 1  3 1  61 1955 66 3 1 77 1  31
                      8 2.2  53  9  8 1 6666 1 1  1 2  39 1977  4 3 1 88 1  32
                      8 2.2  54  9  8 1 6666 1 1  3 1  42 1974 66 4 1  5 1  33
                      8 2.2  56  8  8 1 6666 1 1  2 2  65 1951 66 3 1  6 1  34
                      8 2.2  57  6  6 1 6666 1 1  2 2  77 1939 66 3 1  3 1  35
                      8 2.2  58 10 10 1 6666 1 1  4 1  18 1998  6 3 1  9 1  36
                      8 2.2  60  9  9 1 6666 1 2  3 1  56 1960 66 3 1  9 1  37
                      8 2.2  61  6  6 2 2006 2 2  5 1  23 1993  6 2 1 77 1  38
                      8 2.2  62  0 10 1 6666 1 1  4 2  52 1964 66 3 1 77 1  39
                      8 2.2  63  3  3 1 6666 1 1  2 1  55 1961 66 7 1  7 1  40
                      8 2.2  64  8  9 1 6666 1 1  3 2  75 1941  5 2 1  7 1  41
                      8 2.2  65  8  8 1 6666 1 1  1 1  23 1993  6 5 6  4 1  42
                      8 2.2  66  9  8 1 6666 1 1  2 1  76 1940 66 3 6  3 1  43
                      8 2.2  67 10  8 1 6666 1 1  3 2  59 1957 66 3 1  8 1  44
                      8 2.2  68  7  8 1 6666 1 1  2 2  49 1967 66 5 1 77 1  45
                      8 2.2  71  8  7 1 6666 1 1  2 2  70 1946 66 3 2  4 1  46
                      8 2.2  72  9  8 2 2007 2 2  4 2  32 1984 66 2 1 88 1  47
                      8 2.2  74  0  9 1 6666 1 2  2 2  64 1952  4 3 1 88 1  48
                      8 2.2  75  9  7 1 6666 1 1  2 1  31 1985 66 5 3 77 1  49
                      8 2.2  76  9  9 1 6666 1 1  1 2  26 1990  6 3 1  3 1  50
                      8 2.2  79 10  7 1 6666 1 1  2 1  65 1951 66 3 1 77 1  51
                      8 2.2  80  9  8 1 6666 1 1  2 2  23 1993  6 3 1  6 1  52
                      8 2.2  81  8  7 1 6666 1 1  2 1  70 1946 66 2 1  2 1  53
                      8 2.2  82  8  8 1 6666 1 1  3 1  58 1958 66 7 1  9 1  54
                      8 2.2  83  0  8 2 1999 2 2  2 1  38 1978 66 2 1  1 1  55
                      8 2.2  84 10 10 1 6666 2 2  2 1  19 1997 66 3 1 77 1  56
                      8 2.2  86 10 10 1 6666 2 2  2 2  61 1955 66 7 1 77 1  57
                      8 2.2  89  9  9 1 6666 2 1  1 1  61 1955  6 7 1  6 1  58
                      8 2.2  90  8  8 1 6666 1 1  2 1  74 1942 66 3 1  6 1  59
                      8 2.2  93  8  8 1 6666 1 1  2 2  28 1988  6 6 1  5 1  60
                      8 2.2  94  2  0 1 6666 1 1  1 1  49 1967  4 2 1 77 1  61
                      8 2.2  97  8  6 1 6666 1 1  3 1  42 1974  4 5 1  8 1  62
                      8 2.2 101 10 10 1 6666 1 1  1 2  21 1995  6 3 1  2 1  63
                      8 2.2 102  9  8 1 6666 1 1  2 2  27 1989  6 3 1  5 1  64
                      8 2.2 104  9  9 1 6666 1 1  2 1  66 1950 66 3 1 77 1  65
                      8 2.2 106  9  6 1 6666 1 1  2 2  65 1951 66 1 6  5 1  66
                      8 2.2 110  7  8 2 1982 2 2  4 2  49 1967 66 5 6  6 1  67
                      8 2.2 111 10 10 1 6666 1 1  4 2  18 1998  6 2 1 88 1  68
                      8 2.2 112  8  7 1 6666 1 1  3 2  55 1961  2 6 1  7 1  69
                      8 2.2 113  9  9 1 6666 1 1  2 2  55 1961 66 5 1  8 1  70
                      8 2.2 116  7  8 1 6666 1 1  2 2  68 1948 66 2 6  3 1  71
                      8 2.2 117  4  5 2 1970 2 2  1 2  62 1954  4 2 1  1 1  72
                      8 2.2 120  5  5 1 6666 1 1  1 2  73 1943  4 5 1  4 1  73
                      8 2.2 121  9  8 1 6666 1 1  4 1  48 1968 66 4 1 77 1  74
                      8 2.2 122  9  7 1 6666 1 1  1 1  27 1989  1 3 1  5 1  75
                      8 2.2 123  6  7 2 1980 2 2  4 2  74 1942  5 1 1 77 1  76
                      8 2.2 124  9  7 1 6666 1 1  1 1  23 1993  6 3 1  2 1  77
                      8 2.2 126  8  7 1 6666 1 1  2 2  52 1964 66 3 1 77 1  78
                      8 2.2 127 10  9 1 6666 1 1  1 2  58 1958  6 2 1  2 1  79
                      8 2.2 128  7  7 1 6666 1 2  3 1  23 1993  6 5 6  7 1  80
                      8 2.2 132  9  9 2 1992 2 2  5 1  35 1981 66 3 1  8 1  81
                      8 2.2 133  6  6 2 2012 2 2  1 2  35 1981  4 3 1 77 1  82
                      8 2.2 134 10  9 1 6666 1 1  7 2  31 1985 66 2 1  9 1  83
                      8 2.2 135  9  7 1 6666 1 1  1 2  24 1992  6 3 1  2 1  84
                      8 2.2 140  3  4 2 2000 2 2  1 2  39 1977  4 1 1  1 1  85
                      8 2.2 144  8 10 2 1970 2 2  1 2  77 1939  5 7 2  4 1  86
                      8 2.2 147  7  8 1 6666 1 1  1 1  34 1982  6 3 1  4 1  87
                      8 2.2 152  3  4 1 6666 1 1  2 2  63 1953  5 3 1  5 1  88
                      8 2.2 155  6  5 1 6666 1 1  1 2  81 1935  5 5 1  3 1  89
                      8 2.2 157 10 10 1 6666 1 1  3 2  67 1949 66 3 1  4 1  90
                      8 2.2 159  5  5 1 6666 1 1  4 1  49 1967 66 3 1 77 1  91
                      8 2.2 160  8  8 1 6666 1 1  1 2  60 1956  5 2 1 88 1  92
                      8 2.2 161 10 10 1 6666 1 1  3 2  51 1965 66 7 1 77 1  93
                      8 2.2 162  9  9 1 6666 1 1  5 1  49 1967 66 3 1  9 1  94
                      8 2.2 166  7  7 1 6666 1 1  4 2  46 1970 66 3 1 77 1  95
                      8 2.2 167  8  6 1 6666 1 1  1 2  85 1931  5 2 6  1 1  96
                      8 2.2 169  7  7 1 6666 2 2  5 2  19 1997  6 2 6 77 1  97
                      8 2.2 171  5  5 1 6666 1 1  2 1  68 1948 66 3 1 88 1  98
                      8 2.2 172  8  5 2 2000 2 2  1 2  65 1951  6 3 2 77 1  99
                      8 2.2 173 10 10 1 6666 1 1  3 2  38 1978 66 3 1  7 1 100
                      end
                      label values country country
                      label def country 1 "AT", modify



                      • #12
                        "They do include a variable idno, which is the interviewer respondent number. However, because they're not the same in both datasets I know it would match the individuals correctly."
                        Say what??? Do you mean you know it would match the individuals incorrectly? What about the variable ID which appears last in your example data. Does that correctly match individuals across time periods?

                        Anyway, I think the reasoning is incorrect here. When you conduct a survey and plan to use the same people over time, you are usually not completely successful in doing that. Some people who participate the first time can't be contacted the second time, or decline to participate again, or die, or become disabled from communicating, etc. And in some surveys, the plan is to introduce new participants at each wave, to "replace" people who were lost to follow-up and maintain overall sample size. So the values of a time-consistent person identifier in the two data sets can be different. There will usually be a few people in each time period who don't appear in the other time period. That's not a problem. But, regardless of how you or I think about it, the only reliable answer to the question of how to correctly match up people across time periods comes from the people who produced the survey data sets. So, review the survey documentation carefully. If the survey covered the same people at both time periods, it should contain some variable(s) that will correctly match the people across years, and the documentation should disclose what it (they) is (are). Read it carefully. If the documentation confirms that the same people were surveyed in both periods but does not disclose how to correctly match them across years, then you will have to contact the people who provided the survey for that information.

                        "You're correct when saying the two datasets represent the same thing but at two different periods. One for 2014 and the other in 2016. Again, they contain the same variables. So then would I need to append the datasets?"
                        Yes. The useful way to combine these data sets is with -append-, not -merge-.



                        • #13
                          Sorry, that is what I meant to say. Even the source of the data says, "If you want to merge data (combine variables for the same respondents) from different files of the same ESS round, you have to use the variables CNTRY (Country) and IDNO (Respondent’s identification number) as merging (“by”/“key”) variables".

                          I am not sure where to go from here.



                          • #14
                            In some other statistical packages, you combine waves of a survey "side by side." That involves a merge, and the key variables would be CNTRY and IDNO. But in Stata, you combine waves of a survey using -append-, not -merge-. So it would be:
                            Code:
                            use 2014_data, clear
                            gen int wave = 2014
                            append using 2016_data
                            replace wave = 2016 if missing(wave)
                            * Stata variable names are case-sensitive; the data examples in #11 use lowercase names
                            sort country idno wave
                            Then you can save that as a new combined file to use for your analyses. (Evidently, replace 2014_data and 2016_data in the above code with the actual names of the Stata data files containing the 2014 data and the 2016 data.)
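
                            A rough check, not part of the code above, on whether the identifier really follows the same people across waves: after the -append-, compare a characteristic that should not change, such as year of birth, within each identifier. This is only a sketch. It assumes the lowercase variable names country, idno and yrbrn from the data examples in #11 and the wave variable created above; n_obs and yrbrn_differs are hypothetical names.

                            ```stata
                            * Sketch only: run after the -append- step; variable names are assumptions from #11
                            * Number of observations sharing each country/idno pair (2 = present in both waves)
                            bysort country idno (wave): gen int n_obs = _N

                            * 1 if the recorded birth year differs between the first and last wave for that pair
                            bysort country idno (wave): gen byte yrbrn_differs = yrbrn[1] != yrbrn[_N]

                            * Among pairs present in both waves, how often does the birth year disagree?
                            tabulate yrbrn_differs if n_obs == 2 & wave == 2014
                            ```

                            If the birth year disagrees for most pairs, the identifier is probably not tracking the same people across waves, and the survey documentation is the place to settle whether these are panel data at all.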




                            • #15
                              Hi, I tried that, but looking at the data in Stata, the observations that line up are not the same individuals. The two waves were combined and the idno values match, but the match is not correct: the year of birth differs between waves for the same idno. For example, in 2014 the respondent with idno = 1 was born in 1964, but in 2016 the respondent with idno = 1 was born in 1982.

