Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merge failing to merge data despite reporting matched cases

    Hi all,

    I am using Stata.13.
    I am merging a data set that has 186 observations and 56 variables with a Master data set that has 3667 observations. The 56, and 186 variables from the using file are in the master data set. I am matching records using the following
    Code:
     merge 1:1 student_id chd_sex chd_b_date using "/Volumes/CPHHS/4_StudyData/2 School/4 Pedometer Data/Fall 2013/6_Bonanza/BZE_PAA20150306.dta"
    This is the Stata output after I run my code.
    HTML Code:
    . merge 1:1 student_id chd_sex chd_b_date using "/Volumes/CPHHS/4_StudyData/2 School/4 Pedometer Data/Fal
    > l 2013/6_Bonanza/BZE_PAA20150306.dta"
    
        Result                           # of obs.
        -----------------------------------------
        not matched                         3,481
            from master                     3,481  (_merge==1)
            from using                          0  (_merge==2)
    
        matched                               186  (_merge==3)
        -----------------------------------------
    This tells me that the matching according to Stata is happening. But when I look at the variables in the new matched file for the specific cases all those variables have cells reporting missing
    .
    below is an example of data from the using file ped_id_pa1, pa_date1, pa_1, and grade_pa1 are supposed to merge in the merged file.
    HTML Code:
    ped_id_pa1 student_id         pa_date1   pa_1    grade_pa1
    p465            BZE_0001       10/7/2013    1          1
    p144            BZE_0002       10/7/2013    1          1
    p108            BZE_0003       10/7/2013    1          1
    p343            BZE_0004       10/7/2013    1          1
    p290            BZE_0009       10/7/2013    1          1
    p430            BZE_0010       10/7/2013    1          1
    p473            BZE_0012       10/7/2013    1          1
    p034            BZE_0014       10/7/2013    4          1
    p147            BZE_0015       10/7/2013    1          1
    but this is how they turn out for those same cases
    HTML Code:
    ped_id_pa1 student_id         pa_date1   pa_1    grade_pa1
    .                  BZE_0001       .         .            .
    .                  BZE_0002       .         .            .
    .                  BZE_0003       .         .            .
    .                  BZE_0004       .         .            .
    .                  BZE_0009       .         .            .
    .                  BZE_0010       .         .            .
    .                  BZE_0012       .         .            .
    .                  BZE_0014       .         .            .
    .                  BZE_0015       .         .            .
    I have used similarly merge in the past and nothing like this has happened before, any suggestions?

    Patrick

  • #2
    are you sure you are looking at the correct observations? should have _merge=3; the observations with _merge=1 will show exactly what you are showing us

    Comment


    • #3
      Yes I am positive see below I listed some cases for the same variables after the merge process and used the if statement to limit merge==3 and merge==1 and the corresponding school that should merge.
      HTML Code:
      . list student_id pa_1 pa_date1 ped_id_pa1 grade_pa1 _merge if _merge==3 & school==6
      
            +----------------------------------------------------------------+
            | studen~d   pa_1   pa_date1   ped_id~1   grad~pa1        _merge |
            |----------------------------------------------------------------|
         1. | BZE_0001      .          .                     .   matched (3) |
         2. | BZE_0002      .          .                     .   matched (3) |
         3. | BZE_0003      .          .                     .   matched (3) |
         4. | BZE_0004      .          .                     .   matched (3) |
         9. | BZE_0009      .          .                     .   matched (3) |
            |----------------------------------------------------------------|
        10. | BZE_0010      .          .                     .   matched (3) |
        12. | BZE_0012      .          .                     .   matched (3) |
        14. | BZE_0014      .          .                     .   matched (3) |
        15. | BZE_0015      .          .                     .   matched (3) |
        16. | BZE_0016      .          .                     .   matched (3) |
      
      *this is to confirm that all cases of merge==3 are from school 6*
      . fre school _merge if _merge==3
      
      school -- school
      -----------------------------------------------------------
                    |      Freq.    Percent      Valid       Cum.
      --------------+--------------------------------------------
      Valid   6 BZE |        186     100.00     100.00     100.00
      -----------------------------------------------------------
      
      _merge
      -------------------------------------------------------------------
                            |      Freq.    Percent      Valid       Cum.
      ----------------------+--------------------------------------------
      Valid   3 matched (3) |        186     100.00     100.00     100.00
      -------------------------------------------------------------------
      *this is to confirm that all cases of merge==1 are from all schools and without the 186 cases of school 6, Note the 168 cases have no data for the 56 variables that I am trying to merge*
      . fre school _merge if _merge==1
      
      school -- school
      -----------------------------------------------------------
                    |      Freq.    Percent      Valid       Cum.
      --------------+--------------------------------------------
      Valid   1 HPE |        785      22.55      22.55      22.55
              2 CKE |        547      15.71      15.71      38.26
              3 MES |        731      21.00      21.00      59.26
              4 CRE |        941      27.03      27.03      86.30
              5 CQE |        309       8.88       8.88      95.17
              6 BZE |        168       4.83       4.83     100.00
              Total |       3481     100.00     100.00           
      -----------------------------------------------------------
      
      _merge
      -----------------------------------------------------------------------
                                |      Freq.    Percent      Valid       Cum.
      --------------------------+--------------------------------------------
      Valid   1 master only (1) |       3481     100.00     100.00     100.00
      -----------------------------------------------------------------------

      Comment


      • #4

        You did a 1:1 merge, and then you cannot get more matches (_merge==3) than the number of observations (186) in the smaller dataset. Try an m:1 merge.

        Comment


        • #5
          Hi Svend,

          I tried your suggestion and still nothing
          HTML Code:
          . merge m:1 student_id chd_sex chd_b_date using "/Volumes/CPHHS/4_StudyData/2 School/4 Pedometer Data/Fal
          > l 2013/6_Bonanza/BZE_PAA20150306.dta"
          
              Result                           # of obs.
              -----------------------------------------
              not matched                         3,481
                  from master                     3,481  (_merge==1)
                  from using                          0  (_merge==2)
          
              matched                               186  (_merge==3)
              -----------------------------------------
          . list student_id pa_1 pa_date1 ped_id_pa1 grade_pa1 _merge if _merge==3 & school==6
          
                +----------------------------------------------------------------+
                | studen~d   pa_1   pa_date1   ped_id~1   grad~pa1        _merge |
                |----------------------------------------------------------------|
             1. | BZE_0001      .          .                     .   matched (3) |
             2. | BZE_0002      .          .                     .   matched (3) |
             3. | BZE_0003      .          .                     .   matched (3) |
             4. | BZE_0004      .          .                     .   matched (3) |
             9. | BZE_0009      .          .                     .   matched (3) |
                |----------------------------------------------------------------|
            10. | BZE_0010      .          .                     .   matched (3) |
            12. | BZE_0012      .          .                     .   matched (3) |
            14. | BZE_0014      .          .                     .   matched (3) |
            15. | BZE_0015      .          .                     .   matched (3) |
            16. | BZE_0016      .          .                     .   matched (3) |
          
          *this the data that supposedly should have matched*
          . use "/Volumes/CPHHS/4_StudyData/2 School/4 Pedometer Data/Fall 2013/6_Bonanza/BZE_PAA20150306.dta", cle
          > ar
          
          . do "/var/folders/jm/09q0bdrn4rx0_z64txpj22gm0000gn/T//SD09627.000000"
          
          . list student_id pa_1 pa_date1 ped_id_pa1 grade_pa1
          
               +---------------------------------------------------+
               | studen~d   pa_1    pa_date1   ped_id~1   grade_~1 |
               |---------------------------------------------------|
            1. | BZE_0001      1   10/7/2013       p465          1 |
            2. | BZE_0002      1   10/7/2013       p144          1 |
            3. | BZE_0003      1   10/7/2013       p108          1 |
            4. | BZE_0004      1   10/7/2013       p343          1 |
            5. | BZE_0009      1   10/7/2013       p290          1 |
               |---------------------------------------------------|
            6. | BZE_0010      1   10/7/2013       p430          1 |
            7. | BZE_0012      1   10/7/2013       p473          1 |
            8. | BZE_0014      4   10/7/2013       p034          1 |
            9. | BZE_0015      1   10/7/2013       p147          1 |
           10. | BZE_0016      1   10/7/2013       p377          1 |

          Comment


          • #6
            If the master had multiple observations per merge key variables, then merge 1:1 would have thrown an error. With merge m:1 the constraints are relaxed in that Stata will not check if the key variables uniquely identify observations in the master. In other words, the merge is the same.

            There are two possibilities for matching observations (i.e. -merge == 3).
            1. The using dataset has missing values for these variables.
            2. The master dataset has variables with the same name as those listed above and their values are missing.
            I'm a bit concerned with the following line

            Code:
            . do "/var/folders/jm/09q0bdrn4rx0_z64txpj22gm0000gn/T//SD09627.000000"
            I can't think of any situation where it makes sense to execute Stata code from a temporary file.
            Last edited by Robert Picard; 06 Mar 2015, 13:10.

            Comment


            • #7
              Originally posted by Robert Picard View Post
              I can't think of any situation where it makes sense to execute Stata code from a temporary file.
              If you run sections of a dofile from the dofile editor Stata creates a temp file and runs that. I don't think this is a cause for concern.


              If the master dataset has the variables already you should look at the update and replace options for merge.
              Last edited by Sarah Edgington; 06 Mar 2015, 15:43. Reason: added advice to original poster

              Comment


              • #8
                I guess I just outed myself as one who does not use Stata's do-file editor. Thanks Sarah for pointing this out.

                Comment


                • #9
                  Hi Sarah,

                  You were right I needed to use replace and update options in merge. It worked I appreciate everyones feedback. Below is the code I used and the output I received!
                  HTML Code:
                  . merge 1:1 student_id chd_sex chd_b_date using "/Users/Patrizio/Desktop/BZE_PAA20150306.dta", replace update
                  
                      Result                           # of obs.
                      -----------------------------------------
                      not matched                         3,481
                          from master                     3,481  (_merge==1)
                          from using                          0  (_merge==2)
                  
                      matched                               186
                          not updated                         0  (_merge==3)
                          missing updated                   186  (_merge==4)
                          nonmissing conflict                 0  (_merge==5)
                      -----------------------------------------

                  Comment

                  Working...
                  X