
  • Stata thinks the _merge variable is defined even though I dropped it (preventing me from running a new merge command)

    Help, please (I'm new to the Stata forum).

    I just upgraded to Stata 18 at my university and have a problem I had not encountered previously. I'm running several merge commands, and my usual approach is to "drop _merge" between commands so that I don't get the usual error that _merge is already defined.

    I just got the "variable _merge already defined" error even though I had dropped that variable. When I search, I can confirm that there is no variable "_merge" in my dataset, but if I try to "drop _merge" again just to check, it says "variable _merge already defined". So it seems that Stata thinks a _merge variable is defined, even though I cannot find it (and thought I had dropped it). Because of this, I cannot move forward with a new "merge" command.

    Any thoughts on what is happening? As I said, this never happened until I moved to Stata 18. I've also exited and restarted Stata to see if that helps, but no luck.

    Thanks for any advice!
    David

  • #2
    It is hard to comment without a reproducible example, but you can use the generate() option to give the indicator variable a different name.

    You could send the files to StataCorp technical support as a reproducible example. From the output of

    Code:
    which merge

    in Stata 18, I don't see that the merge code has been changed recently.
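    A minimal sketch of the generate() route, using hypothetical toy data built inline (key id, variables x and y, indicator name mx are all made up for illustration). The toy using file deliberately carries a leftover _merge so that the clash can be reproduced; the first merge runs under capture so the example continues after the error:

    ```stata
    * Toy "using" file that still carries a leftover _merge variable (hypothetical data)
    clear
    input id y _merge
    1 10 3
    2 20 3
    end
    tempfile usingfile
    save "`usingfile'"

    * Toy master data with no _merge of its own
    clear
    input id x
    1 5
    2 6
    3 7
    end

    * The default indicator name collides with the _merge in the using file
    capture noisily merge 1:1 id using "`usingfile'"
    local rc = _rc
    display "merge without generate() returned r(`rc')"

    * generate() gives the indicator a different name, so the merge runs
    merge 1:1 id using "`usingfile'", generate(mx)
    list id x y mx _merge
    ```

    After the second merge, mx holds the usual indicator codes, and the _merge that appears is the old one copied in from the using file.
    
    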



    • #3
      Thanks for the reply, Nick. Yes, I may send this in to StataCorp. I was not able to work around it by generating a new name or renaming the _merge variable, either. The issue is also happening on other datasets (I just tried another one).



      • #4
        My only wild guess is that somehow the name has acquired an extra non-standard character. Look very carefully at e.g. describe output to see how the name aligns with other names in a list. Also try

        Code:
        * Show each name containing "merge" between bars, so stray characters stand out
        foreach v of var *merge* {
              di "|`v'|"
        }



        • #5
          Hi David,

          I just got the exact same issue with Stata/SE 18.0. I was trying to merge two datasets (neither of them has a _merge variable), but I get an error saying that _merge is already defined. Did you figure out how to fix it?

          Thanks, and I look forward to your reply!



          • #6
            Hi Melody,
            If you have verified that _merge is not present anywhere in either dataset, you can always change the name of the indicator variable:

            Code:
            merge 1:1 a using b, gen(mx)

            That way, rather than creating _merge, it will create a variable called mx.
            Later you can check whether _merge was indeed absent from both datasets.
            HTH
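            The later check could be sketched like this, using Stata's shipped auto data to fake two files keyed on make (the file roles, the indicator name mx, and the planted _merge are hypothetical). Once the master is confirmed to have no _merge, any _merge that exists after the merge must have come from the using file:

            ```stata
            * Hypothetical using file: auto data with a planted leftover _merge
            sysuse auto, clear
            keep make price
            generate byte _merge = 3
            tempfile usingfile
            save "`usingfile'"

            * Master data: confirm it has no _merge before merging
            sysuse auto, clear
            keep make mpg
            confirm new variable _merge

            * Name the indicator mx so a clash with _merge cannot stop the merge
            merge 1:1 make using "`usingfile'", generate(mx)

            * The master had no _merge, so if one exists now it came from the using file
            capture confirm variable _merge
            if _rc == 0 {
                display as text "_merge was carried in from the using file"
            }
            else {
                display as text "neither dataset contained _merge"
            }
            ```
            
            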



            • #7
              I have the identical problem with the zombie "_merge" variable using Stata 17.0. It is really odd. No "_merge" variable is present in the dataset. The command "drop _merge" immediately prior to the merge command errors out. Similarly, the "cap drop _merge" command immediately prior to the merge command has no effect and leads to the same merge command error. Nick's code also errors out.

              The merge does go through using either the ", nogen" option or the ", gen(_m2)" option that Fernando mentioned. The merge then proceeds correctly and even correctly populates the new _m2 merge variable. Interestingly, after the merge with either option, the undead _merge variable now appears! It is populated by values that seem to correspond to a prior merge. My best guess is that it originated in a prior merge command issued with the ", nogen" option (rather than in a prior merge that created a _merge variable that was subsequently dropped).

              Even more interestingly, the zombie is quite resilient and survived my attempt to purge it via Python by first loading the dta as a new dataframe object (df = pd.read_stata(...)) and then saving the new df dataframe object as a new Stata dta file (df.to_stata(...)). The newly saved Stata dta has the identical zombie problem. The bug could be quite low-level, as the Python dataframe object has no "_merge" column, and yet it is able to save a dta with the zombie variable. As far as I know, dataframes do not allow hidden columns (unlike, say, Excel sheets), but I could be wrong. I was half expecting pd.read_stata() to error out.

              Unfortunately, I am unable to post the data sets or the logs but I can say they are small boring data sets with tens of obs and a dozen or so variables. Thank you David for starting this thread, or I would have wasted even more time going down this rabbit hole. Nothing I have seen before in 30+ years of intensive and loving Stata use. Happy Friday and happy holidays to all.



              • #8
                It may be that the _merge variable is in the "using" dataset rather than in the master dataset.
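                One way to test this guess before merging, sketched with hypothetical toy files built from Stata's auto data: -describe using- lists the variables stored in a .dta file without replacing the data in memory, and a preserve/restore round trip gives a yes/no answer that can be used in a do-file:

                ```stata
                * Hypothetical using file with a leftover _merge
                sysuse auto, clear
                keep make price
                generate byte _merge = 3
                tempfile usingfile
                save "`usingfile'"

                * Master data in memory, with no _merge of its own
                sysuse auto, clear
                keep make mpg

                * List the variables stored in the using file; memory is not touched
                describe using "`usingfile'"

                * Programmatic check: open the using file, test for _merge, then restore the master
                preserve
                use "`usingfile'", clear
                capture confirm variable _merge
                local using_has_merge = (_rc == 0)
                restore
                display "using file contains _merge: `using_has_merge'"
                ```
                
                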



                • #9
                  Great catch, Fernando! Thank you. Such a silly mistake on my part.

                  Perhaps a future version of Stata will take pity on those of us who are less talented by giving an expanded error message, e.g., "variable _merge already defined in using data set".



                  • #10
                    How would you go about dropping the _merge variable from the using dataset? I think I may be having the same issue.



                    • #11
                      You would need to -use- the using data set, -drop _merge-, then -save- the result, and -merge- with the -save-d file.
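                      A sketch of that sequence, where preserve/restore and a second tempfile stand in for the saved copy so the master data stays in memory. The toy files from Stata's auto data, the key make, and the tempfile names dirty and clean are hypothetical:

                      ```stata
                      * Hypothetical using file that carries a stale _merge
                      sysuse auto, clear
                      keep make weight
                      generate byte _merge = 1
                      tempfile dirty clean
                      save "`dirty'"

                      * Master data currently in memory
                      sysuse auto, clear
                      keep make mpg

                      * Clean the using file without losing the master in memory
                      preserve
                      use "`dirty'", clear
                      drop _merge
                      save "`clean'"
                      restore

                      * The default _merge can now be created
                      merge 1:1 make using "`clean'"
                      ```
                      
                      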

                      This is, evidently, cumbersome. It would be simpler to just start with what you are calling the using data set in memory and have it be the master data set, which you could do once you -drop _merge-.
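                      The role swap might look like this, again with hypothetical toy files built from the auto data and keyed on make. Note that swapping roles also swaps the match type: a 1:1 merge stays 1:1, but an m:1 merge would become 1:m.

                      ```stata
                      * Hypothetical former master, saved to disk so it can serve as the using file
                      sysuse auto, clear
                      keep make mpg
                      tempfile masterfile
                      save "`masterfile'"

                      * Former using data, now in memory as the master, with a stale _merge
                      sysuse auto, clear
                      keep make weight
                      generate byte _merge = 1

                      * Drop the stale indicator in memory, then merge the former master onto it
                      drop _merge
                      merge 1:1 make using "`masterfile'"
                      ```
                      
                      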

                      Another approach is to put the using data set into a new frame. Then -frlink- it with the frame of the master data (or the other way around; remember that -frlink- allows 1:1 and m:1 but not 1:m, so the direction of the linkage matters, unlike the direction of a -merge-). Then you -frget _all- from the appropriate data set. Linked frames do not create a _merge variable, but they do create a linkage variable whose name is, by default, the name of the linked frame, or can be specified in the -frlink- command. So -frlink- will not care if there is already a _merge variable in the using data set, but it will care if the master data set has a variable whose name clashes with the name of the linkage variable to be created.
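                      A minimal frames sketch, assuming Stata 16 or later (when frames were introduced), a unique key make, and a hypothetical frame name usingdata. It copies one named variable with -frget- rather than using -frget _all-, to keep the example's behavior easy to check:

                      ```stata
                      * Hypothetical using file with a stale _merge
                      sysuse auto, clear
                      keep make weight
                      generate byte _merge = 1
                      tempfile usingfile
                      save "`usingfile'"

                      * Master data in the current frame
                      sysuse auto, clear
                      keep make mpg

                      * Load the using data into its own frame; its _merge stays there
                      frame create usingdata
                      frame usingdata: use "`usingfile'", clear

                      * Link master observations to their matches; the link variable defaults to usingdata
                      frlink 1:1 make, frame(usingdata)

                      * Copy only the wanted variable across; no _merge is created in the master
                      frget weight, from(usingdata)
                      ```
                      
                      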

