Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Issues Merging Two Reports - Variable Types Different?

    Hi All,

    Thanks for the help in advance. I'm running a STATA code that has long been in use, and every calendar year I just update some dates. Have never had an issue. Unfortunately, now when I'm merging two .csv files on the same variable - an id variable called "rifisid" I get this error message:

    . merge m:1 rifisid fccpclosedate using "Tidy675_2021.dta", gen(merge675)
    key variable rifisid is long in master but str8 in using data
    Each key variable -- the variables on which observations are matched -- must be of the same generic type in the master and using datasets. Same generic type means both numeric or both
    string.


    How do I change the type of variable? What does this even mean? I see people run into this with dates on this forum, but don't see anything about regular variables. Thank you!

    Jane

  • #2
    The contents of a variable can be stored as either a numeric variable, one where Stata knows to interpret that content as a number, or as simply a "string" of alpabetic or numeric characters, without paying attention to whether it is numeric or not. This is true even for strings of numerals, such as "19432165." In a CSV file, if that string of numbers is enclosed in quotes, Stata will read and interpret it as simply a string of characters *even if those characters are numerals,* and store it as a string variable and not as a numeric variable. The way in which a variable is stored and interpreted is known in computer programming as its "data type." Variables with different data types cannot be matched or compared unless they are made to be of the same type. This issue is not unique to Stata, but in fact is present in all programs, but possibly hidden from the user.

    Stata is telling you that in one file, Stata has stored the key variable as a string, and in the other as a numeric variable of type long. (long is a numeric type variable that holds only integers, but which can hold very large "long" ones. See -help datatype-.) It might be that before now, the variable rifisid had always been put into the CSV file as a string, but someone now prepared that file with rifisid as a numeric variable. That sort of mistake is pretty common in the data processing world.
    Stata provides functions to create a numeric variable from a string variable containing numerals. See -help destring-. You can take your "using" file, and convert the string version of rifisid to a long variable, save the file, and then do the merge.
    Code:
    use "Tidy675_2021.dta"
    destring rifisid, gen(temp)
    rename rifisid rifisid_as_string // keep copy just in case
    rename temp rifisid
    recast long rifisid
    save "Tidy675_2021.dta"
    clear
    use "YourMasterFile.dta"
    merge m:1 rifisid ....
    Dates are a specific and complicated case here, where there are many different ways to use a string to represent a date, and many different ways to store the numeric information of a date. That's a difficult topic, best left aside for the moment.
    Last edited by Mike Lacy; 27 May 2021, 15:20. Reason: Forgot to illustrate "clear"

    Comment


    • #3
      P.S. I see from your profile that you identify as a SAS user. My vague recollection from some time ago is that exactly the same issues with conflicting data types prevail in SAS, although I suppose that it might try to deal with them behind the scenes for the user.

      Comment


      • #4
        HI Mike,

        Thank you so much for that clear explanation! As you mention, yes, this happens all the time with my datasets as the data process can make numeric variables "string" variables, or as we call them in SAS, character variables. Thank you!

        I copied and pasted what you had, and entered in my surrounding dataset names (see below), but I get this error now. If I'm following what you're doing, somehow it's not reading the temp name? Any further advice?

        Error:
        rename temp rifisid
        variable temp not found
        r(111);


        Code Entered:
        save "Tidy675_2021.dta", replace

        use "Tidy675_2021.dta"
        destring rifisid, gen(temp)
        rename rifisid rifisid_as_string // keep copy just in case
        rename temp rifisid
        recast long rifisid
        save "Tidy675_2021.dta"
        clear

        use "QA2021fmt.dta", clear

        Comment


        • #5
          Hi again Mike, and any others!!

          Figured out this is actually happening:

          . destring rifisid, gen(num1)
          rifisid: contains nonnumeric characters; no generate


          Any ideas??

          Comment


          • #6
            OK, that's very important to tell us. Diagnosis is usually not possible unless you show exactly what you did and *exactly* how Stata responded. Stata told you that it did not generate (create) your num1 (or temp) variable, so it's no surprise that it couldn't rename a nonexistent variable. Stata's error message meant what it said in this case.
            Stata's default behavior with -destring- (see -help destring-) is that if the string variable contains non-numeric characters, it will not create the new variable. This default can be overridden, but first you need to know why this non-numeric characters occurred.

            This "failure" might well arise from some simple data errors, or from missing information for rifisid. To see if this is true, a good place to start here is to look at the observations for which rifisid contains non-nummeric character, which you can do with:
            Code:
            browse rifisid if real(rifisid) == .
            Translation: Browse those observations for which an attempt to convert rifisid into a real number fails (due to non-numeric characters) and therefore returns a missing value.
            Last edited by Mike Lacy; 28 May 2021, 12:23.

            Comment


            • #7
              Alright mike! I shall try that Tuesday morning....I'm also going to reach out to my IT group. It's so odd because I've bene running this report on sort of auto pilot for years with no issue.

              Comment

              Working...
              X