Problem with format of variable after using destring (Stata 13.1 MP on Windows 7)

Juan Hernandez

Join Date: Oct 2016

Posts: 33
#1

26 Jan 2017, 16:03

Dear Statalist, I am trying to merge a database, but the variable that identifies each individual is a very long string variable. I am using the following code to convert it to a numeric variable, but Stata always approximates the number in exponential notation, so of course it tells me that "ID_COMPLETO2" does not uniquely identify observations in my data.

This is the code I am using:

destring ID_COMPLETO, gen(ID_COMPLETO_n2)
format %50.0g ID_COMPLETO_n2

But I am getting the following results:

Does anybody have an idea of how to solve this problem?

Best regards.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

26 Jan 2017, 16:56

Well, you have two problems here, one of which you recognize, and the other you haven't stumbled on yet.

You might apply a different format to ID_COMPLETO_n2 that will show you something a bit more informative (e.g. -format ID_COMPLETO_n2 %20.16e-). But the bigger problem is that your original variable contains too many digits to be stored without loss of precision in any numeric storage type available in Stata. Stata's largest numeric storage type is the double, which occupies 8 bytes and is good for about 16 digits of accuracy. Your variable is way beyond that. So even if you apply a format that makes Stata exhibit all 16 of the digits it is capable of holding, you will find that all of the values of the numeric version are the same, because the original values all agree in the highest-order 16 digits.
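To make the precision limit concrete, here is a minimal sketch. The two ID values and the variable names id and id_num are made up for illustration (they are not from the original data): the strings differ only in the last of 43 digits, yet they convert to the same double.

```stata
* Hypothetical example data: two IDs that differ only in the final digit
clear
input str43 id
"1700000005001000088888888880000000008888888"
"1700000005001000088888888880000000008888889"
end

* Convert to double, the most precise numeric storage type in Stata
gen double id_num = real(id)
format id_num %24.16e
list

* The strings are distinct, but the doubles are identical
assert id[1] != id[2]
assert id_num[1] == id_num[2]

* Integers are only stored exactly in a double up to 2^53 (about 9.0e15)
assert 2^53 + 1 == 2^53
```

Both asserts should pass silently, which is the same loss of information that makes the destrung variable useless as an identifier.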

So this approach is just not viable. The question I have is why do you even want to do this? I cannot conceive of any real numeric variable that could be assessed with anything close to this level of precision. These very long numbers look more like some kind of ID number. If that is the case, for most purposes it is best to just leave them as strings. If the problem is that you need to use these identifiers as, say, a panel identifier in an -xt- command, then you should use

Code:

egen numeric_id = group(ID_COMPLETO)

That will give you consecutive numbers 1, 2, ... distinguishing the different values of ID_COMPLETO, and you will have no problem with using it as the panel identifier for -xtset- (or as a level identifier in a multi-level command).
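As a rough sketch of how that fits together, assuming a panel with a time variable (the variable year and the data values below are invented for illustration):

```stata
* Hypothetical panel: two individuals observed in two years
clear
input str43 ID_COMPLETO year
"1700000005001000088888888880000000008888888" 2015
"1700000005001000088888888880000000008888888" 2016
"1700000005001000088888888880000000008888889" 2015
"1700000005001000088888888880000000008888889" 2016
end

* Map each distinct string to a consecutive integer 1, 2, ...
egen long numeric_id = group(ID_COMPLETO)

* The integer ID can now serve as the panel identifier
xtset numeric_id year
list, sepby(numeric_id)
```

The string variable stays in the dataset, so it remains available as the key for -merge-; numeric_id is only a convenience for commands that insist on a numeric identifier.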

Added: In the future please do not use screenshots to show examples of data. It is impossible to import that into Stata, short of hand-typing it all in. Had it been possible to import the data, there are several things I could have shown you that are relevant. But I'm not going to spend the time and effort needed to type in this data. The way to show data examples that makes them useful to those who want to help you is with the -dataex- command. You install it by running -ssc install dataex-. Read -help dataex- for the simple instructions for using it. Then use it all the time to post example data. It enables those who respond to you to create a completely faithful replica of your example data with a simple copy-and-paste operation, and then they can use that to try out code solutions to your problems or show alternative approaches.
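For instance, something along these lines would produce a postable example of the first ten observations. This is only a sketch: the choice of variable and the range 1/10 are assumptions, and the install step needs an internet connection.

```stata
* One-time installation from SSC
ssc install dataex

* Print a copy-and-paste-ready example of the ID variable, first 10 observations
dataex ID_COMPLETO in 1/10
```

The output in the Results window can then be pasted directly into a forum post.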

Last edited by Clyde Schechter; 26 Jan 2017, 16:59.
Chris Larkin

Join Date: Apr 2016

Posts: 296
#3

26 Jan 2017, 17:07

Hi Juan,

I get the same formatting problem. I think it may have something to do with the limits of the double variable that is created on the fly by the -destring- command.

Here's a short input if someone else wants to have a stab

Code:

clear
input str50 ID_COMPLETO
1700000005001000088888888880000000008888888
end
destring ID_COMPLETO, gen(ID_COMPLETO_n2)
format ID_COMPLETO_n2 %43.0f

That said, merging on strings is perfectly fine, so you might want to try that. And the error message you receive has nothing to do with the format of your ID_COMPLETO_n2 variable; merge changes variable formats as necessary within numeric or string categories to accommodate values from the using dataset. If you get the same error after attempting to merge on the string ID_COMPLETO, then the issue is exactly what it tells you, and you will want to try either a 1:m, m:1, or m:m merge and not a 1:1.
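A minimal sketch of merging on the string key, with a uniqueness check first. The two small datasets and the variables x and y are invented for illustration:

```stata
* Hypothetical "using" dataset, saved to a temporary file
clear
input str43 ID_COMPLETO x
"1700000005001000088888888880000000008888888" 10
"1700000005001000088888888880000000008888889" 20
end
tempfile using_data
save `using_data'

* Hypothetical "master" dataset
clear
input str43 ID_COMPLETO y
"1700000005001000088888888880000000008888888" 1
"1700000005001000088888888880000000008888889" 2
end

* Check whether the string ID really is unique before choosing the merge type
duplicates report ID_COMPLETO
isid ID_COMPLETO

* Merge directly on the string variable; no destring needed
merge 1:1 ID_COMPLETO using `using_data'
tab _merge
```

If -isid- fails in one of the datasets, -duplicates report- shows how many repeated IDs there are, which helps decide between 1:m and m:1. Note that the Stata documentation for -merge- itself advises against m:m merges, so that option is best treated as a last resort.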

Chris