Is there an alternative to gen and egen that will save as double if the number is large?

James Park

Join Date: May 2017

Posts: 97
#1

11 Mar 2019, 14:20

Yesterday I realized that I have to set the type to double before using "gen" for large numeric variables. If I hadn't realized this and had published my research, it would have been really damaging.

Because research accuracy is the absolute priority, I am now doing

set type double, permanently

in my code.

But this will make my dataset, which is already too big, even bigger. Is there a command (as an alternative to gen and egen) that will save as double only if the number is larger than, say, 5 digits?
Nick Cox

Join Date: Mar 2014

Posts: 35698
#2

11 Mar 2019, 14:42

I think the short answer is No, insofar as what is advertised is just the ability to set storage type as double if you so wish, either on the fly or "permanently".

Being worried about reproducibility is understandable but your results will be just as reproducible if you don't do this so long as your original dataset and full code are accessible. I would be genuinely interested in identification of a research finding that didn't stand up under examination because one researcher used double and another used float.

In any case "5 digits" isn't clear to me as a criterion. How many digits does 12345.6789 have? Do you have a decimal criterion for what you want that matches how calculations are done in binary?

Last edited by Nick Cox; 11 Mar 2019, 14:44.

William Lisowski

Join Date: Dec 2014

Posts: 10150
#3

11 Mar 2019, 15:44

William Gould (StataCorp's president and Stata's creator) has written extensively in the Stata Blog on the subject of precision. It's worth reading his thoughts at the link below.

https://blog.stata.com/2012/04/02/th...-to-precision/

Let me add the following information about the limits on storage of decimal integers with full accuracy in the various numeric storage types. The fixed-point variables lose the 27 largest positive values to missing value codes; the similar loss for floating point variables occurs only for the largest exponent, so it doesn't affect the much smaller integer values.

Storage type   Precision   Smallest integer              Largest integer
byte           7 bits      -127                          100
int            15 bits     -32,767                       32,740
long           31 bits     -2,147,483,647                2,147,483,620
float          24 bits     -16,777,216                   16,777,216
double         53 bits     -9,007,199,254,740,992        9,007,199,254,740,992
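
To see the float and double limits in the table above in action, here is a minimal sketch in Python rather than Stata. It assumes that Stata's float behaves like a 4-byte IEEE 754 single and that double behaves like Python's own float; the helper name to_float32 is hypothetical and exists only for this illustration.

```python
import struct

def to_float32(x: float) -> float:
    """Round a double to the nearest IEEE 754 single and convert it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

FLOAT_LIMIT = 2 ** 24    # 16,777,216: largest run of consecutive integers in a single
DOUBLE_LIMIT = 2 ** 53   # 9,007,199,254,740,992: the same limit for a double

# Integers up to the limit survive the round trip; the next one does not.
for n in (FLOAT_LIMIT - 1, FLOAT_LIMIT, FLOAT_LIMIT + 1):
    stored = to_float32(float(n))
    print(f"{n:>12,d} stored as single: {stored:>14,.0f}  exact: {stored == n}")

# The same thing happens to a double one step past 2**53.
print("2**53 + 1 exact as double:", float(DOUBLE_LIMIT + 1) == DOUBLE_LIMIT + 1)
```

Integers beyond these limits are not all lost; only those that happen to need more significand bits than the type provides get rounded to a neighbour.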


Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#4

12 Mar 2019, 14:54

Note that you can create variables as double and then run compress to save space (without losing precision).
William Lisowski

Join Date: Dec 2014

Posts: 10150
#5

12 Mar 2019, 15:12

Note that compress will not reduce a double to a float - only to a long, int, or byte, meaning the content must be, in the terminology of post #3, decimal integers.
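
As a rough model of that rule (not Stata's actual implementation), the sketch below picks the smallest type that holds a set of values exactly, using the integer ranges from post #3. It is written in Python, the helper name smallest_type is hypothetical, and it assumes a non-empty list of finite values with no missing values or strings.

```python
# Integer ranges copied from the table in post #3 (missing-value codes excluded).
INTEGER_TYPES = [
    ("byte", -127, 100),
    ("int", -32_767, 32_740),
    ("long", -2_147_483_647, 2_147_483_620),
]

def smallest_type(values):
    """Hypothetical helper: smallest type that stores every value exactly."""
    # Any fractional value rules out the integer types, so the data stay double.
    if any(v != int(v) for v in values):
        return "double"
    lo, hi = min(values), max(values)
    for name, type_min, type_max in INTEGER_TYPES:
        if type_min <= lo and hi <= type_max:
            return name
    # Integers too large for long also stay double.
    return "double"

print(smallest_type([1.0, 50.0]))    # byte
print(smallest_type([40_000.0]))     # long
print(smallest_type([1.5, 2.0]))     # double
```

Note that float never appears as a result, which is the point of the remark above: a double holding non-integer values is left as a double.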

Unfortunately post #1 gives no information about the data in question, and I find it difficult to understand how inaccuracy in the 8th digit of precision would have been damaging to the research, unless it involved subtracting two numbers that are identical in the first 7 digits, or, in particular, storage of Stata datetime values.
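
Both cases can be illustrated with a short sketch in Python rather than Stata. It assumes that a Stata float is an IEEE 754 single and that a Stata datetime counts milliseconds since 01jan1960; the helper to_float32 and the example values are hypothetical, chosen only for the demonstration.

```python
import struct
from datetime import datetime

def to_float32(x: float) -> float:
    """Round a double to the nearest IEEE 754 single and convert it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Case 1: subtracting two numbers that agree in their first 7 digits.
a = to_float32(1234567.1)
b = to_float32(1234567.3)
print("difference in double:", 1234567.3 - 1234567.1)
print("difference in single:", b - a)

# Case 2: a datetime stored as milliseconds since 01jan1960.
STATA_EPOCH = datetime(1960, 1, 1)
stamp = datetime(2019, 3, 11, 14, 20)
ms = (stamp - STATA_EPOCH).total_seconds() * 1000.0
stored = to_float32(ms)
error_seconds = abs(stored - ms) / 1000.0
print("datetime rounding error in seconds:", error_seconds)
```

At this magnitude adjacent singles are 2^17 milliseconds (about 131 seconds) apart, so a datetime held in a float can be off by up to roughly a minute.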

William Gould has much to say about "false precision" in the FAQ linked to from post #3. I can't resist copying it here.

5.5 (False precision.) Double precision is 536,870,912 times more accurate than float precision. You may worry that float precision is inadequate to accurately record your data.

Little in this world is measured to a relative accuracy of ±2^-24, the accuracy provided by float precision.

Ms. Smith, it is reported, made $112,293 this year. Do you believe that is recorded to an accuracy of ±2^-24*112,293, or approximately ±0.7 cents?

David was born on 21jan1952, so on 27mar2012 he was 21,981 days old, or 60.18 years old. Recorded in float precision, the precision is ±60.18*2^-24, or roughly ±1.89 minutes.

Joe reported that he drives 12,234 miles per year. Do you believe that Joe’s report is accurate to ±12,234*2^-24, equivalent to ±3.85 feet?

A sample of 102,400 people reported that they drove, in total, 1,252,761,600 miles last year. Is that accurate to ±74.7 miles (float precision)? If it is, each of them is reporting with an accuracy of roughly ±3.85 feet.

The distance from the Earth to the moon is often reported as 384,401 kilometers. Recorded as a float, the precision is ±384,401*2^-24, or ±23 meters, or ±0.023 kilometers. Because the number was not reported as 384,401.000, one would assume float precision would be accurate to record that result. In fact, float precision is more than sufficiently accurate to record the distance because the distance from the Earth to the moon varies from 356,400 to 406,700 kilometers, some 50,300 kilometers. The distance would have been better reported as 384,401 ±25,150 kilometers. At best, the measurement 384,401 has relative accuracy of ±0.033 (it is accurate to roughly two digits).

Nonetheless, a few things have been measured with more than float accuracy, and they stand out as crowning accomplishments of mankind. Use double as required.
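
The arithmetic in the quoted passage is easy to check. Here is a minimal sketch in Python, using only the figures quoted above; the unit conversions (365.25 days per year, 5,280 feet per mile) are my assumptions, not part of the quote.

```python
REL = 2.0 ** -24  # relative accuracy of float precision, as quoted

salary_cents = 112_293 * REL * 100             # dollars -> cents
age_minutes = 60.18 * REL * 365.25 * 24 * 60   # years -> minutes
miles_feet = 12_234 * REL * 5_280              # miles -> feet
total_miles = 1_252_761_600 * REL              # total miles driven by the sample
moon_meters = 384_401 * REL * 1_000            # kilometers -> meters
ratio = 2 ** (53 - 24)                         # double vs. float significand bits

print(f"salary:   +/-{salary_cents:.2f} cents")
print(f"age:      +/-{age_minutes:.2f} minutes")
print(f"mileage:  +/-{miles_feet:.2f} feet")
print(f"total:    +/-{total_miles:.1f} miles")
print(f"moon:     +/-{moon_meters:.0f} meters")
print(f"double is {ratio:,d} times finer than float")
```

The results agree with the quoted figures after rounding: about 0.7 cents, 1.89 minutes, 3.85 feet, 74.7 miles, 23 meters, and a factor of 536,870,912.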
James Park

Join Date: May 2017

Posts: 97
#6

18 Mar 2019, 18:30

Thank you very much, everyone. I learned a lot.