Why does manual code for multiple kdensity plots not work?

Samuel R. Lucas

#1

24 Jul 2017, 11:45

I wrote the following, which matches (for my problem) the Stata 13 Base Reference (page 1008) example code for multiple kdensity plots on one graph:

Code:

kdensity crate, nograph generate(xyall fx) ;
kdensity crate if ques== .7, generate(xy07 fx07) at(xyall) ;
kdensity crate if ques== .8, generate(xy08 fx08) at(xyall) ;
kdensity crate if ques== .9, generate(xy09 fx09) at(xyall) ;
kdensity crate if ques== 1, generate(xy10 fx10) at(xyall) ;
label var fx07 "ques07" ;
label var fx08 "ques08" ;
label var fx09 "ques09" ;
label var fx10 "ques10" ;
line fx07 fx08 fx09 fx10 xyall, sort ytitle(Density) ;

What I get is only a graph for ques==1. If I change the order of the kdensity commands, I still only get a graph for ques==1. I checked and there are over 900,000 observations where ques==.7, over a million where ques==.8, over a million where ques==.9, and over a million where ques==1. So, it's not that somehow there are zero cases with ques==.7, .8, or .9.

I also ran the code on two different PCs with different amounts of RAM, and on a clustered computer with oodles of memory. I dropped all other variables from the dataset to minimize memory usage (even if only by a few bytes). I also ran all kdensity graphs with the "nograph" option. I also ran lots of other checks. I ran the commands as a histogram--no luck there, either. I ran only two values of ques. No luck there, either. I searched the forum and Google and found other kdensity questions, some of whose answers called for looping, which seems like it would be prohibitively time-consuming with my 7 million case dataset.

What's up? Any assistance is greatly appreciated.

Take care.
Sam
Samuel R. Lucas

#2

24 Jul 2017, 13:33

In case anyone ever wants to know, it seems the problem is that one cannot use a condition like "ques==.7". I suppose there must be a leading zero. What I did was multiply ques by 10, thereby getting rid of the decimal values, and re-run it with the new variable in the if qualifier. It's an odd error but it's probably documented somewhere. Not sure I would've found it, though. On the other hand, I'm glad it's something simple like this and not a problem with memory or something like that.

So, that works.

Take care.
Sam
Nick Cox

#3

24 Jul 2017, 13:46

This is indeed a precision problem.

Code:

search precision

points to various documentation. I'd start with the blog posts by William Gould.

Code:

. clear

. set obs 1
number of observations (_N) was 0, now 1

. gen foo = .7

. list if foo == .7

. list if foo == float(.7)

     +-----+
     | foo |
     |-----|
  1. |  .7 |
     +-----+

shows one remedy.
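
Applied to the commands in #1, that remedy might look like the sketch below. It assumes ques is stored as a float (the default storage type for generate), and it follows the manual's example in passing a single new variable to generate() when at() is used. No float() is needed for ques == 1, because 1 is held exactly.

```stata
* Sketch only: assumes ques is stored as a float, so that float(.7) etc.
* match the stored values exactly. Variable names are those used in #1.
kdensity crate, nograph generate(xyall fx)
kdensity crate if ques == float(.7), nograph generate(fx07) at(xyall)
kdensity crate if ques == float(.8), nograph generate(fx08) at(xyall)
kdensity crate if ques == float(.9), nograph generate(fx09) at(xyall)
kdensity crate if ques == 1,         nograph generate(fx10) at(xyall)
label var fx07 "ques07"
label var fx08 "ques08"
label var fx09 "ques09"
label var fx10 "ques10"
line fx07 fx08 fx09 fx10 xyall, sort ytitle(Density)
```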
Mike Lacy

#4

24 Jul 2017, 13:47

Moreover, there's an old programming rule, going back to the FORTRAN days, that I recall as "Never compare floating point numbers for strict equality." By paying attention to precision, you generally can make it work, but sooner or later, it's likely to bite you.
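
One way to follow that rule is to compare within a tolerance instead of testing exact equality. A minimal sketch, in which the tolerance 1e-6 is an assumption chosen only to be far smaller than the 0.1 spacing between the values of ques, and x07 and fx07 are illustrative names for new variables:

```stata
* Sketch: treat ques as "equal" to 0.7 when it lies within a small tolerance.
* 1e-6 is an assumed tolerance, not a recommended constant.
* Observations with missing ques are excluded, because abs(.) is missing
* and missing is never less than 1e-6.
count if abs(ques - 0.7) < 1e-6
kdensity crate if abs(ques - 0.7) < 1e-6, nograph generate(x07 fx07)
```

The count line is a quick check that the condition selects the expected number of observations before any density is estimated.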
Clyde Schechter

#5

24 Jul 2017, 13:49

This is a general problem, in no way specific to -kdensity-. Any time you condition on the equality of floating point numbers you are looking for trouble, and most of the time you will find it.

Just as there is no exact finite representation for 1/3 in decimal notation, there is no exact finite representation in base 2 for a fraction m/n in lowest terms unless n is a power of 2. So numbers like 0.7, 0.8, and 0.9 are represented by the closest available approximation in binary. You can sometimes finesse this problem by relaxing the -if- condition to -if float(ques) == float(0.7)- etc. But this will not always solve the problem, depending on how the variable ques was created: it may not even agree exactly with 0.7 rounded to float precision, because there could have been rounding errors larger than that along the way.
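
The mismatch can be seen directly with -display-. This is only an illustrative sketch: the literal 0.7 is evaluated in double precision, while float(0.7) is the value a float variable would actually hold, so the two approximations differ.

```stata
* Show the stored approximations to 18 decimal places.
display %21.18f 0.7
display %21.18f float(0.7)

* Displays 0: the double and float approximations of 0.7 are not equal.
display 0.7 == float(0.7)

* Displays 1: both sides have been rounded to float precision.
display float(0.7) == float(0.7)
```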

The bullet-proof solution is to do as you report in #2, multiplying by 10 (or by 100 if we have two decimal places, etc.) and then conditioning on integer level equality. All integers have exact finite representations in binary, and any that do not exceed the maximum size that can fit in a -long- can be used in Stata.
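
A sketch of that approach follows. The name ques10 is hypothetical, and the round() call is an added safeguard (not part of what #2 describes) so that the new variable holds exact integers even if ques*10 is slightly off.

```stata
* Sketch: ques10 is a hypothetical new variable; round() is an added
* safeguard so that the values are exact integers (7, 8, 9, 10).
generate int ques10 = round(ques * 10)

* Check that only the expected integer values (and any missings) appear.
tabulate ques10, missing

* Condition on integer equality from here on.
kdensity crate, nograph generate(xyall fx)
kdensity crate if ques10 == 7, nograph generate(fx07) at(xyall)
```

The remaining groups would follow the same pattern with ques10 == 8, 9, and 10.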

The entire matter is explained in greater detail in -help precision-. Even the most experienced users trip over this from time to time, because it can show up almost anywhere!

Added: Crossed with Nick and Mike's responses, which make similar points.