  • Missing values using predict after reg

    Hello,

    I am using Stata/MP 17.0. I am working on an iterative OLS for ordered qualitative responses, along the lines of this paper. After running one of my regression models, I am trying to generate predicted values for the two middle categories of an ordered categorical satisfaction variable. For those two predictions, "lowsathat" and "hisathat", the results are all missing values. However, when I troubleshoot by running the same predict command for the highest and lowest categories of the same variable, it produces results. Can someone help me understand why the two middle categories are not generating predicted values?

    [CODE]
    . gen sat_fin_con1 = 0 if satisfact_fin_con == 1
    (11,635 missing values generated)

    . replace sat_fin_con1 = .438 if satisfact_fin_con == 2
    (3,434 real changes made)

    . replace sat_fin_con1 = .517 if satisfact_fin_con == 3
    (5,911 real changes made)

    . replace sat_fin_con1 = 1 if satisfact_fin_con == 4
    (2,290 real changes made)

    . reg sat_fin_con1 mcaid_exp eng_d eng_d_mexp gender i.ethnicity i.education c.income2012##c.income2012 c.income2015##c.income2015 c.income2018##c.income2018 i.living_arr_and_dep_kids full_time_employment unemployed retirement i.state h1 hi_under55 Y2012 Y2015 Y2018
    note: Y2018 omitted because of collinearity.


          Source |       SS           df       MS      Number of obs   =    14,304
    -------------+----------------------------------   F(78, 14225)    =     44.95
           Model |  246.835092        78  3.16455246   Prob > F        =    0.0000
        Residual |  1001.45071    14,225  .070400753   R-squared       =    0.1977
    -------------+----------------------------------   Adj R-squared   =    0.1933
           Total |   1248.2858    14,303  .087274404   Root MSE        =    .26533


    *reg results omitted for space and formatting

    . predict lowsathat if sat_fin_con1 == .438 & e(sample)==1
    (option xb assumed; fitted values)
    (14,304 missing values generated)

    . predict hisathat if sat_fin_con1 == .517 & e(sample) == 1
    (option xb assumed; fitted values)
    (14,304 missing values generated)

    . predict vlowsathat if sat_fin_con1 == 0 & e(sample)==1
    (option xb assumed; fitted values)
    (11,635 missing values generated)

    . predict vhisathat if sat_fin_con1 == 1 & e(sample) == 1
    (option xb assumed; fitted values)
    (12,014 missing values generated)
    [/CODE]


  • #2
    This is a precision problem; it has nothing to do with -predict-. Testing for, or conditioning on, exact equality of floating point quantities usually ends badly, as it has here.

    The problem is that when you run
    Code:
    replace sat_fin_con1 = .438 if satisfact_fin_con == 2
    the actual number stored in sat_fin_con1 is not 0.438. That's because all quantities in Stata are represented in binary, and there is no finite binary representation of the number 0.438 (just as there is no finite decimal representation of the fraction 1/3). The actual number will be different: perhaps it is 0.438000002... or 0.43799999996... or something like that. The exact value is an infinite binary string, and it gets rounded to the number of bits that fit within the storage assigned to the variable sat_fin_con1.

    Now, because your original -gen sat_fin_con1 = 0...- command did not specify a storage type, Stata by default stores it as a float, a 4-byte storage type. But when you come to -if sat_fin_con1 == .438- in your -predict- command, Stata, by default in calculations and operations (including operators like ==), uses double precision, an 8-byte type. So in executing the -if- clause, Stata takes the number in sat_fin_con1, which is as close to .438 as is mathematically possible within 4 bytes, and has to expand it to 8 bytes. It does this by padding the extra low-order bits with zeroes, because Stata has no memory of what the dropped bits originally were. But on the right-hand side of the == operator, it calculates the number that is as close to 0.438 as is mathematically possible within 8 bytes. These two numbers differ: the low-order bits that a double has and a float lacks are all zeroes on the left side, but not on the right side.

    As you might surmise, the same is true for .517. But with 0 and 1 there is no problem, because these are integers and are exactly representable in binary in the float storage type. For these numbers, the low-order bits are all zero whether they come from widening a float or from direct double-precision calculation.
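    To see the truncation directly, here is a minimal sketch; the variable names xflt and xdbl are illustrative and not from the thread. Stata's %21x display format prints a number's exact binary value in hexadecimal, and the -float()- function rounds a value to float precision, which is what storing it in a float variable does. The hex output is not reproduced here; when run, the float's hex digits should end in a run of zeroes where the double's do not.

```stata
* Minimal sketch: compare the stored bits of 0.438 as a float and as a double.
* xflt and xdbl are illustrative names, not variables from the original post.
clear
set obs 1
generate xflt = 0.438            // float (4 bytes) is the default storage type
generate double xdbl = 0.438     // double (8 bytes)

* %21x shows the exact binary value in hexadecimal
display %21x xflt[1]
display %21x xdbl[1]

* float() rounds to float precision: 1 survives the round trip, 0.438 does not
display float(1) == 1            // displays 1
display float(0.438) == 0.438    // displays 0
```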

    So the moral of the story is that you should never rely on exact equality of floating point numbers. Given that your variable sat_fin_con1 takes on only 4 distinct values that are fairly widely spaced, you can easily solve your problem by:

    Code:
    predict lowsathat if inrange(sat_fin_con1, 0.4, 0.45) & e(sample)==1
    and an analogous change in the subsequent command targeting sat_fin_con1 at .517.
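    Two other fixes are sometimes used in place of -inrange()-: (a) rounding the comparison constant to float precision with -float()-, so both sides of == carry the same truncated value, and (b) conditioning on the original integer-coded satisfact_fin_con, which is stored exactly. The sketch below simulates a small hypothetical dataset so it runs on its own; the regressor x and the new variables lowsat_a and lowsat_b are made up for illustration, and the approach has not been checked against the original data. Note that (a) assumes sat_fin_con1 is a float; if it was created as a double, -float(.438)- would no longer match.

```stata
* Self-contained sketch with simulated data. satisfact_fin_con mimics the
* poster's 1-4 variable; x is a hypothetical regressor.
clear
set seed 12345
set obs 200
generate satisfact_fin_con = runiformint(1, 4)
generate x = rnormal()

* Recode as in the original post; sat_fin_con1 is a float by default
generate sat_fin_con1 = 0 if satisfact_fin_con == 1
replace sat_fin_con1 = .438 if satisfact_fin_con == 2
replace sat_fin_con1 = .517 if satisfact_fin_con == 3
replace sat_fin_con1 = 1 if satisfact_fin_con == 4

regress sat_fin_con1 x

* (a) Round the constant to float precision so both sides of == agree
predict lowsat_a if sat_fin_con1 == float(.438) & e(sample) == 1

* (b) Condition on the integer-coded original variable instead
predict lowsat_b if satisfact_fin_con == 2 & e(sample) == 1
```

    Of the two, (b) is arguably the more robust choice, since it does not depend on how sat_fin_con1 happens to be stored.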

    Another approach that will probably work here, given the limited amount of calculation being done, is to go back to -gen sat_fin_con1 = 0 if satisfact_fin_con == 1-
    and change that to:
    Code:
    gen double sat_fin_con1 = 0 if satisfact_fin_con == 1
    With sat_fin_con1 created as a double, the subsequent -if sat_fin_con1 == .438- will compare double to double with no zero padding, and it should work correctly. But I don't advise this approach in general: if it were something like -if x + y == .438-, then even with x and y both doubles, rounding error introduced by the addition could lead to a lack of exact equality. The safest practice is simply never to test for or condition on exact equality of floating point quantities.
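    The caveat about arithmetic on doubles can be seen without any data. This minimal sketch uses the classic 0.1 + 0.2 example rather than anything from the thread; the tolerance of 1e-9 is an arbitrary illustrative choice, and a suitable value depends on the scale of your data. Stata's -reldif()- function computes the relative difference |x - y|/(|y| + 1).

```stata
* Sketch: double-precision arithmetic can also break exact equality.
* The tolerance 1e-9 is an arbitrary choice for illustration.
display 0.1 + 0.2 == 0.3                    // displays 0
display abs((0.1 + 0.2) - 0.3) < 1e-9      // displays 1

* reldif(x, y) = |x - y| / (|y| + 1)
display reldif(0.1 + 0.2, 0.3) < 1e-9      // displays 1
```

    The -inrange()- fix for this thread applies the same tolerance idea, just with a much wider window that is safe because the four recoded values are far apart.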

    Added: Here's a very simplified version of what is going wrong with your code:

    Code:
    . clear
    
    . set obs 1
    Number of observations (_N) was 0, now 1.
    
    . gen double xdbl = 0.438
    
    . gen xflt = 0.438
    
    .
    . assert xdbl == 0.438
    
    . assert xflt == 0.438
    assertion is false
    r(9);
    Last edited by Clyde Schechter; 03 Mar 2024, 17:41.



    • #3
      Thank you! That is very helpful.
