
  • What is the 256th value of a Stata byte?

    This question is for my own curiosity rather than any practical purpose.

    If I understand correctly, Stata understands unsigned bytes (%1bu), which take values from 0 to 255, signed bytes (%1bs), which take values from -127 to 127, and "Stata bytes" (%1b), which take values from -127 to 100, plus 27 distinct missing codes. But there appear to be only 255 distinct values for the signed byte (and Stata byte). So what is the 256th value? Is it the value that would be represented by negative zero? And if so, how does Stata handle this value? Is it readable? Is it writable? I don't think Stata has the concept of NaN or negative zero, and I'm not sure what else it could be.

    Alternatively, maybe %1bs uses two's complement, so the unaccounted-for value is the value that would be represented by -128?

    I know that the floating point types are IEEE 754 compliant, so they must be able to store negative zero, but I don't think the byte types are required to do that by any standard.
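
To make the two hypotheses above concrete, here is a minimal Python sketch (not Stata code, and not a statement of how Stata is implemented) showing how the bit pattern 0x80 would read under two's complement versus a sign-and-magnitude encoding. The helper names `twos_complement` and `sign_magnitude` are hypothetical and exist only for this illustration.

```python
import struct


def twos_complement(pattern: int) -> int:
    """Read an 8-bit pattern as a two's-complement signed integer."""
    return struct.unpack("b", bytes([pattern]))[0]


def sign_magnitude(pattern: int) -> int:
    """Read an 8-bit pattern as a sign bit plus a 7-bit magnitude.

    Python integers have no negative zero, so the pattern 0x80
    ("negative zero") collapses to plain 0 here.
    """
    magnitude = pattern & 0x7F
    return -magnitude if pattern & 0x80 else magnitude


for pattern in (0x7F, 0x80, 0x81, 0xFF):
    print(
        f"0x{pattern:02x}: two's complement {twos_complement(pattern):5d}, "
        f"sign-magnitude {sign_magnitude(pattern):5d}"
    )

# Count how many distinct integers each encoding can express in 8 bits.
distinct_twos = len({twos_complement(p) for p in range(256)})
distinct_sm = len({sign_magnitude(p) for p in range(256)})
print(distinct_twos, distinct_sm)
```

Under two's complement all 256 patterns map to distinct integers, with 0x80 as -128; under sign-and-magnitude only 255 distinct integers exist, because 0x00 and 0x80 both mean zero.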

  • #2
    See help dta.

    • #3
      I don't think that answers my question, but it helps narrow down the unaccounted-for value.

      0x00 to 0x64 : 0 to 100
      0x65 to 0x7f : . to .z
      0x80 to 0xfe : -127 to -1

      Hence, by omission:

      0xff : ?

      But now I'm extra confused, because when I directly manipulate the dta with a hex editor, 0xff is loaded as -1, 0x81 is -127, and 0x80 is… -128. Yes, -128. In a byte. How? What?
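
The behavior described above can be summarized in a short Python sketch. It assumes only what was observed in this post after hex editing (it is not the documented format): 0x00 to 0x64 load as 0 to 100, 0x65 to 0x7f load as the 27 missing codes, and 0x80 to 0xff load as two's-complement negatives. The function name `observed_value` is hypothetical.

```python
import string

# The 27 missing codes in order: ".", ".a", ..., ".z"
MISSING_CODES = ["."] + ["." + letter for letter in string.ascii_lowercase]


def observed_value(raw: int) -> str:
    """Return the value Stata appeared to load for one raw byte (0x00-0xff)."""
    if not 0 <= raw <= 0xFF:
        raise ValueError("raw must fit in one byte")
    if raw <= 0x64:
        return str(raw)                    # 0 to 100
    if raw <= 0x7F:
        return MISSING_CODES[raw - 0x65]   # . to .z
    return str(raw - 256)                  # two's complement: -128 to -1


for raw in (0x64, 0x65, 0x7F, 0x80, 0x81, 0xFF):
    print(f"0x{raw:02x} -> {observed_value(raw)}")
```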

      • #4
        You are right. No idea what is going on here.

        • #5
          Here's an attached DTA if anyone wants to play with it. It has a byte variable "weird" that goes from 0x00 to 0xff, including -128 at 0x80.

          Code:
          . codebook weird,  all
          
                         Dataset: ~/Temp/weird.dta
                      Last saved: 8 Feb 2024 10:27
          
                           Label: [none]
             Number of variables: 1
          Number of observations: 256
                            Size: 256 bytes ignoring labels, etc.
          
          ---------------------------------------------------------------------------
          weird                                                           (unlabeled)
          ---------------------------------------------------------------------------
          
                            Type: Numeric (byte)
          
                           Range: [-128,100]                    Units: 1
                   Unique values: 229                       Missing .: 1/256
                 Unique mv codes: 27                       Missing .*: 26/256
          
                            Mean:     -14
                       Std. dev.: 66.2508
          
                     Percentiles:     10%       25%       50%       75%       90%
                                     -106       -71       -14        43        78
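
As a cross-check, the counts, range, mean, and standard deviation in the codebook output can be reproduced in Python, assuming one observation per raw byte from 0x00 to 0xff, with 0x65 to 0x7f treated as missing and 0x80 to 0xff read as two's complement. This is only a sketch of the arithmetic, not of Stata's internals, and the variable names are illustrative.

```python
import statistics

values = []      # nonmissing values, one per raw byte
n_missing = 0    # raw bytes that fall in the missing-code range

for raw in range(256):
    if 0x65 <= raw <= 0x7F:
        n_missing += 1
    elif raw <= 0x64:
        values.append(raw)          # 0 to 100
    else:
        values.append(raw - 256)    # -128 to -1

n_unique = len(set(values))
value_range = (min(values), max(values))
mean = statistics.mean(values)
std_dev = statistics.stdev(values)  # sample standard deviation (n - 1)

print("Unique values:", n_unique)
print("Missing codes:", n_missing)
print("Range:", value_range)
print("Mean:", mean)
print("Std. dev.:", round(std_dev, 4))
```

The totals agree with the output above only if 0x80 is counted as the ordinary value -128; dropping it would leave 228 unique values and a minimum of -127.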

          • #6
            Stata tech support replied that the documentation does need to be changed. I am guessing that they will correct the "0x80 to 0xfe" range to "0x81 to 0xff".

            I did not push them to answer what 0x80 is supposed to represent, if anything, and they did not offer an answer. I am guessing that 0x80 is officially out of spec/undefined, leaving 255 possible values for a byte. Perhaps the updated documentation will give a hint.
