  • Counting with _n

    Hello Statalist,

    Question here about counting with _n in large datasets. I am attempting to create a row identifier variable that simply takes the value of the row number. In the past I've used the command
    Code:
    gen row_id = _n
    This works fine for the first ~16.8 million rows. After that point, however, the values in "row_id" no longer match the row number. For example, here's what the output looks like:

    True row   row_id
    16961210   16961210
    16961211   16961212
    16961212   16961212
    16961213   16961212
    16961214   16961214
    16961215   16961216
    16961216   16961216
    16961217   16961216

    As is apparent, "row_id" cycles between correct and incorrect values, returning to the correct value every so often before deviating again. I've also tried looping over the integers from 1 to 17 million to fill in "row_id" manually, and I run into the same problem. I'd include a full example here, but it seems too large to post.

    Can anyone provide insight on what I'm doing wrong here?

    I'm running StataMP 17.0 on Mac OS.

    Thanks in advance!

    Andy
    Last edited by Andy Myers; 10 Dec 2021, 17:21.

  • #2
    This is a precision problem. When you type -gen row_id = _n-, Stata creates row_id as a float variable by default. A float carries only about 7 digits of accuracy (every integer up to 16,777,216 is stored exactly, but beyond that not every integer can be represented), and you need 8.
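The pattern in the question can be reproduced without creating 17 million observations. This is a sketch, not code from the thread; it relies on the fact that a Stata float is a 4-byte IEEE single-precision number, so above 2^24 = 16,777,216 adjacent floats are 2 apart and each odd integer is rounded to an even neighbor. The variable name true_row is made up for the illustration.

```stata
* Reproduce the row_id pattern from the question with 8 observations.
* Above 2^24 = 16,777,216, adjacent float values are 2 apart, so odd
* integers cannot be stored exactly and are rounded to an even neighbor.
clear
set obs 8
gen long true_row = 16961209 + _n   // 16961210 to 16961217, stored exactly
gen float row_id = true_row         // float is Stata's default storage type
format %12.0f true_row row_id
list, clean noobs
count if row_id != true_row         // rows where the float copy is off
display %12.0f float(2^24)          // every integer up to this one is exact
display %12.0f float(2^24 + 1)      // first integer a float cannot hold
```

The listing should line up with the table in the question: the odd row numbers are the ones that come out wrong.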

    The solution is to use a storage type with greater precision. A long holds every integer up to 2,147,483,620, so it covers anything with 9 digits. If you need even more, you have to use doubles, which store every integer exactly up to 9,007,199,254,740,992 (about 16 digits). (Beyond that, you're stuck and a more complicated solution is needed.)

    So
    Code:
    gen long row_id = _n   // IF ALL ARE WITHIN 9 DIGITS
    * or, instead of the line above:
    gen double row_id = _n // IF YOU NEED BETWEEN 10 AND 16 DIGITS
    Recommended reading: -help precision- and -help datatypes-
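A possible workflow, sketched here as an illustration rather than taken from the thread: check that the number of rows fits in a long, regenerate the identifier, and let -assert- confirm every value. A float row_id that is already wrong cannot be repaired with -recast-, because the rounded-away values are gone, so it is dropped and recreated. The 17,000,000-row dataset is an assumption chosen only to cross the float limit.

```stata
* Sketch: create row_id as a long and verify it on a dataset large
* enough to cross the float limit of 16,777,216.
clear
set obs 17000000                // assumption: any row count above 2^24
assert _N <= c(maxlong)         // a long holds integers up to 2,147,483,620
gen row_id = _n                 // default float: some values are wrong
count if row_id != _n           // nonzero number of mismatched rows
drop row_id                     // recast cannot recover the lost values
gen long row_id = _n
assert row_id == _n             // stops with an error if any value is off
describe row_id                 // storage type should now read long
```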



    • #3
      Clyde responded as I was writing and I agree with his response. You can demonstrate the issue by running the following code:

      Code:
      clear
      set obs 3
      gen x1=16961217          // default type float: stored as 16,961,216
      gen double x2=16961217   // double: stored exactly
      format %20.0fc x1 x2
      list
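A variation on the demonstration above, offered as a sketch rather than something suggested in the thread: -set type double- changes the default storage type for new numeric variables, so a plain -gen- no longer produces floats. The setting lasts for the current session unless the permanently option is added.

```stata
* Make double the default type for new numeric variables.
set type double                 // add ", permanently" to keep it across sessions
clear
set obs 3
gen x1 = 16961217               // now created as double, stored exactly
format %20.0fc x1
list
set type float                  // restore Stata's default
```

Doubles take 8 bytes per value, while a long takes 4, so for a plain row counter like row_id, -gen long- remains the more compact choice.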



      • #4
        Marvelous. That worked perfectly. Thank you Clyde and Alan for your help!
