Observation count using _n is wrong

Brent Lutes

Join Date: Sep 2014

Posts: 8
#1

19 May 2022, 08:28

I'm using a dataset with 75 million observations, and _n gives me the wrong observation numbers once the count exceeds several million. I'm using Stata 17.0 MP (updated today) on a 64-bit Windows 10 system. The observation count is correct for the first 16,777,215 observations, but it is off beyond that. See screenshot from data editor below. `V1' is a string variable from the original data set. The other two variables were generated as gen obs_num = _n and gen count = 1 ; replace count = count[_n - 1] + 1 if _n > 1. Note that the problem is not related to the particular data set I'm using; it also occurs when I create a dummy data set. Any ideas what the issue is or how to fix it?
Tags: Observation count, System bug, _n
FernandoRios

Join Date: Apr 2014

Posts: 2430
#2

19 May 2022, 08:34

Not a bug, but rather a precision problem. Try:
gen double count = _n

HTH
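
For readers wondering why the cutoff sits near 16.7 million: generate stores new variables as float unless told otherwise (assuming the default has not been changed with set type), and a float holds integers exactly only up to 2^24 = 16,777,216. Below is a minimal sketch of the effect on a three-observation dummy data set. The variable names true_n, as_float, and as_double are made up for illustration and do not come from the original post.

```stata
* Minimal sketch: integers just past 2^24 stored as float vs. double.
* All variable names here are hypothetical.
clear
set obs 3

* Reference values 16,777,216 to 16,777,218, held exactly in a long
generate long true_n = 16777215 + _n

* Same values forced into float (24-bit significand) and into double
generate float  as_float  = true_n
generate double as_double = true_n

format true_n as_float as_double %12.0fc
list

* Observations where the float copy no longer matches the true value
count if as_float != true_n
```

In the second observation the float copy cannot represent 16,777,217 and is rounded to a neighbouring even integer, while the double copy matches true_n in every row.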
daniel klein

Join Date: Mar 2014

Posts: 3824
#3

19 May 2022, 08:36

More generally,

Code:

generate `c(obs_t)' newvar = _n
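
As a rough illustration of the line above: c(obs_t) is a system value holding the name of a storage type large enough for the observation numbers of the data currently in memory, so the macro expands to a type name before generate runs. The sketch below uses a small dummy data set, and obs_num is just an illustrative variable name. The type reported will depend on how many observations are in memory.

```stata
* Minimal sketch: let Stata choose the storage type for observation numbers.
* obs_num is a hypothetical variable name.
clear
set obs 5

* Show which storage type c(obs_t) currently expands to
display "`c(obs_t)'"

* Create the observation counter using that type
generate `c(obs_t)' obs_num = _n

* Inspect the resulting storage type and check the values
describe obs_num
assert obs_num == _n
```

Compared with hard-coding double, this keeps the variable as small as the data set allows while still storing every observation number exactly.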
Brent Lutes

Join Date: Sep 2014

Posts: 8
#4

19 May 2022, 09:30

Thanks Fernando and Daniel, that resolves the issue.