One of my colleagues recently discovered the following.
We discussed how to proceed and decided to give the community fair warning about one of the more subtle aspects of Stata.
We chose a warning because this is more a matter of design in Stata than an actual error.
Consider the dataset:
Now compare the -codebook- command with the -count- command:
Why does -codebook- include both observations, whereas -count- includes only one?
Because codebook.ado starts like this:
And the behaviour of the date() function changed in Stata version 10; under version 8 or 9 a date string without separators returns missing:
In other cases things go well, for instance when the date string contains separators:
So depending on which version of Stata you are running and what input you use, you'll get different results.
It might have been better to return an error, as for instance an unknown function does:
This becomes very problematic when combined with the practice of setting a Stata version in subroutines written at different times, and of course with the convention of treating a missing numerical value as an actual (very large) number.
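The interaction of the two points above can be seen in a few lines. This is only a sketch for a do-file; the date 09nov2008 is taken from the example dataset, and the last line relies on date() returning missing under version 8, as reported in this post.

```stata
* Missing (.) sorts above every nonmissing number,
* so "x < ." is true and "x > ." is false for any nonmissing x.
* Displays 1 (true)
display mdy(11, 9, 2008) < .
* Displays 0 (false)
display mdy(11, 9, 2008) > .

* Under version 8, date() returns missing for "31122009",
* so this comparison silently becomes "x < ." and is true
version 8: display mdy(11, 9, 2008) < date("31122009", "DMY")
```

No error is raised at any point, which is why the condition appears to work while selecting every observation.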
This problem also occurs, e.g., for impossible arguments to mdy():
We did some more digging and found:
Using trace we found:
We fear that it might be -syntax-/-marksample- that doesn't properly evaluate conditions and the functions in them.
But we can't check that.
However, if that is the case, then the problem could be quite widespread in Stata.
The moral of this is that you have to check your conditions and Stata functions very carefully.
Verify that you actually get what you want.
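One way to do such a check is to evaluate the date once in your own code and refuse to continue if it is missing. The following is a minimal sketch, assuming Stata 10 or later and the two-observation dataset from this post; the macro name cutoff is just an illustrative choice.

```stata
* Recreate the example data from this post
clear
input str6 Name str10 datestr
Lena 09112008
Laila 09112013
end
gen birthday = date(datestr, "DMY")
format %td birthday

* Evaluate the cutoff once, under the caller's version
local cutoff = date("31122009", "DMY")

* Stop with an error if the string could not be parsed
assert !missing(`cutoff')

* The macro expands to a plain number before -codebook- runs,
* so the version set inside codebook.ado no longer affects the cutoff
codebook Name if birthday < `cutoff'
count if birthday < `cutoff'

* Alternatives that avoid parsing a string altogether
count if birthday < mdy(12, 31, 2009)
count if birthday < td(31dec2009)
```

The point of the local macro is that the condition handed to -codebook- contains only a number, so nothing is left to be re-evaluated under an older version.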
That's all, folks
Consider the dataset:
Code:
clear
input str6 Name str10 datestr
Lena 09112008
Laila 09112013
end
gen birthday = date(datestr, "DMY")
drop datestr
format %td birthday
Code:
. codebook Name if birthday < date("31122009","DMY")

----------------------------------------------------------------------
Name                                                       (unlabeled)
----------------------------------------------------------------------

                  type:  string (str6), but longest is str5

         unique values:  2                        missing "":  0/2

            tabulation:  Freq.  Value
                             1  "Laila"
                             1  "Lena"

. count if birthday < date("31122009","DMY")
    1
Because codebook.ado starts like this:
Code:
*! version 1.5.1  26jan2012
program codebook, rclass
        version 8.1, born(09sep2003) missing
Code:
. version 8: dis date("31122009","DMY")
.

. version 9: dis date("31122009","DMY")
.

. version 10: dis date("31122009","DMY")
18262

. version 11: dis date("31122009","DMY")
18262

. version 12: dis date("31122009","DMY")
18262
Code:
. version 8: dis date("31.12.2009","DMY")
18262

. version 12: dis date("31.12.2009","DMY")
18262
It might have been better to return an error, as for instance an unknown function does:
Code:
. dis datt("31.12.2009","DMY")
unknown function datt()
r(133);
This problem also occurs, e.g., for impossible arguments to mdy():
Code:
di =mdy(200, 121, 2001)
.
Code:
. codebook Name if birthday > date("31122009","DMY")
no observations
r(2000);
Code:
- Codebook_vars `varlist' `if' `in' , `mv' `quiet' `notes' tabulate(`tabulate') `lnopt'
= Codebook_vars Name if birthday > date("31122009","DMY") , tabulate(9)
------------------------------------------- begin codebook.Codebook_vars ---
- syntax [varlist] [if] [in] [, Tabulate(integer 9) Mv Notes quiet languages(str) ]
- marksample touse, novarlist
- qui count if `touse'
= qui count if __000001
- if r(N) == 0 {
- error 2000
no observations
}
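A trace like the one above can be produced along the following lines. This is a sketch of one possible way, not necessarily the exact commands we ran; it assumes the example dataset is in memory and that the lines are run from a do-file.

```stata
* Echo each line as it is executed, including lines inside codebook.ado
set trace on

* -capture noisily- keeps the output visible but lets the do-file
* continue after the r(2000) error, so that tracing can be switched off
capture noisily codebook Name if birthday > date("31122009","DMY")

set trace off
```

The full output is long; the excerpt shown above is only the part around Codebook_vars.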