Thanks to Kit Baum as usual, a new package numdate is available from
SSC. Use
Code:
ssc inst numdate
to install.
Stata version 12 is required (but see below for a note for any people on
version 10 or 11 who may be interested).
numdate is for generating numeric date-time variables. Even if you are
already fluent with functions for Stata dates and times, it offers some
convenient features. If you are a new Stata user, or an old Stata user
who only uses Stata dates and times occasionally, it may offer a simpler
introduction to them. For whatever reasons, Stata dates and times are
often found puzzling by users, as witness many questions in this forum.
(The puzzlement does diminish if you follow the old-fashioned advice of
reading the documentation slowly and carefully.)
Use of numdate does require some minimal understanding of how Stata
holds dates and times. Here's the essence if you need it; otherwise skip
the next paragraph.
With the exception of calendar years, Stata records dates and date-times
with origin 0 as the start of 1960. For example, for daily dates 0 is 1
January 1960 and 42 is 12 February 1960; for monthly dates 0 is January
1960 and 42 is July 1963; for quarterly dates 0 is the first quarter of
1960 and 42 is the third quarter of 1970. Using numeric variables to
hold dates makes it very easy to sort observations in date order and to
calculate differences between dates. Using what is admittedly an
arbitrary convention is not a real problem for tables or graphs or other
output, as dates can and should be assigned display formats that make
sense. But given Stata's convention of origin at the start of 1960, it
is often necessary to map variables containing date or time data in
other form to Stata's numeric date-time variables.
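To make the 1960 origin concrete, here is a small sketch using only official Stata display formats and functions (nothing in it depends on numdate). It shows the value 42 under daily, monthly and quarterly formats, and then rebuilds 42 from date components.

```stata
* The same integer means different things under different date types
* daily: shows 12feb1960
display %td 42
* monthly: shows 1963m7
display %tm 42
* quarterly: shows 1970q3
display %tq 42

* Going the other way: each of these returns 42
display mdy(2, 12, 1960)
display ym(1963, 7)
display yq(1970, 3)
```

The point is that the stored value is just a count from the start of 1960; the display format decides how it is read.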
The syntax of numdate is
numdate datetimetype ndatevar = varlist [if] [in],
pattern(pattern) [ format(format) dryrun topyear(topyear) ]
Examples might be
Code:
numdate daily mydate = strdate, pattern(MDY)
numdate q mydate = year quarter, pattern(YQ)
numdate tc mydate = datetime, pattern(YMD hms)
First, you have to specify which datetimetype you require; that must
be one of
clock
Clock
daily
date (a synonym for daily)
weekly
monthly
quarterly
halfyearly or
yearly
(where any abbreviation is allowed, down to single letters)
or alternatively one of
tc
tC
td
tw
tm
tq
th or
ty.
Then you specify the name of a new variable to hold the date-time
information in Stata form, followed by an equals sign, followed by a
varlist, one or more existing variables that contain the date-time
information in some form.
numdate ties together in one command the task of creating a Stata
date-time variable from a single variable and the task of creating one
such from several variables. Also, as specified to numdate, the varlist
may be string or numeric. For example, it could be a single string
variable containing values indicating daily dates like "2015Mar28" or
quarterly dates like "2015Q2". Or, it could be a single numeric variable
containing integers such as 20150328 which are to be parsed (in this
case) as daily dates. Or, it could be two or more variables, say three
variables indicating day, month, and year for daily dates.
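As a rough illustration of the single-variable and several-variable cases, the sketch below builds a toy dataset and calls numdate once on one string variable and once on three numeric variables. The variable names (sdate, day, month, year, d1, d2) are made up for this example, and it assumes numdate is installed and behaves as described in this post, in which case d1 and d2 should hold the same daily dates.

```stata
clear
input str9 sdate byte day byte month int year
"2015Mar28" 28 3 2015
"2015Jul06"  6 7 2015
end

* one string variable, in year-month-day order
numdate daily d1 = sdate, pattern(YMD)

* three numeric variables, listed in day-month-year order
numdate daily d2 = day month year, pattern(DMY)

list sdate day month year d1 d2
```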
You must specify a pattern indicating the order of date-time information
in the pattern() option. numdate can't figure out otherwise whether
20150706 is 6 July or 7 June 2015 or indeed something quite different:
if you have dates like that, you must spell out whether you have YMD or
YDM (or ...). (Nor can numdate help you directly if your dates have
inconsistent patterns in different observations!)
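The ambiguity is easy to see with the official daily() function, which takes the same kind of pattern. This is only a sketch of the point above, not of numdate itself: the same eight characters are read as 6 July 2015 under YMD and as 7 June 2015 under YDM.

```stata
* read as year, month, day
display %td daily("20150706", "YMD")

* read as year, day, month
display %td daily("20150706", "YDM")
```

Neither reading is wrong as far as Stata is concerned; only you know which order your data use.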
The syntax of pattern() will be familiar to users of functions like
clock() or daily() or monthly() or quarterly(), because it is precisely
that syntax. But it is extended so that it applies to numeric as well as
string variables, and to multiple variables specifying date-time
components as well as to single variables. If you have three variables
indicating day, month and year, then your pattern is DMY, just as if you
had a string variable with values like "28032015" or a numeric variable
with values like 28032015.
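To show what the extension to numeric variables saves you, here is a hedged comparison. With official functions, a numeric variable such as 28032015 has to be converted to string before daily() can parse it; with numdate, as described in this post, the numeric variable can be given directly. The variable names dmy, d1 and d2 are made up, and the numdate call assumes the package is installed.

```stata
clear
input long dmy
28032015
15072015
end

* official route: numeric -> 8-character string -> daily()
generate d1 = daily(string(dmy, "%08.0f"), "DMY")
format d1 %td

* numdate route: same pattern, no string conversion by hand
numdate daily d2 = dmy, pattern(DMY)

list
```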
Other features of numdate are:
* By default it applies a minimal appropriate display format. Hitherto
you needed to follow the generation of (say) a daily date with a call to
the format command. The default format at least gets you started: daily
date values shown like 28mar2015 mean something, whereas such values
shown like 20175 mean little, unless you have exceptional powers of
mental arithmetic. A format() option lets you go beyond the default.
* It automatically generates tc (clock) and tC (Clock) variables as
double, as is necessary to maintain precision. Since version 10, the
help files have shown stern and repeated warnings to specify double when
generating date-times, warnings that many users have not read or not
understood or not believed or not remembered, and in any case have just
ignored. numdate is smart on your behalf despite your best inclinations
to be dumb. Given the command-subcommand syntax it knows precisely when
you want a clock or Clock variable and so specifies double in either
instance.
* It also allows users a dry run. The option dryrun indicates that
results of the conversion should be shown without generating a new
variable. Results are listed to show no more than 5 non-missing
values of the implied date variable, and no more than 20 missing values,
depending on which condition is satisfied first. This dry run should
allow the user to check assumptions about the structure of values of
varlist and/or to see the results of a particular format, whether
default or specified.
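The three features above can be seen side by side in a rough sketch. The first half uses only official commands to show the two chores that numdate takes over (asking for double and assigning a format). The second half uses numdate with dryrun and then with format(). The variable names (stamp, tfloat, tdouble, mytime) are made up, and the numdate calls assume the behaviour described in this post.

```stata
clear
input str19 stamp
"2015-03-28 14:30:00"
"2015-07-06 09:15:30"
end

* official route: without double, the clock value is rounded
generate tfloat = clock(stamp, "YMD hms")
generate double tdouble = clock(stamp, "YMD hms")
format tfloat tdouble %tc
list stamp tfloat tdouble

* numdate route: look first, without creating anything
numdate tc mytime = stamp, pattern(YMD hms) dryrun

* then create the variable, here with a format of your own choosing
numdate tc mytime = stamp, pattern(YMD hms) format(%tcCCYY-NN-DD_HH:MM:SS)
describe mytime
```

In the listing, tfloat will typically be off by some seconds or minutes from the original time, which is exactly the problem that the warnings about double are meant to prevent.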
(So what about users of version 10 or 11? The code works fine in those
versions. The problem is that the help files are organised differently
from those since version 12. Further, some SMCL directives that work in
Stata 12 and up will not work in 10 or 11. Anyone who downloaded the files
from SSC, edited the version statement in the ado file and edited the
help files would get a serviceable variant on numdate if they did that
correctly, but that's your responsibility.)