Value labels are dictionaries attached to categorical variables that explain the meaning of the codes. The Stata help and manual illustrate a few typical uses, but do not lay out any requirements for the labels. Specifically, it is not explicitly ruled out (or I didn't find it) that a value label can be blank (an empty string). This is not a very useful label, I know, but technically it is possible. However, it is not possible in all versions of Stata.
[ 1 ] The following illustrates the differences in behavior of Stata 9.2 and 10.0:
Code:
. label define mylabel 1 ""
invalid attempt to modify label
r(180);
The following illustrates the behavior of Stata 11.0, 12.0 and 13.0:
Code:
. label define mylabel 1 ""
. label list mylabel
mylabel:
           1
[ 2 ] Interestingly enough, defining the same empty label again (with the modify option) cancels it:
Code:
. label define mylabel 1 ""
. label list mylabel
mylabel:
           1
. label define mylabel 1 "", modify
. label list mylabel
mylabel:
[ 3 ] Furthermore, adding a version 9.2 statement does not change the behavior of Stata 11.0, 12.0 and 13.0. Even under version control they allow an empty value label to be defined.
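For reference, the check described in [ 3 ] can be sketched roughly as follows. This is only a minimal sketch, assuming a fresh session in one of the newer releases; the exact commands originally used are an assumption, and no output is shown because it depends on the release.

```stata
* Sketch only: try to define an empty value label under version control.
* Start clean; capture ignores the error if mylabel does not exist yet.
capture label drop mylabel
* Run the definition as if under Stata 9.2.
version 9.2: label define mylabel 1 ""
* Inspect whether the empty label was stored.
label list mylabel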
[ 4 ] Given that it is possible to define a numeric value label, Stata lacks an API to check whether such a label is defined. By that I mean that I couldn't find any Stata or Mata command or function that would let me distinguish between labelled and unlabelled values in the following (although I can detect this situation programmatically):
Code:
sysuse auto, clear
label define mylabel 1 "1"
label values rep78 mylabel
display `"`: label mylabel 1'"'
display `"`: label mylabel 2'"'
label list mylabel
Code:
. sysuse auto, clear
(1978 Automobile Data)
.
. label define mylabel 1 "1"
.
. label values rep78 mylabel
.
. display `"`: label mylabel 1'"'
1
.
. display `"`: label mylabel 2'"'
2
.
. label list mylabel
mylabel:
           1 1
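One avenue that may be worth checking for [ 4 ], assuming a release in which these features are available: the extended macro function `: label` accepts a strict option, and Mata documents st_vlload(), which loads the values and texts of a value label into two vectors. The following is a minimal sketch under those assumptions, not a verified solution for every version discussed here; no output is shown.

```stata
* Sketch only; assumes a recent Stata with Mata and the strict option.
sysuse auto, clear
capture label drop mylabel
label define mylabel 1 "1"
label values rep78 mylabel

* With strict, nothing is returned when the value has no label.
display `"`: label mylabel 1, strict'"'
display `"`: label mylabel 2, strict'"'

* Mata: load all labelled values and their texts into two column vectors.
mata:
values = .
text = ""
st_vlload("mylabel", values, text)
values
text
end
```

Note that strict alone would presumably not separate an empty label from a missing one, since both come back as an empty string, whereas the vector of values loaded in Mata should list every value that has a mapping.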
Now my questions:
Q1: I can live with #1 above, but will e.g. Stata 9 be OK with finding a blank value label in the file? (It opens the file without complaints, but things can go wrong later.)
Q2: Is #2 an intended behavior? And importantly, will it propagate to future versions?
Q3: Is #3 a bug? Will it be fixed? Is it already fixed?
Q4: Is there an immediate solution for #4? I have two rather ugly solutions: parsing the log of label list and brute-forcing all integers, both of which look daunting.
Q5: When translating foreign files into Stata format, should such labels (empty and numeric) be dropped? What does the community think?
Q6: Similarly, when exporting data from Stata to foreign formats, should such labels be exported? Will they cause unpredictable confusion in other systems? What is your experience?
Thank you, Sergiy Radyakin