Exporting variable names, labels, and value codes for labels

yaqoob lums

Join Date: Jul 2019

Posts: 12
#1

28 Sep 2020, 00:05

Hi all,

I am trying to prepare a codebook document that provides details on all variables (including their labels, value codes, etc.). A similar topic was discussed earlier, here. My query is slightly different, as I want the value codes exported as well, and I want to export the result to Word. I want the output to have the following columns:

Serial No   Var name   Var label   value codes for label
1           a1         gender      1=male; 2=female

I know `describe, replace` gives me variable names, variable labels, and the names of value labels, but could we possibly tweak it to get the value codes for the labels too (as in 1=male; 2=female in the example above)? Or is there any other command that can do this?

Any help is appreciated. Thanks!
PS: I have Stata MP 14.1
Attaullah Shah

Join Date: Aug 2014

Posts: 1667
#2

28 Sep 2020, 01:31

As you have noted, describe can do part of the job, and asdoc (from SSC) can export a neat table to MS Word. This post might be of help:
https://www.statalist.org/forums/for...k-to-text-file

Regards
--------------------------------------------------
Attaullah Shah, PhD.
Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
FinTechProfessor.com
https://asdocx.com
Check out my asdoc program, which sends outputs to MS Word.
For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.
yaqoob lums

Join Date: Jul 2019

Posts: 12
#3

28 Sep 2020, 01:58

Thanks, Attaullah! asdoc is a really nice program. I have been using it for some time now.
Maarten Buis

Join Date: Mar 2014

Posts: 3426
#4

28 Sep 2020, 04:25

Short answer: Not easily; the tricky part is the value labels.

Longer answer: There are two strategies you could follow.

1. You could extract the values of a variable that are present in the dataset using levelsof, loop over those values, and extract the corresponding value label using the extended macro function label (see: help extended_fcn).
2. You could use the Mata function st_vlload() to create two matrices in Mata: one containing the values and the other the corresponding labels. You can loop over these to create a string like the one you want and move that to Stata.

Option 1 has the advantage that you can keep it all in Stata and you don't have to learn Mata, but you will miss values that have labels but are not in the data. This could happen when a question allows for a certain (rare) outcome, but in the sample that outcome happened not to occur.
Option 2 solves that, but it shows only labeled values and omits unlabeled ones.
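
A minimal sketch of the first strategy, for illustration only. The toy data, the label name gender_lbl, and the unused category 3 "other" are assumptions made up to mirror the a1/gender example from the question; they are not from anyone's actual data.

```stata
* Toy data mirroring the example in the question (hypothetical)
clear
input byte a1
1
2
1
end
label variable a1 "gender"
label define gender_lbl 1 "male" 2 "female" 3 "other"
label values a1 gender_lbl

* Collect the values that occur in the data, then look up each label
quietly levelsof a1, local(levels)
local codes
local sep
foreach v of local levels {
    local txt : label (a1) `v'
    local codes `"`codes'`sep'`v'=`txt'"'
    local sep "; "
}
display `"`codes'"'
* displays: 1=male; 2=female
```

Note that 3=other is defined in the label but does not appear in the result, because no observation has the value 3. That is the limitation of this strategy described above.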

You can combine both strategies; look inside the code of htmlcb (from SSC) if you are interested in that.
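
For the second strategy, a minimal Mata sketch might look like the following. The helper name vl_codes(), the toy data, and the label name gender_lbl are hypothetical; st_vlexists(), st_vlload(), and st_local() are the built-in Mata functions doing the work.

```stata
* Toy data mirroring the example in the question (hypothetical)
clear
input byte a1
1
2
1
end
label define gender_lbl 1 "male" 2 "female" 3 "other"
label values a1 gender_lbl

capture mata: mata drop vl_codes()   // allow the sketch to be re-run
mata:
// Hypothetical helper: turn a value label into "1=male; 2=female; ..."
string scalar vl_codes(string scalar lblname)
{
    real colvector   values
    string colvector text
    string scalar    out
    real scalar      i

    if (!st_vlexists(lblname)) return("")
    st_vlload(lblname, values, text)   // fills values and text
    out = ""
    for (i = 1; i <= rows(values); i++) {
        if (i > 1) out = out + "; "
        out = out + strofreal(values[i]) + "=" + text[i]
    }
    return(out)
}
end

* Find the value label attached to a1 and move the string back to Stata
local lbl : value label a1
mata: st_local("codes", vl_codes("`lbl'"))
display "`codes'"
* displays: 1=male; 2=female; 3=other
```

Here 3=other is included even though it never occurs in the data, but a value that occurs in the data without a label would not be listed.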

One thing you should think about is what to do with variables with many labeled values, e.g. a 4-digit classification of occupations or diseases, where you could potentially get up to \(10^4\) distinct values. You'd probably want to move the labels for that type of variable to an appendix, rather than put them in the main table.

Last edited by Maarten Buis; 28 Sep 2020, 04:41.

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
daniel klein

Join Date: Mar 2014

Posts: 3822
#5

28 Sep 2020, 04:56

elabel (SSC), with the list subcommand, gives you the value labels using the second approach that Maarten has mentioned.

I would like to add the basic idea of a third approach without spelling out the details. Note that uselabel will create datasets holding value label names, values, and labels. You could merge this information to datasets created by describe (or by another command) using the value label name as an identifier.
yaqoob lums

Join Date: Jul 2019

Posts: 12
#6

29 Sep 2020, 01:05

Maarten and Daniel,
Thank you for taking the time to respond to the query. All the suggestions (excluding the one using Mata, which I couldn't do given my limited Stata skills) worked for me.

For reference of people who might be interested:

Method 1: This gives a detailed codebook table for each variable in one large Excel file. However, as Maarten pointed out, only labels with a frequency of at least 1 are reported.

ds, has(vallabel) // here you can follow ds with a varlist
fre `r(varlist)' using myvalues.xls, replace combine // fre is from SSC; r(varlist) passes on the variables found by ds

Method 2: This gives you an Excel or .dta file with the value labels, value codes, and variable names for all or a subset of the variables, but not a codebook (frequencies, percentages, etc.).

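preserve

uselabel, clear // one row per value label name, value, and label text
save labels.dta, replace

restore

preserve

describe, replace clear // one row per variable
rename vallab lname // to ensure the name of the identifier variable is consistent in the two datasets
keep if lname != "" // keep variables that have value labels
joinby lname using labels.dta // unlike merge 1:m, this also works when several variables share one value label
save myfile.dta, replace // use export excel instead if you need an xlsx file

restore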
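
To get from the long file produced by Method 2 to the one-row-per-variable layout asked for in the first post, the rows can be collapsed into a single string per variable. The following is only a sketch under stated assumptions: the toy data, the second variable a2, and the label name gender_lbl are hypothetical, and joinby is used to attach the labels so that two variables sharing one value label are handled.

```stata
* Toy data: two variables sharing one value label (hypothetical)
clear
input byte(a1 a2)
1 1
2 2
end
label variable a1 "gender"
label variable a2 "gender of spouse"
label define gender_lbl 1 "male" 2 "female"
label values a1 a2 gender_lbl

* Dataset of value labels: lname, value, label
tempfile labels
preserve
uselabel, clear
save `labels'
restore

* Dataset of variables, then attach every value of each variable's label
describe, replace clear
rename vallab lname
keep if lname != ""
joinby lname using `labels'

* Build "1=male; 2=female" as a running string within each variable
generate codes = string(value) + "=" + label
bysort position (value): replace codes = codes[_n-1] + "; " + codes if _n > 1
bysort position (value): keep if _n == _N

keep position name varlab codes
list, noobs
```

With the toy data this should leave one row each for a1 and a2, with position serving as the serial number and codes holding the combined value codes. The describe step replaces the data in memory, so wrap it in preserve and restore when working with real data.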

Best,
Yaqoob Ali