  • Exporting variable names, labels, and value codes for labels

    Hi all,

    I am trying to prepare a codebook document that provides details on all variables (including their labels, value codes, etc.). A similar topic was discussed earlier, here. My query is slightly different, as I want the value codes exported as well, and I want to export the result to Word. I want the output to have the following columns:
    Serial No | Var name | Var label | Value codes for label
    1         | a1       | gender    | 1=male; 2=female

    I know `describe, replace` gives me variable names, variable labels, and the names of value labels, but could we possibly tweak it to get the value codes for the labels too (as in 1=male; 2=female in the example above)? Or is there any other command that can do this?

    Any help is appreciated. Thanks!
    PS: I have Stata MP 14.1

  • #2
    As you have noted, describe can do part of the job, and asdoc (from SSC) can export a neat table to MS Word. This post might be of help:
    https://www.statalist.org/forums/for...k-to-text-file
    Regards
    --------------------------------------------------
    Attaullah Shah, PhD.
    Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
    FinTechProfessor.com
    https://asdocx.com
    Check out my asdoc program, which sends outputs to MS Word.
    For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.


    • #3
      Thanks, Attaullah! asdoc is a really nice program. I have been using it for some time now.


      • #4
        Short answer: Not easily; the tricky part is the value labels.

        Longer answer: There are two strategies you could follow.
        1. You could extract the values of a variable that are present in the dataset using levelsof, loop over those values, and extract the corresponding value label using the extended macro function label (see: help extended_fcn).
        2. You could use the Mata function st_vlload() to create two matrices in Mata: one containing the values and the other the corresponding labels. You can loop over these to create a string like the one you want and move that to Stata.
        Option 1 has the advantage that you can keep it all in Stata and don't have to learn Mata, but you will miss values that have labels but do not occur in the data. This could happen when a question allows for a certain (rare) outcome, but that outcome happened not to occur in the sample.
        Option 2 solves that, but it shows only labeled values and not unlabeled ones.

        You can combine both strategies; look inside the code of htmlcb (from SSC) if you are interested in that.
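
        As a rough illustration of strategy 1 only, here is a minimal sketch. The variable a1, the value label name gender_lbl, and the toy data are made up for the example; with real data you would loop over your own variables. It builds a string of the form requested in the original question.

```stata
* Toy data (hypothetical): one labeled variable
clear
input byte a1
1
2
2
1
end
label define gender_lbl 1 "male" 2 "female"
label values a1 gender_lbl

* Distinct values that actually occur in the data
levelsof a1, local(levels)

* Build a "value=label" string, separated by semicolons
local codes ""
foreach v of local levels {
    * extended macro function: look up the label attached to value `v'
    local lab : label (a1) `v'
    if "`codes'" == "" {
        local codes "`v'=`lab'"
    }
    else {
        local codes "`codes'; `v'=`lab'"
    }
}
display "`codes'"
```

        As noted above, a labeled value that never occurs in the data would not show up in this string, and an unlabeled value would simply appear as the number paired with itself.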

        One thing you should think about is what to do with variables with many labeled values, e.g. a 4-digit classification of occupations or diseases, where you could potentially get up to \(10^4\) distinct values. You'd probably want to move the labels for that type of variable to an appendix, rather than put them in the main table.
        Last edited by Maarten Buis; 28 Sep 2020, 04:41.
        ---------------------------------
        Maarten L. Buis
        University of Konstanz
        Department of history and sociology
        box 40
        78457 Konstanz
        Germany
        http://www.maartenbuis.nl
        ---------------------------------


        • #5
          elabel (SSC), with the list subcommand, gives you the value labels using the second approach that Maarten has mentioned.

          I would like to add the basic idea of a third approach without spelling out the details. Note that uselabel will create datasets holding value label names, values, and labels. You could merge this information to datasets created by describe (or by another command) using the value label name as an identifier.


          • #6
            Maarten and Daniel,
            Thank you for taking the time to respond to the query. All the suggestions (except the one using Mata, which I couldn't do given my limited Stata skills) worked for me.

            For the reference of anyone who might be interested:

            Method 1: This gives a detailed codebook table for each variable in one large Excel file. However, as pointed out by Maarten, only labels for values that occur at least once in the data are reported.

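            * fre is a user-written command: ssc install fre
            ds, has(vallabel) // you can put a varlist right after ds to restrict the selection
            fre `r(varlist)' using myvalues.xls, replace combine // tabulate only the variables selected by ds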


            Method 2: This gives you an Excel or dta file with value labels, value codes, and variable names for all or a subset of the variables, but no codebook statistics (frequencies, percentages, etc.).

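            preserve
            uselabel, clear // dataset of value label names (lname), values, and labels
            save labels.dta, replace
            restore

            preserve
            describe, replace clear // dataset of variable names, variable labels, and value label names
            rename vallab lname // make the identifier name consistent across the two datasets
            keep if lname != "" // keep variables that have value labels
            joinby lname using labels.dta // unlike merge 1:m, this also works when several variables share one value label
            save myfile.dta, replace // use export excel instead if you need an xlsx file
            restore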
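
            The file from Method 2 has one row per variable-value pair. If one row per variable is wanted, as in the layout from the original question, the rows could be collapsed along the following lines. This is only a sketch: the toy data below are invented to stand in for the merged file, and the new variable codes is a name chosen for the example.

```stata
* Toy stand-in (hypothetical) for the merged file: one row per variable-value pair
clear
input str8 name str12 varlab int value str10 label
"a1" "gender" 1 "male"
"a1" "gender" 2 "female"
"a2" "employed" 0 "no"
"a2" "employed" 1 "yes"
end

sort name value
* Start the string on the first row of each variable
by name: generate codes = string(value) + "=" + label if _n == 1
* Append each further "value=label" pair to the string built so far
by name: replace codes = codes[_n-1] + "; " + string(value) + "=" + label if _n > 1
* The last row of each variable now holds the complete string
by name: keep if _n == _N
keep name varlab codes
list, noobs
```

            The resulting dataset can then be saved or exported in the same way as the file from Method 2.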

            Best,
            Yaqoob Ali
