Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Automating value labels from macros

    Hi there,

    My primary problem is with quoting and extracting a value label definition from a string macro. I've spent more than a week trying to figure this out, so I finally registered here to see what you all think. Here is a thorough and hopefully concise explanation of what I am trying to do:

    I have a dataset with over 1300 variables. My team and I have used excel to create a codebook where each row has all the information about the variable, including the 'old' var name (how the variable name imports in a CSV directly from Qualtrics), 'new' descriptive var name, variable label, and value labels. I realize it would be simple to create a few loops to label the variables and values by importing this codebook directly, but I want to document the entire data cleaning process in a do-file, and any changes we make to future versions, rather than risk having untraceable edits made to the excel codebook. I also want to organize the do-file similarly to the excel codebook, where each variable and all of it's characteristics can be found together.

    The most clever way I have thought of so far has been to make a local macro of each row and copy over all the excel columns into a do-file, then loop over the macros with the labeling commands by parsing the words of the macros. Each word corresponds to the old variable name, new variable name, var label, and, where applicable, the value label name and definition. Here's a sample of the rows I've copied over from the excel codebook into my do-file:
    local m34 q226 id_emp_num "File number/employee number/bidding number"
    local m35 q110 id_consent "Consent to participate" consent "1 agree 2 disagree"
    local m36 q21 nq_2 "(Non-question) Can return to survey"
    local m37 q22 nq_3 "(Non-question) Intro to section 1"
    local m38 q23 w_last30 "Worked one flight in last 30 days" yesno "1 yes 2 no"
    local m39 q24 w_still_fa "Are you still a flight attendant?" yesno
    local m40 q25 w_reasnotworked30 "Reason not worked as FA last 30 days (among current FA)" reasnotworked "8 part_time 4 involuntary_furlough 20 voluntary_furlogh 21 sick_leave 16 workers_comp_disability 5 vacation/personal_leave 1 maternity/paternity_leave 7 other"
    local m41 q25_text w_reasnotworked30oth "Other reason not worked as FA text (among current FA)"
    local m42 q26 wf_reasnotworked30 "Reason not worked as FA last 30 days (among former FA)" formerreasnotwk "1 otherjob 2 retired 3 unemployed-looking 4 unemployed-unable_to_work-health 5 unemployed-taking_care_of_family 6 school 7 other"
    local m43 q26_text wf_reasnotworked30oth "Other reason not worked as FA text (among former FA)"


    Notice that the last quote of the macros with label definitions (locals m38, m40, m42) is using the label definition without quotes, which means that I had to remove space, and I won't be able to label numbers, like 1 "1956" 2 "1957", etc. The following loop I created *does* work for the way I set these macros up, but it won't work for labeling numbers and I'm not convinced this is is the most parsimonious code or approach to my goals.
    forvalues i=34/43 {
    *rawvar
    local rawvar: word 1 of `m`i''
    dis "`rawvar'"
    *newvar
    local newvar: word 2 of `m`i''
    dis "`newvar'"
    *var label
    local varlabel: word 3 of `m`i''
    dis "`varlabel'"
    *value label definition
    local labdef: word 4 of `m`i''
    dis "`labdef'"
    *value labels
    local vallab: word 5 of `m`i''
    dis "`vallab'"


    *rename variables
    rename `rawvar' `newvar'
    *label variables
    label var `newvar' "`varlabel'"
    codebook `newvar'
    *rule out vars that dont have labels
    local n: word count `m`i''
    dis "`n'"
    *define and apply value labels for the vars that need them
    if `n'>3 {
    destring `newvar', replace
    }
    if `n'>4 {label define `labdef' `vallab'
    }
    if `n'>3 {label val `newvar' `labdef'
    }
    }


    Is there a way to enclose the label definitions within the macros while using quotes in the label definitions? Or some other way to achieve my goal of keeping all the variable naming/labeling organized in the do-file as succinctly as possible given that there are over a thousand variables? (~1300)

    Thanks for any guidance!

  • #2
    It's usually wise to test the workflow on a small subset before putting in hours/days of manual work (building up the Excel "codebook"). You don't show how you go from Excel to your list of macros. You seem to imply that each row describes a single variable but it's not clear how many columns you have in Excel. Why do you combine all the information in a single macro that you then have to parse out the pieces?

    Going with what you show, your biggest problem is that if you defined labels in your Excel "codebook" without using double-quotes, your label define statement will fail as certain characters will be interpreted as delimiters. For example,

    Code:
    . label define reasnotworked 1 maternity/paternity_leave 7 other
    invalid syntax
    r(198);
    Assuming that the Excel "codebook" is done and value labels defined as you show, the following will parse out your labels.

    Code:
    * make dummy dataset to test code
    clear
    set obs 1
    foreach q in q226 q110 q21 q22 q23 q24 q25 q25_text q26 q26_text {
        gen `q' = "1"
    }
    
    * copied as is from the original post
    local m34 q226 id_emp_num "File number/employee number/bidding number" 
    local m35 q110 id_consent "Consent to participate" consent "1 agree 2 disagree"
    local m36 q21 nq_2 "(Non-question) Can return to survey" 
    local m37 q22 nq_3 "(Non-question) Intro to section 1" 
    local m38 q23 w_last30 "Worked one flight in last 30 days" yesno "1 yes 2 no"
    local m39 q24 w_still_fa "Are you still a flight attendant?" yesno 
    local m40 q25 w_reasnotworked30 "Reason not worked as FA last 30 days (among current FA)" reasnotworked "8 part_time 4 involuntary_furlough 20 voluntary_furlogh 21 sick_leave 16 workers_comp_disability 5 vacation/personal_leave 1 maternity/paternity_leave 7 other"
    local m41 q25_text w_reasnotworked30oth "Other reason not worked as FA text (among current FA)" 
    local m42 q26 wf_reasnotworked30 "Reason not worked as FA last 30 days (among former FA)" formerreasnotwk "1 otherjob 2 retired 3 unemployed-looking 4 unemployed-unable_to_work-health 5 unemployed-taking_care_of_family 6 school 7 other"
    local m43 q26_text wf_reasnotworked30oth "Other reason not worked as FA text (among former FA)" 
    
    forvalues i=34/43 {
    
        tokenize `"`m`i''"'
        
        rename `1' `2' 
        
        label var `2' "`3'"
        
        if "`4'" != "" {
            destring `2', replace
            
            if "`5'" != "" {
                gettoken ival next : 5
                while "`ival'" != "" {
                    gettoken itext next : next
                    label def `4' `ival' "`itext'", add
                    gettoken ival next : next        
                }
            }
            
            label values `2' `4'
        }
    }

    Comment


    • #3
      Sami,

      I don't know how practical this is since I don't know what your codebook file looks like, but I think the key here is to use compound double quotes to allow your label definitions to contain quotation marks. Here are some examples:

      Code:
      local m35 q110 id_consent "Consent to participate" consent `"1 "agree" 2 "disagree""'
      local m35 q110 id_consent "Consent to participate" consent `"1 "1111" 2 "2222""'
      local m35 q110 id_consent "Consent to participate" consent `"1 "1-agree" 2 "2-disagree""'
      local m35 q110 id_consent "Consent to participate" consent `"1 "yes/agree" 2 "disagree""'
      Everything will run the same except for the line where you display the value label, which should be:

      Code:
      dis `"`vallab'"'
      Also, I needed to change the last few statements as follows to get this to run correctly:

      Code:
      if `n'>4 {
      label define `labdef' `vallab'
      }
      if `n'>3 {
      label val `newvar' `labdef'
      }
      i.e., add a newline after each curly bracket.

      Regards,
      Joe

      Comment


      • #4
        Hi Robert,

        Thank you so much for your thoughtful reply! To answer your questions: there are 5 columns in the excel that need parsing (oldvar, newvar, var label, value label name, value label definition), and I added two columns to the left with "local" and the macro name, so I could easily copy a block of rows from excel to my do-file. Does a better way to do this come to mind? (excluding importing the excel itself!)

        You are absolutely right that it is necessary to have double-quotes in my value labels- I just could not figure out how I would parse them in a loop if I didn't enclose the entire label definition in one set of double quotes. Your code worked beautifully for the way I had the macros set up, thank you! Now I am trying to figure out how to tweak it to work with the more logical double-quote way, as in an example row below. It's past the limit of my syntax experience, unfortunately. Thank you for all your help!

        local m35 q110 id_consent "Consent to participate" consent 1 "agree" 2 "disagree"

        Comment


        • #5
          Hi Joe,

          Your reply must have come in while I was responding to Robert. Thank you as well, your code worked very well too! I could not for the life of me figure out how to enclose those statements. Now it will be a very quick fix to add in columns with the compound double quotes to bracket the label column, and re-copy/paste the entire block of text into my do-file. Thank you so much!
          -Sami

          Comment


          • #6
            As Joe has pointed out, the solution is to use compound double-quotes. I assumed this would be tedious to redo in Excel. Here's a reworked version that uses them

            Code:
            * make dummy dataset to test code
            clear
            set obs 1
            foreach q in q226 q110 q21 q22 q23 q24 q25 q25_text q26 q26_text {
                gen `q' = "1"
            }
            
            * copied as is from the original post
            local m34 q226 id_emp_num "File number/employee number/bidding number" 
            local m35 q110 id_consent "Consent to participate" consent "1 agree 2 disagree"
            local m36 q21 nq_2 "(Non-question) Can return to survey" 
            local m37 q22 nq_3 "(Non-question) Intro to section 1" 
            local m38 q23 w_last30 "Worked one flight in last 30 days" yesno "1 yes 2 no"
            local m39 q24 w_still_fa "Are you still a flight attendant?" yesno 
            local m40 q25 w_reasnotworked30 "Reason not worked as FA last 30 days (among current FA)" reasnotworked `"8 part_time 4 involuntary_furlough 20 voluntary_furlogh 21 sick_leave 16 workers_comp_disability 5 "vacation/personal_leave" 1 "maternity/paternity_leave" 7 other"'
            local m41 q25_text w_reasnotworked30oth "Other reason not worked as FA text (among current FA)" 
            local m42 q26 wf_reasnotworked30 "Reason not worked as FA last 30 days (among former FA)" formerreasnotwk `"1 otherjob 2 retired 3 "unemployed-looking" 4 "unemployed-unable_to_work-health" 5 "unemployed-taking_care_of_family" 6 school 7 other"'
            local m43 q26_text wf_reasnotworked30oth "Other reason not worked as FA text (among former FA)" 
            
            forvalues i=34/43 {
            
                tokenize `"`m`i''"'
                
                rename `1' `2' 
                
                label var `2' "`3'"
                
                if "`4'" != "" {
                    destring `2', replace
                    
                    if `"`5'"' != "" {
                        
                            label def `4' `ival' `5'
                    }
                    
                    label values `2' `4'
                }
            }

            Comment

            Working...
            X