Hi there,
My primary problem is with quoting and extracting a value label definition from a string macro. I've spent more than a week trying to figure this out, so I finally registered here to see what you all think. Here is a thorough and hopefully concise explanation of what I am trying to do:
I have a dataset with over 1300 variables. My team and I have used excel to create a codebook where each row has all the information about the variable, including the 'old' var name (how the variable name imports in a CSV directly from Qualtrics), 'new' descriptive var name, variable label, and value labels. I realize it would be simple to create a few loops to label the variables and values by importing this codebook directly, but I want to document the entire data cleaning process in a do-file, and any changes we make to future versions, rather than risk having untraceable edits made to the excel codebook. I also want to organize the do-file similarly to the excel codebook, where each variable and all of it's characteristics can be found together.
The most clever way I have thought of so far has been to make a local macro of each row and copy over all the excel columns into a do-file, then loop over the macros with the labeling commands by parsing the words of the macros. Each word corresponds to the old variable name, new variable name, var label, and, where applicable, the value label name and definition. Here's a sample of the rows I've copied over from the excel codebook into my do-file:
local m34 q226 id_emp_num "File number/employee number/bidding number"
local m35 q110 id_consent "Consent to participate" consent "1 agree 2 disagree"
local m36 q21 nq_2 "(Non-question) Can return to survey"
local m37 q22 nq_3 "(Non-question) Intro to section 1"
local m38 q23 w_last30 "Worked one flight in last 30 days" yesno "1 yes 2 no"
local m39 q24 w_still_fa "Are you still a flight attendant?" yesno
local m40 q25 w_reasnotworked30 "Reason not worked as FA last 30 days (among current FA)" reasnotworked "8 part_time 4 involuntary_furlough 20 voluntary_furlogh 21 sick_leave 16 workers_comp_disability 5 vacation/personal_leave 1 maternity/paternity_leave 7 other"
local m41 q25_text w_reasnotworked30oth "Other reason not worked as FA text (among current FA)"
local m42 q26 wf_reasnotworked30 "Reason not worked as FA last 30 days (among former FA)" formerreasnotwk "1 otherjob 2 retired 3 unemployed-looking 4 unemployed-unable_to_work-health 5 unemployed-taking_care_of_family 6 school 7 other"
local m43 q26_text wf_reasnotworked30oth "Other reason not worked as FA text (among former FA)"
Notice that the last quote of the macros with label definitions (locals m38, m40, m42) is using the label definition without quotes, which means that I had to remove space, and I won't be able to label numbers, like 1 "1956" 2 "1957", etc. The following loop I created *does* work for the way I set these macros up, but it won't work for labeling numbers and I'm not convinced this is is the most parsimonious code or approach to my goals.
forvalues i=34/43 {
*rawvar
local rawvar: word 1 of `m`i''
dis "`rawvar'"
*newvar
local newvar: word 2 of `m`i''
dis "`newvar'"
*var label
local varlabel: word 3 of `m`i''
dis "`varlabel'"
*value label definition
local labdef: word 4 of `m`i''
dis "`labdef'"
*value labels
local vallab: word 5 of `m`i''
dis "`vallab'"
*rename variables
rename `rawvar' `newvar'
*label variables
label var `newvar' "`varlabel'"
codebook `newvar'
*rule out vars that dont have labels
local n: word count `m`i''
dis "`n'"
*define and apply value labels for the vars that need them
if `n'>3 {
destring `newvar', replace
}
if `n'>4 {label define `labdef' `vallab'
}
if `n'>3 {label val `newvar' `labdef'
}
}
Is there a way to enclose the label definitions within the macros while using quotes in the label definitions? Or some other way to achieve my goal of keeping all the variable naming/labeling organized in the do-file as succinctly as possible given that there are over a thousand variables? (~1300)
Thanks for any guidance!
My primary problem is with quoting and extracting a value label definition from a string macro. I've spent more than a week trying to figure this out, so I finally registered here to see what you all think. Here is a thorough and hopefully concise explanation of what I am trying to do:
I have a dataset with over 1300 variables. My team and I have used excel to create a codebook where each row has all the information about the variable, including the 'old' var name (how the variable name imports in a CSV directly from Qualtrics), 'new' descriptive var name, variable label, and value labels. I realize it would be simple to create a few loops to label the variables and values by importing this codebook directly, but I want to document the entire data cleaning process in a do-file, and any changes we make to future versions, rather than risk having untraceable edits made to the excel codebook. I also want to organize the do-file similarly to the excel codebook, where each variable and all of it's characteristics can be found together.
The most clever way I have thought of so far has been to make a local macro of each row and copy over all the excel columns into a do-file, then loop over the macros with the labeling commands by parsing the words of the macros. Each word corresponds to the old variable name, new variable name, var label, and, where applicable, the value label name and definition. Here's a sample of the rows I've copied over from the excel codebook into my do-file:
local m34 q226 id_emp_num "File number/employee number/bidding number"
local m35 q110 id_consent "Consent to participate" consent "1 agree 2 disagree"
local m36 q21 nq_2 "(Non-question) Can return to survey"
local m37 q22 nq_3 "(Non-question) Intro to section 1"
local m38 q23 w_last30 "Worked one flight in last 30 days" yesno "1 yes 2 no"
local m39 q24 w_still_fa "Are you still a flight attendant?" yesno
local m40 q25 w_reasnotworked30 "Reason not worked as FA last 30 days (among current FA)" reasnotworked "8 part_time 4 involuntary_furlough 20 voluntary_furlogh 21 sick_leave 16 workers_comp_disability 5 vacation/personal_leave 1 maternity/paternity_leave 7 other"
local m41 q25_text w_reasnotworked30oth "Other reason not worked as FA text (among current FA)"
local m42 q26 wf_reasnotworked30 "Reason not worked as FA last 30 days (among former FA)" formerreasnotwk "1 otherjob 2 retired 3 unemployed-looking 4 unemployed-unable_to_work-health 5 unemployed-taking_care_of_family 6 school 7 other"
local m43 q26_text wf_reasnotworked30oth "Other reason not worked as FA text (among former FA)"
Notice that the last quote of the macros with label definitions (locals m38, m40, m42) is using the label definition without quotes, which means that I had to remove space, and I won't be able to label numbers, like 1 "1956" 2 "1957", etc. The following loop I created *does* work for the way I set these macros up, but it won't work for labeling numbers and I'm not convinced this is is the most parsimonious code or approach to my goals.
forvalues i=34/43 {
*rawvar
local rawvar: word 1 of `m`i''
dis "`rawvar'"
*newvar
local newvar: word 2 of `m`i''
dis "`newvar'"
*var label
local varlabel: word 3 of `m`i''
dis "`varlabel'"
*value label definition
local labdef: word 4 of `m`i''
dis "`labdef'"
*value labels
local vallab: word 5 of `m`i''
dis "`vallab'"
*rename variables
rename `rawvar' `newvar'
*label variables
label var `newvar' "`varlabel'"
codebook `newvar'
*rule out vars that dont have labels
local n: word count `m`i''
dis "`n'"
*define and apply value labels for the vars that need them
if `n'>3 {
destring `newvar', replace
}
if `n'>4 {label define `labdef' `vallab'
}
if `n'>3 {label val `newvar' `labdef'
}
}
Is there a way to enclose the label definitions within the macros while using quotes in the label definitions? Or some other way to achieve my goal of keeping all the variable naming/labeling organized in the do-file as succinctly as possible given that there are over a thousand variables? (~1300)
Thanks for any guidance!
Comment