How to import variable labels from xlsx file to an existing set of variables in a Stata dataset?

Lavendisky Schubert

Join Date: Dec 2024

Posts: 1
#1

How to import variable labels from xlsx file to an existing set of variables in a Stata dataset?

04 Dec 2024, 23:27

I have a dataset that contains j1 to j21 variables. These variables do not have their labels. There are other variables that have proper labels. Now, I also have an excel sheet (variable_label.xlsx), which contains column 1 (variable names: j1 to j21) and column 2 (variable labels of j1 to j21). Now I want to import these labels to the existing variables in my dataset. Instead of manually labelling for each and every variable, what could be the most efficient way to do this task?

This is my first post in this forum. I hope this is just the first of many posts to come in the future.
Tags: None

Joseph Coveney

Join Date: Apr 2014
Posts: 4410

05 Dec 2024, 01:15

Someone might have written something to do this sort of task and put it up on SSC and you might want to check there.

I used to do this looping over the variable names through parallel lists of local macros, which is probably less daunting to stare at if you've never used Mata, but at the risk of overkill I think that a dictionary is maybe a little better fit to the problem.

Something like the following perhaps. (Begin at the "Begin here" comment; the stuff above is just to create a dataset and Excel workbook analogous to what you describe.)

Code:

version 18.0

clear *

// Create variable labels
input str20(variable_name variable_label)
"j1" "Up"
"j2" "Down"
"j3" "Over"
"j4" "Under"
end

quietly export excel variable_label.xlsx, ///
    sheet(VariableLabels) firstrow(variables)

// Create recipient dataset
drop _all
quietly set obs 2
forvalues i = 1/4 {
    generate byte j`i' = 0
}
quietly save dataset

*
* Begin here
*

// Create a dictionary of the variable names and labels from the Excel workbook

quietly import excel variable_label.xlsx, ///
    sheet(VariableLabels) firstrow clear

assert !missing(variable_name, variable_label)

/* (Persistent dictionary is made just below) */
mata:

class LabelList {
    private:
        class AssociativeArray scalar Labels
        void new()
    public:
        void labelem()
}
void function LabelList::new() {
    st_sview(Data=(.), ., "variable_name variable_label")
    for (i=1; i<=rows(Data); i++) Labels.put(Data[i, 1], Data[i, 2])
}
void function LabelList::labelem(string scalar varname) {
    if (Labels.exists(varname)) st_varlabel(varname, Labels.get(varname))
}

L = LabelList()

end

// Now with the dataset back in memory look up the variables in the dictionary
// and label them

use dataset, clear

quietly ds j*
foreach var of varlist `r(varlist)' {
    mata: L.labelem("`var'")
}

// Done
describe

erase variable_label.xlsx
erase dataset.dta

exit

Do-file and log file attached if you're interested.

Attached Files

Comment

daniel klein

Join Date: Mar 2014
Posts: 3848

05 Dec 2024, 01:27

The most efficient way might well be using Excel to create a third column as something like

Code:

=CONCATENATE("label variable ";A1;CHAR(32);CHAR(96);CHAR(34);B1;CHAR(34);CHAR(39))

in the first line, then drag that down. Copy and paste the resulting third column into a do-file and run it on your dataset, and you're done.

A more sophisticated variation might create the do-file in an automated fasion as something like

Code:

// import the variable-to-label mappings as a dataset
import excel variable_label.xlsx

// write the do-file
tempname fh
file open `fh' using "variable_label.do" , write
forvalues i = 1/`=c(N)' {
    file write `fh' "label variable "
    file write `fh' (A[`i'])
    file write `fh' (char(32)+char(96)+char(34))
    file write `fh' (B[`i'])
    file write `fh' (char(34)+char(39))
    file write `fh' _newline
}
file close `fh'

Comment

Nick Cox

Join Date: Mar 2014

Posts: 35696
#4

05 Dec 2024, 01:31

I would read in from MS Excel into two Stata datasets and merge on observation number. Then assuming that the new variable labels are in a variable varlabel

Code:

forval j = 1/21 { label var j`j' "`=varlabel[`j']'" }
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35696

05 Dec 2024, 05:00

Here's my approach applied to Joseph Coveney's helpful sandbox.

Code:

clear *

// Create variable labels
input str20(variable_name variable_label)
"j1" "Up"
"j2" "Down"
"j3" "Over"
"j4" "Under"
end

quietly export excel variable_label.xlsx, ///
    sheet(VariableLabels) firstrow(variables)

// Create recipient dataset
drop _all
quietly set obs 2
forvalues i = 1/4 {
    generate byte j`i' = 0
}
quietly save dataset

import excel using variable_label.xlsx, clear firstrow

merge 1:1 _n using dataset 

forval j = 1/4 { 
    label var j`j' "`=variable_label[`j']'"
}

describe

list 

drop variable_* _merge 

list

Code:

. clear *

. 
. // Create variable labels
. input str20(variable_name variable_label)

            variable_name        variable_label
  1. "j1" "Up"
  2. "j2" "Down"
  3. "j3" "Over"
  4. "j4" "Under"
  5. end

. 
. quietly export excel variable_label.xlsx, ///
>     sheet(VariableLabels) firstrow(variables)

. 
. // Create recipient dataset
. drop _all

. quietly set obs 2

. forvalues i = 1/4 {
  2.     generate byte j`i' = 0
  3. }

. quietly save dataset

. 
. import excel using variable_label.xlsx, clear firstrow
(2 vars, 4 obs)

. 
. merge 1:1 _n using dataset 

    Result                      Number of obs
    -----------------------------------------
    Not matched                             2
        from master                         2  (_merge==1)
        from using                          0  (_merge==2)

    Matched                                 2  (_merge==3)
    -----------------------------------------

. 
. forval j = 1/4 { 
  2.         label var j`j' "`=variable_label[`j']'"
  3. }

. 
. describe

Contains data
 Observations:             4                  
    Variables:             7                  
-------------------------------------------------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------------------------------------------------
variable_name   str2    %9s                   variable_name
variable_label  str5    %9s                   variable_label
j1              byte    %8.0g                 Up
j2              byte    %8.0g                 Down
j3              byte    %8.0g                 Over
j4              byte    %8.0g                 Under
_merge          byte    %23.0g     _merge     Matching result from merge
-------------------------------------------------------------------------------------------------------------------------
Sorted by: 
     Note: Dataset has changed since last saved.

. 
. list 

     +-----------------------------------------------------------+
     | variab~e   variab~l   j1   j2   j3   j4            _merge |
     |-----------------------------------------------------------|
  1. |       j1         Up    0    0    0    0       Matched (3) |
  2. |       j2       Down    0    0    0    0       Matched (3) |
  3. |       j3       Over    .    .    .    .   Master only (1) |
  4. |       j4      Under    .    .    .    .   Master only (1) |
     +-----------------------------------------------------------+

. 
. drop variable_* _merge 

. 
. list 

     +-------------------+
     | j1   j2   j3   j4 |
     |-------------------|
  1. |  0    0    0    0 |
  2. |  0    0    0    0 |
  3. |  .    .    .    . |
  4. |  .    .    .    . |
     +-------------------+

In this case you get extra observations with missing values at the end of the dataset that should be dropped. In your case that is less likely, or so I guess. If it is true, then as said just drop what you don't need.

Comment

Bjarte Aagnes

Join Date: Apr 2014
Posts: 783

09 Dec 2024, 09:57

Code:

local xlsx variable_label.xlsx

tempname labels
frame create `labels'

frame `labels' {  
    
    import excel using "`xlsx'", firstrow  
    
    foreach row of numlist 1/`=_N' {
    
        local varname  = variable_name[`row']
        local varlabel = variable_label[`row']
        
        frame default : label variable `varname' "`varlabel'"
    }
}

Announcement