Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to import variable labels from xlsx file to an existing set of variables in a Stata dataset?

    I have a dataset that contains j1 to j21 variables. These variables do not have their labels. There are other variables that have proper labels. Now, I also have an excel sheet (variable_label.xlsx), which contains column 1 (variable names: j1 to j21) and column 2 (variable labels of j1 to j21). Now I want to import these labels to the existing variables in my dataset. Instead of manually labelling for each and every variable, what could be the most efficient way to do this task?

    This is my first post in this forum. I hope this is just the first of many posts to come in the future.

  • #2
    Someone might have written something to do this sort of task and put it up on SSC and you might want to check there.

    I used to do this looping over the variable names through parallel lists of local macros, which is probably less daunting to stare at if you've never used Mata, but at the risk of overkill I think that a dictionary is maybe a little better fit to the problem.

    Something like the following perhaps. (Begin at the "Begin here" comment; the stuff above is just to create a dataset and Excel workbook analogous to what you describe.)
    Code:
    version 18.0
    
    clear *
    
    // Create variable labels
    input str20(variable_name variable_label)
    "j1" "Up"
    "j2" "Down"
    "j3" "Over"
    "j4" "Under"
    end
    
    quietly export excel variable_label.xlsx, ///
        sheet(VariableLabels) firstrow(variables)
    
    // Create recipient dataset
    drop _all
    quietly set obs 2
    forvalues i = 1/4 {
        generate byte j`i' = 0
    }
    quietly save dataset
    
    *
    * Begin here
    *
    
    // Create a dictionary of the variable names and labels from the Excel workbook
    
    quietly import excel variable_label.xlsx, ///
        sheet(VariableLabels) firstrow clear
    
    assert !missing(variable_name, variable_label)
    
    /* (Persistent dictionary is made just below) */
    mata:
    
    class LabelList {
        private:
            class AssociativeArray scalar Labels
            void new()
        public:
            void labelem()
    }
    void function LabelList::new() {
        st_sview(Data=(.), ., "variable_name variable_label")
        for (i=1; i<=rows(Data); i++) Labels.put(Data[i, 1], Data[i, 2])
    }
    void function LabelList::labelem(string scalar varname) {
        if (Labels.exists(varname)) st_varlabel(varname, Labels.get(varname))
    }
    
    L = LabelList()
    
    end
    
    // Now with the dataset back in memory look up the variables in the dictionary
    // and label them
    
    use dataset, clear
    
    quietly ds j*
    foreach var of varlist `r(varlist)' {
        mata: L.labelem("`var'")
    }
    
    // Done
    describe
    
    erase variable_label.xlsx
    erase dataset.dta
    
    exit
    Do-file and log file attached if you're interested.
    Attached Files

    Comment


    • #3
      The most efficient way might well be using Excel to create a third column as something like
      Code:
      =CONCATENATE("label variable ";A1;CHAR(32);CHAR(96);CHAR(34);B1;CHAR(34);CHAR(39))
      in the first line, then drag that down. Copy and paste the resulting third column into a do-file and run it on your dataset, and you're done.

      A more sophisticated variation might create the do-file in an automated fasion as something like
      Code:
      // import the variable-to-label mappings as a dataset
      import excel variable_label.xlsx
      
      // write the do-file
      tempname fh
      file open `fh' using "variable_label.do" , write
      forvalues i = 1/`=c(N)' {
          file write `fh' "label variable "
          file write `fh' (A[`i'])
          file write `fh' (char(32)+char(96)+char(34))
          file write `fh' (B[`i'])
          file write `fh' (char(34)+char(39))
          file write `fh' _newline
      }
      file close `fh'

      Comment


      • #4
        I would read in from MS Excel into two Stata datasets and merge on observation number. Then assuming that the new variable labels are in a variable varlabel

        Code:
        forval j = 1/21 { 
               label var j`j'  "`=varlabel[`j']'" 
        }

        Comment


        • #5
          Here's my approach applied to Joseph Coveney's helpful sandbox.

          Code:
          clear *
          
          // Create variable labels
          input str20(variable_name variable_label)
          "j1" "Up"
          "j2" "Down"
          "j3" "Over"
          "j4" "Under"
          end
          
          quietly export excel variable_label.xlsx, ///
              sheet(VariableLabels) firstrow(variables)
          
          // Create recipient dataset
          drop _all
          quietly set obs 2
          forvalues i = 1/4 {
              generate byte j`i' = 0
          }
          quietly save dataset
          
          import excel using variable_label.xlsx, clear firstrow
          
          merge 1:1 _n using dataset 
          
          forval j = 1/4 { 
              label var j`j' "`=variable_label[`j']'"
          }
          
          describe
          
          list 
          
          drop variable_* _merge 
          
          list
          Code:
          . clear *
          
          . 
          . // Create variable labels
          . input str20(variable_name variable_label)
          
                      variable_name        variable_label
            1. "j1" "Up"
            2. "j2" "Down"
            3. "j3" "Over"
            4. "j4" "Under"
            5. end
          
          . 
          . quietly export excel variable_label.xlsx, ///
          >     sheet(VariableLabels) firstrow(variables)
          
          . 
          . // Create recipient dataset
          . drop _all
          
          . quietly set obs 2
          
          . forvalues i = 1/4 {
            2.     generate byte j`i' = 0
            3. }
          
          . quietly save dataset
          
          . 
          . import excel using variable_label.xlsx, clear firstrow
          (2 vars, 4 obs)
          
          . 
          . merge 1:1 _n using dataset 
          
              Result                      Number of obs
              -----------------------------------------
              Not matched                             2
                  from master                         2  (_merge==1)
                  from using                          0  (_merge==2)
          
              Matched                                 2  (_merge==3)
              -----------------------------------------
          
          . 
          . forval j = 1/4 { 
            2.         label var j`j' "`=variable_label[`j']'"
            3. }
          
          . 
          . describe
          
          Contains data
           Observations:             4                  
              Variables:             7                  
          -------------------------------------------------------------------------------------------------------------------------
          Variable      Storage   Display    Value
              name         type    format    label      Variable label
          -------------------------------------------------------------------------------------------------------------------------
          variable_name   str2    %9s                   variable_name
          variable_label  str5    %9s                   variable_label
          j1              byte    %8.0g                 Up
          j2              byte    %8.0g                 Down
          j3              byte    %8.0g                 Over
          j4              byte    %8.0g                 Under
          _merge          byte    %23.0g     _merge     Matching result from merge
          -------------------------------------------------------------------------------------------------------------------------
          Sorted by: 
               Note: Dataset has changed since last saved.
          
          . 
          . list 
          
               +-----------------------------------------------------------+
               | variab~e   variab~l   j1   j2   j3   j4            _merge |
               |-----------------------------------------------------------|
            1. |       j1         Up    0    0    0    0       Matched (3) |
            2. |       j2       Down    0    0    0    0       Matched (3) |
            3. |       j3       Over    .    .    .    .   Master only (1) |
            4. |       j4      Under    .    .    .    .   Master only (1) |
               +-----------------------------------------------------------+
          
          . 
          . drop variable_* _merge 
          
          . 
          . list 
          
               +-------------------+
               | j1   j2   j3   j4 |
               |-------------------|
            1. |  0    0    0    0 |
            2. |  0    0    0    0 |
            3. |  .    .    .    . |
            4. |  .    .    .    . |
               +-------------------+
          In this case you get extra observations with missing values at the end of the dataset that should be dropped. In your case that is less likely, or so I guess. If it is true, then as said just drop what you don't need.

          Comment


          • #6
            Code:
            local xlsx variable_label.xlsx
            
            tempname labels
            frame create `labels'
            
            frame `labels' {  
                
                import excel using "`xlsx'", firstrow  
                
                foreach row of numlist 1/`=_N' {
                
                    local varname  = variable_name[`row']
                    local varlabel = variable_label[`row']
                    
                    frame default : label variable `varname' "`varlabel'"
                }
            }

            Comment

            Working...
            X