import excel with two-row headers

paulvonhippel

Join Date: Apr 2014

Posts: 502
#1

import excel with two-row headers

13 Jun 2022, 15:34

Stata can import excel files, but it assumes that the variable names in the column headers are unique and appear in a single row. The attached file is an exception. It gives the number of children in a sample of about 50 schools who are malnourished, underweight, normal, overweight, or suffer from obesity or severe obesity. These column headers are in row 2 and they repeat: the headers for columns B-G are the same as for columns H-M. The difference is that columns B-G refer to boys and columns H-M refer to girls -- but this is only clear from the merged cells in row 1.

When column labels are split across two rows this way, is there any convenient way to import them as variable names? It would be great if they came through as boys_malnourished, girls_malnourished, etc.

I can edit the column labels to be unique, but that's not convenient since in my real data there are more files and more columns....

Thanks for any ideas!

Best wishes,
Paul
Attached Files

Example.xlsx (193.6 KB, 1 view)

Last edited by paulvonhippel; 13 Jun 2022, 15:57.
Tags: None
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#2

13 Jun 2022, 15:48

I don't see anything attached.

Is this file for public consumption? If so, please give the link for where we may find it
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35698
#3

13 Jun 2022, 15:54

The main idea is to

import everything as string

extract variable names and labels in a loop

drop observations with variable name and labels

destring
Comment
paulvonhippel

Join Date: Apr 2014

Posts: 502
#4

13 Jun 2022, 15:57

Sorry, I've now attached the file to my original post.
Comment
paulvonhippel

Join Date: Apr 2014

Posts: 502
#5

13 Jun 2022, 15:59

Thanks, Nick! I'm not sure I follow. Can you give an example?
Comment
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2402
#6

13 Jun 2022, 16:07

Originally posted by Nick Cox View Post

The main idea is to

import everything as string

extract variable names and labels in a loop

drop observations with variable name and labels

destring

this is good general advice. I’ve often had to work with multiple-row headers from Excel. The strategy I have found most effective is to split the import process into two parts, the first importing the header rows as strings, the second importing the data, then mapping from one to two.
1 like
Comment

Leonardo Guizzetti

Join Date: Jul 2016
Posts: 2402

13 Jun 2022, 16:29

paulvonhippel , in your case, there is an easier way to exploit the structured layout.

Code:

import excel using Example.xlsx, cellrange(A2) firstrow case(low) clear

unab oldvarnames : malnourished-m
foreach sex in boy girl {
  foreach wgt in malnourished underweight normal overweight obesity severe_obesity {
    local newvarnames `newvarnames' `sex'_`wgt'
  }
}
rename (`oldvarnames') (`newvarnames')

desc, fullnames

Result:

Code:

. desc, full

Contains data
 Observations:         4,508                  
    Variables:            13                  
--------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
--------------------------------------------------------------------------------
school          str46   %46s                  School
boy_malnourished
                byte    %10.0gc               Malnourished
boy_underweight byte    %10.0gc               Underweight
boy_normal      byte    %10.0gc               Normal
boy_overweight  byte    %10.0gc               Overweight
boy_obesity     byte    %10.0gc               Obesity
boy_severe_obesity
                byte    %10.0gc               Severe obesity
girl_malnourished
                byte    %10.0gc               Malnourished
girl_underweight
                byte    %10.0gc               Underweight
girl_normal     byte    %10.0gc               Normal
girl_overweight byte    %10.0gc               Overweight
girl_obesity    byte    %10.0gc               Obesity
girl_severe_obesity
                byte    %10.0gc               Severe obesity
--------------------------------------------------------------------------------

Comment

William Lisowski

Join Date: Dec 2014
Posts: 10150

13 Jun 2022, 16:33

This example code based on applying post #5 to your data may start you in a useful general direction.

Code:

import excel "Example.xlsx", sheet("Prekínder") allstring
local row
foreach var of varlist * {
    local row1 = `var'[1]
    if "`row1'"!="" local row = "`row1'"+"_"
    local row2 = `var'[2]
    local vn = strtoname("`row'`row2'")
    rename `var' `vn'
}
drop in 1/2
drop if missing(School) // example workbook has lots of empty rows
destring, replace

Code:

. describe, fullnames

Contains data
 Observations:            48                  
    Variables:            13                  
------------------------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
------------------------------------------------------------------------------------------------
School          str46   %46s                  
Boys_Malnourished
                byte    %10.0g                
Boys_Underweight
                byte    %10.0g                
Boys_Normal     byte    %10.0g                
Boys_Overweight byte    %10.0g                
Boys_Obesity    byte    %10.0g                
Boys_Severe_obesity
                byte    %10.0g                
Girls_Malnourished
                byte    %10.0g                
Girls_Underweight
                byte    %10.0g                
Girls_Normal    byte    %10.0g                
Girls_Overweight
                byte    %10.0g                
Girls_Obesity   byte    %10.0g                
Girls_Severe_obesity
                byte    %10.0g                
------------------------------------------------------------------------------------------------
Sorted by:
     Note: Dataset has changed since last saved.

Last edited by William Lisowski; 13 Jun 2022, 16:36.

Announcement

import excel with two-row headers

Comment

Comment

Comment

Comment

Comment

Comment

Comment