Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • import excel with two-row headers

    Stata can import excel files, but it assumes that the variable names in the column headers are unique and appear in a single row. The attached file is an exception. It gives the number of children in a sample of about 50 schools who are malnourished, underweight, normal, overweight, or suffer from obesity or severe obesity. These column headers are in row 2 and they repeat: the headers for columns B-G are the same as for columns H-M. The difference is that columns B-G refer to boys and columns H-M refer to girls -- but this is only clear from the merged cells in row 1.

    When column labels are split across two rows this way, is there any convenient way to import them as variable names? It would be great if they came through as boys_malnourished, girls_malnourished, etc.

    I can edit the column labels to be unique, but that's not convenient since in my real data there are more files and more columns....

    Thanks for any ideas!

    Best wishes,
    Paul
    Attached Files
    Last edited by paulvonhippel; 13 Jun 2022, 15:57.

  • #2
    I don't see anything attached.


    Is this file for public consumption? If so, please give the link for where we may find it

    Comment


    • #3
      The main idea is to

      import everything as string

      extract variable names and labels in a loop

      drop observations with variable name and labels

      destring

      Comment


      • #4
        Sorry, I've now attached the file to my original post.

        Comment


        • #5
          Thanks, Nick! I'm not sure I follow. Can you give an example?

          Comment


          • #6
            Originally posted by Nick Cox View Post
            The main idea is to

            import everything as string

            extract variable names and labels in a loop

            drop observations with variable name and labels

            destring
            this is good general advice. I’ve often had to work with multiple-row headers from Excel. The strategy I have found most effective is to split the import process into two parts, the first importing the header rows as strings, the second importing the data, then mapping from one to two.

            Comment


            • #7
              paulvonhippel , in your case, there is an easier way to exploit the structured layout.

              Code:
              import excel using Example.xlsx, cellrange(A2) firstrow case(low) clear
              
              unab oldvarnames : malnourished-m
              foreach sex in boy girl {
                foreach wgt in malnourished underweight normal overweight obesity severe_obesity {
                  local newvarnames `newvarnames' `sex'_`wgt'
                }
              }
              rename (`oldvarnames') (`newvarnames')
              
              desc, fullnames
              Result:

              Code:
              . desc, full
              
              Contains data
               Observations:         4,508                  
                  Variables:            13                  
              --------------------------------------------------------------------------------
              Variable      Storage   Display    Value
                  name         type    format    label      Variable label
              --------------------------------------------------------------------------------
              school          str46   %46s                  School
              boy_malnourished
                              byte    %10.0gc               Malnourished
              boy_underweight byte    %10.0gc               Underweight
              boy_normal      byte    %10.0gc               Normal
              boy_overweight  byte    %10.0gc               Overweight
              boy_obesity     byte    %10.0gc               Obesity
              boy_severe_obesity
                              byte    %10.0gc               Severe obesity
              girl_malnourished
                              byte    %10.0gc               Malnourished
              girl_underweight
                              byte    %10.0gc               Underweight
              girl_normal     byte    %10.0gc               Normal
              girl_overweight byte    %10.0gc               Overweight
              girl_obesity    byte    %10.0gc               Obesity
              girl_severe_obesity
                              byte    %10.0gc               Severe obesity
              --------------------------------------------------------------------------------

              Comment


              • #8
                This example code based on applying post #5 to your data may start you in a useful general direction.
                Code:
                import excel "Example.xlsx", sheet("Prekínder") allstring
                local row
                foreach var of varlist * {
                    local row1 = `var'[1]
                    if "`row1'"!="" local row = "`row1'"+"_"
                    local row2 = `var'[2]
                    local vn = strtoname("`row'`row2'")
                    rename `var' `vn'
                }
                drop in 1/2
                drop if missing(School) // example workbook has lots of empty rows
                destring, replace
                Code:
                . describe, fullnames
                
                Contains data
                 Observations:            48                  
                    Variables:            13                  
                ------------------------------------------------------------------------------------------------
                Variable      Storage   Display    Value
                    name         type    format    label      Variable label
                ------------------------------------------------------------------------------------------------
                School          str46   %46s                  
                Boys_Malnourished
                                byte    %10.0g                
                Boys_Underweight
                                byte    %10.0g                
                Boys_Normal     byte    %10.0g                
                Boys_Overweight byte    %10.0g                
                Boys_Obesity    byte    %10.0g                
                Boys_Severe_obesity
                                byte    %10.0g                
                Girls_Malnourished
                                byte    %10.0g                
                Girls_Underweight
                                byte    %10.0g                
                Girls_Normal    byte    %10.0g                
                Girls_Overweight
                                byte    %10.0g                
                Girls_Obesity   byte    %10.0g                
                Girls_Severe_obesity
                                byte    %10.0g                
                ------------------------------------------------------------------------------------------------
                Sorted by:
                     Note: Dataset has changed since last saved.
                Last edited by William Lisowski; 13 Jun 2022, 16:36.

                Comment

                Working...
                X