Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reading an Excel file with one mereged cells

    Hi All,

    I have an Excel file, as shown below (this is only an example of the data), I would like to be able to imported to Stata
    Sup MF Id output
    M F
    1 136 78 1 0
    1 8 11 1 1
    1 0 0 1 2
    1 0 0 1 3
    1 0 2 1 4
    Regards

  • #2
    Before you can use -import excel- in Stata, you must manually edit the spreadsheet so that you have only one line with variable names, and they must obey Stata's rules for variable names. You could have these names:
    Code:
    Sup  MF_M  MF_F  id  output

    Comment


    • #3
      Depending on how complex your actual dataset(s) look like, you can just

      Code:
      import excel Sup MF_M MF_F Id output using <foo.xlsx> ,cellrange(A3)
      Best
      Daniel

      Comment


      • #4
        In general, I would advise against editing your data in Excel. It leaves no audit trail when you do so, and when you come back to this some weeks from now, you won't remember what you did. In fact, the only thing I recommend doing with data in Excel is -import- it to Stata. Once in Stata you can do with it whatever you want and keep a complete record of it in your do-files and log-files. In line with that, I endorse Daniel's solution above.

        By the way, when using -import excel- with the -firstrow- option it is not necessary that the Excel column headers all be legal Stata variable names. Stata will accept those that are, and make modifications to those that are not, or, where it can't figure out how to do that, name them after the Excel columns they are found in.

        Comment


        • #5
          "In general, I would advise against editing your data in Excel. It leaves no audit trail when you do so..." Interesting and extreme approach. I actually wrote a 20+ item list of what to do and what not to do with data, with #1 being "Always Use Syntax." But, especially with complexly formatted spreadsheets or data with double-headers (variable names one row, what should be variable labels in the second row), I've always fixed things up manually, then kept the original and modified files around. Maybe next time I'll try a pure-syntax approach, even if it does take me much more time and trial-and-error. In the long run (say I have 20 census files to process), might even save time. Hmm...

          Comment


          • #6
            Clyde:
            In Hatim's Excel sheet it is necessary to change the first two lines to one before trying to import in Stata. I see no other way than doing it manually. But I would do it on a copy of the original sheet.

            Comment


            • #7
              Svend,

              I don't think it is necessary to do it that way. Daniel Klein's solution simply ignores those first two lines using the -cellrange()- option and creates the variable names directly in the -import excel- command.

              Comment


              • #8
                Actually, there is more than one way of doing this and I have already demonstrated a possible solution. Another possibility would be importing the first two rows, extract the information and create the variable names, then import everything from row 3 on, passing the constructed names to the import excel call I have given earlier. Consider

                Code:
                // import the first two rows holding the names
                import excel using <foo.xlsx> ,clear
                keep in 1/2
                
                // get the names in the first row
                unab allvars : _all
                foreach v of loc allvars {
                    
                    // get the name in the first row
                    loc name = `v'[1]
                    if ("`name'" == "") {
                        // if empty, replace with previous name
                        loc name `prev'
                    }
                    loc prev `name' // save previous name
                    
                    // get additional name in second row
                    // (this is for the merged cells)
                    loc secondrow = `v'[2]
                    if ("`secondrow'" != "") {
                        
                        // add the name in the secon row
                        // if there is one
                        loc name `name'_`secondrow'
                    }
                    
                    // make it a valid Stata name
                    // and build a namelist
                    loc name = strtoname("`name'")
                    loc names `names' `name'
                }
                
                // now get the complete data, passing the namelist to -import excel-
                // (this is same as show earlier, but automated)
                import excel `names' using <foo.xlsx> ,cellra(A3) clear
                
                // done
                // no manipulation of the Excel file necessary
                This will fail if the first two cells are merged, but one could think of a solution for this as well.

                Best
                Daniel
                Last edited by daniel klein; 11 Nov 2014, 09:18.

                Comment


                • #9
                  Daniel, Clyde:
                  Nice; I learned something.

                  Svend

                  Comment


                  • #10
                    Originally posted by ben earnhart View Post
                    "In general, I would advise against editing your data in Excel. It leaves no audit trail when you do so..." Interesting and extreme approach. I actually wrote a 20+ item list of what to do and what not to do with data, with #1 being "Always Use Syntax." But, especially with complexly formatted spreadsheets or data with double-headers (variable names one row, what should be variable labels in the second row), I've always fixed things up manually, then kept the original and modified files around. Maybe next time I'll try a pure-syntax approach, even if it does take me much more time and trial-and-error. In the long run (say I have 20 census files to process), might even save time. Hmm...
                    It's an interesting topic. Usually, my preference would to do as much as possible via syntax with limiting the editing in Excel to minimum. The point is that, very often, the data we have to work with is presented in a format that is not necessarily most convenient for statistical analysis (more on that). In terms of importing from Excel presumably it would be worthwhile to explore possibilities offered by the cellrange() and maybe limit the import to the desired set of cells.
                    Kind regards,
                    Konrad
                    Version: Stata/IC 13.1

                    Comment


                    • #11
                      I really love this solution. However, when I run these codes, I got the following error message. Could you please tell me what the problem is? Many thanks!

                      variables multiply defined
                      Q17a_Count
                      Q17a_Percent


                      [QUOTE=daniel klein;n419097]Actually, there is more than one way of doing this and I have already demonstrated a possible solution. Another possibility would be importing the first two rows, extract the information and create the variable names, then import everything from row 3 on, passing the constructed names to the import excel call I have given earlier. Consider

                      Code:
                      // import the first two rows holding the names
                      import excel using <foo.xlsx> ,clear
                      keep in 1/2
                      
                      // get the names in the first row
                      unab allvars : _all
                      foreach v of loc allvars {
                      
                      // get the name in the first row
                      loc name = `v'[1]
                      if ("`name'" == "") {
                      // if empty, replace with previous name
                      loc name `prev'
                      }
                      loc prev `name' // save previous name
                      
                      // get additional name in second row
                      // (this is for the merged cells)
                      loc secondrow = `v'[2]
                      if ("`secondrow'" != "") {
                      
                      // add the name in the secon row
                      // if there is one
                      loc name `name'_`secondrow'
                      }
                      
                      // make it a valid Stata name
                      // and build a namelist
                      loc name = strtoname("`name'")
                      loc names `names' `name'
                      }
                      
                      // now get the complete data, passing the namelist to -import excel-
                      // (this is same as show earlier, but automated)
                      import excel `names' using <foo.xlsx> ,cellra(A3) clear
                      
                      // done
                      // no manipulation of the Excel file necessary
                      This will fail if the first two cells are merged, but one could think of a solution for this as well.

                      Best
                      Daniel[/QUO

                      Comment

                      Working...
                      X