Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • import excel for few defined rows up to the end of spreadsheet for columns

    Hi,

    I am trying to import an excel spreadsheet for few defined rows up to the end of spreadsheet for columns (number of columns change of time).

    For instance, I would like to import from cell A7 and the next 4 rows (B7, C7, D7, E7) along with all columns but not the rest of the spreadsheet. I am currently using the option "cellrange()" but I do not know the proper syntax:

    Code:
    * Not what I want because it imports the entire spreadsheet starting from A7
    import excel "0.test.xlsx", clear firstrow sheet("Beta_Query") cellrange(A7)
    Can someone help? Thanks.

  • #2
    Francois, I think you are halfway there. You only specified the starting cell in cellrange, for your purposes you need to specify an end also, namely the last right lower corner cell from your excel sheet, i.e. E and the number of your last column.
    I have attached some sample code and refer you to the helpfile https://www.stata.com/manuals13/dimportexcel.pdf.
    Code:
    use "http://www.stata-press.com/data/r13/auto", clear
    export excel auto, firstrow(variables) replace
    import excel auto.xls, cellrange(A7:E75) firstrow clear
    I hope this is helpful.

    Comment


    • #3
      One approach is
      Code:
      import excel "0.test.xlsx", clear firstrow sheet("Beta_Query") cellrange(A7)
      keep in 1/4
      The problem with using import excel when you don't know how many columns (or rows) are in the workbook is that if you ask for a column beyond what Excel considers its rightmost column, you get an error. If this happens for you, here's an approach. Suppose you know that you never have more than 10 (J) columns. You can use putexcel to place something into column 11 (K), so that Excel now thinks its rightmost column is K.
      Code:
      . putexcel set "~/Downloads/0.test.xlsx", sheet("Beta_Query") modify
      
      . putexcel K7 = "dummy"
      file /Users/lisowskiw/Downloads/0.test.xlsx saved
      
      . putexcel save
      
      . putexcel clear
      
      . import excel "~/Downloads/0.test.xlsx", clear firstrow sheet("Beta_Query") cellrange(A7:J11)
      (10 vars, 4 obs)
      
      . list
      
           +-------------------------------------------+
           | dog   cat   C   D   E   F   G   H   I   J |
           |-------------------------------------------|
        1. |   8     4   .   .   .   .   .   .   .   . |
        2. |   2     1   .   .   .   .   .   .   .   . |
        3. |   6     6   .   .   .   .   .   .   .   . |
        4. |   1    10   .   .   .   .   .   .   .   . |
           +-------------------------------------------+

      Comment


      • #4
        It does the trick, thanks a lot William!

        Comment


        • #5
          Originally posted by William Lisowski View Post
          One approach is
          The problem with using import excel when you don't know how many columns (or rows) are in the workbook is that if you ask for a column beyond what Excel considers its rightmost column, you get an error. If this happens for you, here's an approach. Suppose you know that you never have more than 10 (J) columns. You can use putexcel to place something into column 11 (K), so that Excel now thinks its rightmost column is K.
          This is a nice trick, William. Coincidentally, do you know how Stata determines the dimensions of the table to import when a cell range is not specified?

          Comment


          • #6
            I'm glad you liked that trick. It was born out of annoyance that import excel does not include an suboption to cellrange() that allows you to tell Excel you'll take whatever you can get up to the limits defined, sort of like
            Code:
            import excel "~/Downloads/0.test.xlsx", clear firstrow sheet("Beta_Query") cellrange(A7:J11, or_less)
            I believe that import excel relies on metadata stored by Excel as part of the workbook that gives the identity of the "lower right" cell in the worksheet.

            However, Excel's idea of the lower right corner includes cell characteristics other than "is there a value in the cell" that are irrelevant to Stata's interest in "what are the outer bounds of the data". I just created an empty worksheet and put yellow fill into cell E3 - without having entered any data into any cell - and then went digging through the XML description of the worksheet and found that Excel identified E3 as the lower right cell. Using import excel without a cellrange read one variable - named E - with one observation - a missing value.

            In Excel's defense, E3 contained information. It just wasn't information that Stata cared about, but I can imagine a dashboard style worksheet where the color of a particular cell conveyed important information.

            Bottom line is Stata does (almost) the best it can with what it has to work with.

            Comment


            • #7
              Thanks, William. That's very helpful information about some of the dynamics between Excel's content storage and Stata's import thereof.

              Comment

              Working...
              X