Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • -putexcel- efficiency and per-cell conditional formatting

    Is there a fast way to conditionally set per-cell formatting with -putexcel- in Stata 14? I have a way, but it's slow with my real data.

    If I want cell A1 to contain "Green" and have a solid green fill, and I want cell A2 to contain "Red" and have a solid red fill, I can do this in Stata 14:

    Code:
    putexcel A1="Green", fpat(solid, green)
    putexcel A2="Red", fpat(solid, red)
    But that doesn't scale well to lots of cells, presumably because -putexcel- is writing to disk once per line.

    Is there any access to Excel's conditional formatting tools via -putexcel-?

    Or... is there a way to use multiple fpat() statements per -putexcel- command? It seems like I should be able to code something like:

    Code:
    putexcel A1="Green", fpat(solid, green) A2="Red", fpat(solid, red)
    but that doesn't work.

  • #2
    I would be surprised if you are able to access Excel’s conditional formatting tools through the -putexcel- command. As you suggest yourself, it doesn’t even seem possible to specify more than one formatting option per -putexcel- command, so I can’t help you on that one either.

    However, I’m assuming that your criteria for the color format is determined somewhere in your Stata script (outputted p-values, coefficients, etc.). The only scalable solution I can think of would be to loop through the -putexcel- commands one by one. You could use the relevant formatting criteria to have a macro insert the correct format options for each iteration (maybe employ a matrix as lookup table is needed).

    // Mathias

    Comment


    • #3
      Originally posted by Mathias Pedersen Heinze View Post
      ...
      The only scalable solution I can think of would be to loop through the -putexcel- commands one by one. You could use the relevant formatting criteria to have a macro insert the correct format options for each iteration (maybe employ a matrix as lookup table is needed).

      // Mathias
      Yep, that's how I solved it — pretty much just brute forced it with a series of looped -putexcel- commands. It is slow, since it's writing to disk a lot.

      More detail on the data: I have a survey that was administered in each year 2010-2015 (different respondents, not panel data). Not all items were asked in every year, however, and some items changed. I needed a way to visualize which variables contained any valid data in each year. Let's say I have a variable named v1 that's included in 2010, 2012, 2013, and 2015 but was not included in 2011 and 2014. What I wanted was something like this in Excel:

      Variable Label 2010 2011 2012 2013 2014 2015
      v1 The first var x . x x . x

      but with color-coded cells based on whether any valid observations exist for that year instead of symbols (x for present, . for not present in my example).

      With that example data, in Stata 14.1 I could use:

      Code:
      clear
      input byte v1 int year
      1 2010
      . 2011
      1 2012
      1 2013
      . 2014
      1 2015
      end
      label var v1 "The first var"
      
      putexcel set test.xlsx
      putexcel a2="Variable" b2="Label" c2=2010 d2=2011 e2=2012 f2=2013 g2=2014 h2=2015
          forval x = 2010/2015 {
              qui tab v1 if year == `x'
              if `r(r)' == 0 {
                  local present`x' "."
                  local fmtcolor`x' "fpat(solid, red)"
                  }
              else {
                  local present`x' "x"
                  local fmtcolor`x' "fpat(solid,green)"
              }
          }
      local varlab: var l v1
      putexcel A3="v1" B3="`varlab'"
      putexcel c3="`present2010'" , `fmtcolor2010'
      putexcel d3="`present2011'" , `fmtcolor2011'
      putexcel e3="`present2012'" , `fmtcolor2012' 
      putexcel f3="`present2013'" , `fmtcolor2013' 
      putexcel g3="`present2014'" , `fmtcolor2014' 
      putexcel h3="`present2015'", `fmtcolor2015'
      I've nested that within a -foreach- that repeats it for each variable in the dataset, but with 850 variables, that's 7 cols * 6 years * 850 vars = ~35,700 writes to disk. It's not fast.

      If I did not color-code the cells, I could combine those 7 -putexcel- commands at the end into just one line and speed it up a lot. But the result is less readable without the color. I could set conditional formatting manually or with a script in Excel, but I want a fully-replicatable solution in Stata. Works-but-slow is better than fast given my requirements.

      Since I've solved the problem, my question was to make sure I wasn't missing something in the -putexcel- command that would speed this up. I love that this is even possible in Stata, so I'm not complaining. But I'd love it even more if the -putexcel- format syntax was more like -twoway- and allowed multiple format statements per line.

      Comment


      • #4
        it's a lot faster to write to .xls files than .xlsx files.

        Comment

        Working...
        X