Dear All,
I am trying to optimize performance of my code that generates the output.
The simplified example of what I am having is the following code:
The first (gray) part of the code is data generation, which substitutes for real report-producing code. I am not sure how the content here might affect performance, but I think for illustration purposes it's fine as is here.
After an Excel file is produced with the -putexcel- statements, I reformat it using a Mata procedure formatcols, since this is the only way I found to adjust the columns widths (or is there a better way?).
I am concerned about the performance from this point onward. In the example it takes some 11 seconds per sheet to format the 10 columns. In the real project it takes well above 2 minutes per sheet. I can't imagine what is going on. For comparison, it takes a fraction of a second to extract all sheets from the XLSX file (unpack).
The only thing I've noticed is that the Excel file in the real report contains massive (above 44MB) "styles.xml" file which is repeating the same style definition over and over again. It is easily compressed though, so doesn't bloat the *.xlsx file much.
Is there a faster way to set the column widths in an Excel file from Stata?
Thank you, Sergiy Radyakin
I am trying to optimize performance of my code that generates the output.
The simplified example of what I am having is the following code:
Code:
clear all
version 17.0
set rmsg on
mata
void setcols() {
class xl scalar b
b = xl()
b.load_book(st_local("file"))
colwidths = tokens(st_local("colwidths"))
sheets = b.get_sheets()
for(i=1;i<=rows(sheets);i++) {
b.set_sheet(sheets[i])
st_global("c(current_time)") + " " + sheets[i]
for(j=1; j<=cols(colwidths); j++) {
b.set_column_width(j,j,strtoreal(colwidths[j]))
}
}
b.close_book()
}
end
program define formatcols
version 17.0
syntax , file(string) colwidths(string)
mata setcols()
end
local f "C:/temp/benchmark.xlsx"
capture erase "`f'"
forval s=1/20 {
putexcel set "`f'", open modify sheet("s`s'")
local cols="ABCDEFGHIJKLMNOPQRSTUVWZXYZ"
forval i=1/200 {
forval j=1/`=strlen("`cols'")' {
putexcel `=substr(`"`cols'"',`j',1)'`i' = "Test value"
}
}
putexcel save
}
formatcols, file("`f'") colwidths("3.82 31.27 37.55 30.91 15.64 18.36 26.36 40.91 65.18 35.82")
// END OF FILE
The first (gray) part of the code is data generation, which substitutes for real report-producing code. I am not sure how the content here might affect performance, but I think for illustration purposes it's fine as is here.
After an Excel file is produced with the -putexcel- statements, I reformat it using a Mata procedure formatcols, since this is the only way I found to adjust the columns widths (or is there a better way?).
I am concerned about the performance from this point onward. In the example it takes some 11 seconds per sheet to format the 10 columns. In the real project it takes well above 2 minutes per sheet. I can't imagine what is going on. For comparison, it takes a fraction of a second to extract all sheets from the XLSX file (unpack).
Code:
. formatcols, file("`f'") colwidths("3.82 31.27 37.55 30.91 15.64 18.36 26.36 40.91 65.18 35.82") 12:32:52 s1 12:33:03 s2 12:33:14 s3 12:33:25 s4 12:33:35 s5 12:33:46 s6 12:33:57 s7 12:34:07 s8 12:34:18 s9 12:34:29 s10 12:34:40 s11 12:34:51 s12 12:35:01 s13 12:35:12 s14 12:35:23 s15 12:35:34 s16 12:35:44 s17 12:35:55 s18 12:36:06 s19 12:36:17 s20 r; t=216.41 12:36:27
Is there a faster way to set the column widths in an Excel file from Stata?
Thank you, Sergiy Radyakin
Comment