Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Exporting summary statistics into excel file

    Hello all,

    I am trying to first generate summary statistics of some variables (by percentile) for each year and then export the output table into excel file.
    (Just for the quick reference, this post is an extension to Generate table to describe distribution of a variable that is below certain percentile by year - Statalist).
    I am just making up the sample of my dataset (firm-level panel data) which looks like:
    ID Year VA GO Wage Labour cost L capital
    1 2000 98 999 829 9280 09 09
    1 2001 279 897 097 7809 89 79
    1 2002 098 087 76 8768 87 76
    1 2003 87 76 756 9887 987 987
    1 2004 987 976 7687 76 4 456
    2 2019 675 765 675 57 465 654
    2 2020 654 876 876876 654 654 65
    3 2013 876 876 876 876 876 876
    3 2014 987 987 987 987 987 987
    I have tried the below code:
    Code:
     
     local varlist var1 var2 var3 var4 var5 ... (so on) collect clear levelsof $yr, local(years) foreach y of local years {     foreach var of local varlist {     summ `varlist' if $yr == `y', detail     dtable if year == `y', ///         sample( , stat(freq)) ///         cont(${vars_to_check}, stat(mean p1 p5 p10 p25 p50 p75 p90 p95 p99)) ///         name(yr_`y')     collect addtag Years[`y']     local collections `collections' yr_`y'     } }  collect combine comb = `collections' collect label levels result frequency "N", modify foreach n of numlist 1 5 10 25 50 75 90 95 99 {     collect remap result = percentile, fortags(result[p`n'])     collect label levels percentile p`n' "`n'%", modify } collect label dim percentile "Percentiles", modify collect style header percentile, title(label) collect style header result, title(hide) level(label) collect style cell result[frequency] percentile, nformat(%3.0f) collect style cell result[mean], nformat(%5.2f) collect style cell percentile#cell_type[column-header], border(bottom, color(black)) collect layout (Years) (result[frequency mean] percentile[p1 p5 p10 p25 p50 p75 p90 p95 p99]) collect title "Distribution of variables of interest in each year" collect export "sum_stat.xlsx" sheet("`var'") sheetreplace, replace
    which is not working well..
    Could someone help me solving this problem please?

    Thanks a lot in advance!

  • #2
    Originally posted by Anne-Claire Jo View Post
    Code:
    collect clear levelsof $yr, local(years) foreach y of local years { foreach var of local varlist { summ `varlist' if $yr == `y', detail dtable if year == `y', /// sample( , stat(freq)) /// cont(${vars_to_check}, stat(mean p1 p5 p10 p25 p50 p75 p90 p95 p99)) /// name(yr_`y') collect addtag Years[`y'] local collections `collections' yr_`y' } } collect combine comb = `collections' collect label levels result frequency "N", modify foreach n of numlist 1 5 10 25 50 75 90 95 99 { collect remap result = percentile, fortags(result[p`n']) collect label levels percentile p`n' "`n'%", modify } collect label dim percentile "Percentiles", modify collect style header percentile, title(label) collect style header result, title(hide) level(label) collect style cell result[frequency] percentile, nformat(%3.0f) collect style cell result[mean], nformat(%5.2f) collect style cell percentile#cell_type[column-header], border(bottom, color(black)) collect layout (Years) (result[frequency mean] percentile[p1 p5 p10 p25 p50 p75 p90 p95 p99]) collect title "Distribution of variables of interest in each year" collect export "sum_stat.xlsx" sheet("`var'") sheetreplace, replace
    Prior to #1, code that I used is:
    Code:
    collect clear
    levelsof $yr, local(years)
    foreach y of local years {
        foreach var of global vars_to_check {
        summ ${vars_to_check} if $yr == `y', detail
        dtable if year == `y', ///
            sample( , stat(freq)) ///
            cont(`var', stat(mean p1 p5 p10 p25 p50 p75 p90 p95 p99)) ///
            name(yr_`var'_`y')
        collect addtag Years[`y']
        local collections `collections' yr_`var'_`y'
        }
    
    
    collect combine comb = `collections'
    collect label levels result frequency "N", modify
    foreach n of numlist 1 5 10 25 50 75 90 95 99 {
        collect remap result = percentile, fortags(result[p`n'])
        collect label levels percentile p`n' "`n'%", modify
    }
    collect label dim percentile "Percentiles", modify
    collect style header percentile, title(label)
    collect style header result, title(hide) level(label)
    collect style cell result[frequency] percentile, nformat(%3.0f)
    collect style cell result[mean], nformat(%5.2f)
    collect style cell percentile#cell_type[column-header], border(bottom, color(black))
    collect layout (Years) (result[frequency mean] percentile[p1 p5 p10 p25 p50 p75 p90 p95 p99])
    collect title "Distribution of `var' in each year"
    collect export sum_stat_vars.xlsx sheet("`var'") sheetreplace, replace
    }

    Comment


    • #3
      I am not sure what to make of

      which is not working well...
      What error(s) are you getting? And if it is not errors, then in what way is the output not satisfactory? It might also help to show a made-up dummy output table showing exactly what you want the output to look like.

      Comment


      • #4
        It is also unclear to me what you are trying to achieve in your code. For instance, in #1 you have the following loop:

        Code:
        foreach y of local years {
            foreach var of local varlist {    
                summ `varlist' if $yr == `y', detail    
                dtable if $yr == `y', ///        
                    sample( , stat(freq)) ///        
                    cont(${vars_to_check}, stat(mean p1 p5 p10 p25 p50 p75 p90 p95 p99)) ///        
                    name(yr_`y')    
                    
                collect addtag Years[`y']    
                local collections `collections' yr_`y'
            }
        }
        Two questions about this:
        • the inner loop iterates over each variable var in the macro varlist, but notice that the loop does not actually use `var' anywhere. So what is the point of the loop?
        • I also don't understand the point of the summ command on the next line. In the thread you linked in #1, I had used the summ command to return the 5th percentile of the variable, and the dtable was then constructed for only that subset of the data. But here you are constructing the dtable for all the data of a particular year, not just the part below the 5th percentile of a variable. So what is the purpose of the summ command here?

        Comment


        • #5
          Hello Hemanshu Kumar , thanks for your reply.

          I was thinking of having the output (in the excel file) like below (which is similar to the output table that I had from Generate table to describe distribution of a variable that is below certain percentile by year - Statalist):

          Var 1
          Year N Mean percentile percentile percentile percentile percentile percentile percentile percentile percentile
          1% 5% 10% 25% 50% 75% 90% 95% 99%
          2001 12 12 12 12 12 12 12 12 12 12 12
          2002 12 12 12 12 12 12 12 12 12 12 12
          2003 .. .. .. .. .. .. .. .. .. .. ..
          2004

          which would be the summary statistics of var1 by each year by percentile stored in sheet(Var1).
          And it same for var2-10 (in global varlist) in each sheet (sheet(var2). sheet(var3)... etc).

          To have this result, I used:
          Code:
          global varlist var*
          
          collect clear
          levelsof $yr, local(years)
          foreach y of local years {
              foreach var of global varlist {
              summ ${varlist} if $yr == `y', detail
              dtable if year == `y', ///
                  sample( , stat(freq)) ///
                  cont(`var', stat(mean p1 p5 p10 p25 p50 p75 p90 p95 p99)) ///
                  name(yr_`var'_`y')
              collect addtag Years[`y']
              local collections `collections' yr_`var'_`y'
              }
          
          
          collect combine comb = `collections'
          collect label levels result frequency "N", modify
          foreach n of numlist 1 5 10 25 50 75 90 95 99 {
              collect remap result = percentile, fortags(result[p`n'])
              collect label levels percentile p`n' "`n'%", modify
          }
          collect label dim percentile "Percentiles", modify
          collect style header percentile, title(label)
          collect style header result, title(hide) level(label)
          collect style cell result[frequency] percentile, nformat(%3.0f)
          collect style cell result[mean], nformat(%5.2f)
          collect style cell percentile#cell_type[column-header], border(bottom, color(black))
          collect layout (Years) (result[frequency mean] percentile[p1 p5 p10 p25 p50 p75 p90 p95 p99])
          collect title "Distribution of `var' in each year"
          collect export "sum_stat.xlsx" sheet("`var'") sheetreplace, replace
          }
          And from this commands, I have following error:
          Code:
          Collection: comb
                Rows: Years
             Columns: result[frequency mean] percentile[p1 p5 p10 p25 p50 p75 p90 p95 p99]
          
          Your layout specification does not uniquely match any items. One or more of the following dimensions might help uniquely match
          items: collection, colname, statcmd, var.
          option as() required; could not determine filetype from filename
          r(198);
          I hope this is clear!
          Thank you in advance

          Comment


          • #6
            Since you no longer need to subset the data by percentile, we can actually simplify the code by using the by option of dtable. I think this will do what you need:

            Code:
            global yr Year
            local varlist VA GO
            
            collect clear
            
            dtable , ///
                sample( , stat(freq)) ///
                cont(`varlist', stat(mean p1 p5 p10 p25 p50 p75 p90 p95 p99)) ///
                by($yr)
                
            collect label levels result frequency "N", modify
            foreach n of numlist 1 5 10 25 50 75 90 95 99 {
                collect remap result = percentile, fortags(result[p`n'])
                collect label levels percentile p`n' "`n'%", modify
            }
            
            collect label dim percentile "Percentiles", modify
            collect style header percentile, title(label)
            collect style header result, title(hide) level(label)
            collect style cell result[frequency] percentile, nformat(%3.0f)
            collect style cell result[mean], nformat(%5.2f)
            collect style cell percentile#cell_type[column-header], border(bottom, color(black))
            collect style header var, title(hide) level(hide)
            
            foreach v of varlist `varlist' {
                collect layout (var[`v']#$yr) (result[frequency mean] percentile[p1 p5 p10 p25 p50 p75 p90 p95 p99])
                collect title "Distribution of `v' in each year"
                collect export "sum_stat.xlsx", sheet("`v'", replace) modify
            }
            Last edited by Hemanshu Kumar; Today, 02:42.

            Comment


            • #7
              Hemanshu Kumar it's working super well! Thanks a lot

              Comment

              Working...
              X