Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • #16
    Not quite. To understand why, notice that the local macro bottom_fifth is being defined in the previous line, where it takes the value of r(p5). Where is the value of r(p5) coming from? It is coming from the previous line, which has the summ command, which returns a bunch of scalars, including this one. For instance, with your example dataset, if I do

    Code:
    . summ L if year == 2010, detail
    
                                  L
    -------------------------------------------------------------
          Percentiles      Smallest
     1%            1              1
     5%            1              3
    10%            1              9       Obs                   4
    25%            2             12       Sum of wgt.           4
    
    50%            6                      Mean               6.25
                            Largest       Std. dev.      5.123475
    75%         10.5              1
    90%           12              3       Variance          26.25
    95%           12              9       Skewness        .091223
    99%           12             12       Kurtosis       1.303915
    it returns the following results:
    Code:
    . return list
    
    scalars:
                      r(N) =  4
                  r(sum_w) =  4
                   r(mean) =  6.25
                    r(Var) =  26.25
                     r(sd) =  5.123475382979799
               r(skewness) =  .091222998923078
               r(kurtosis) =  1.303915343915344
                    r(sum) =  25
                    r(min) =  1
                    r(max) =  12
                     r(p1) =  1
                     r(p5) =  1
                    r(p10) =  1
                    r(p25) =  2
                    r(p50) =  6
                    r(p75) =  10.5
                    r(p90) =  12
                    r(p95) =  12
                    r(p99) =  12
    As you can see, this includes r(p5). So the code as it stands, will give you the fifth percentile of L in that year. If you want the fifth percentile of wage, you need to change the summ command so it is giving you the statistics for that.

    Comment


    • #17
      But then is there any other methods or commands that can generate output table by each year for (e.g. for 2021)
      Code:
      summ $L if wage <= r(p5) & ${yr} == 2021, d
      have them each year and export them into excel file?
      Cause I need to generate output table of the L distribution that is below 5p of wage each year...

      ++++ and sorry to ask you repeated questions, is there command that can generate summary statistics of variables by percentile and export into excel file?

      Comment


      • #18
        Perhaps my previous post was not clear. To get the summary statistics for L for those observations which are in the bottom 5% of the wages in each year, you would need to change the top part of my code in #12 as follows:

        Code:
        collect clear
        levelsof year, local(years)
        foreach y of local years {
            summ wage if year == `y', detail
            local bottom_fifth = r(p5)
            dtable if year == `y' &  wage <= `bottom_fifth', ///
                sample( , stat(freq)) ///
                cont(L, stat(mean p1 p5 p10 p25 p50 p75 p90 p95 p99)) ///
                name(yr_`y')
            collect addtag Years[`y']
            local collections `collections' yr_`y'
            }
        The code below this in #12 remains the same. The last line exports the table to an Excel file.

        is there command that can generate summary statistics of variables by percentile and export into excel file?
        Can you explain precisely what is the table you want? How would it compare to the table I generated in #12, for example? The dtable command I used in #12 is quite flexible; we should be able to do what you need with that, and the collect suit of commands..

        Comment


        • #19
          Hemanshu Kumar
          Thanks a lot, it's working well!
          And for the additional question, I wanted to check sum stat of variables in the dataset by quantile each year (but no such thing as above/below certain level - just a general sum stat), so I used:
          Code:
          local varlist var1 var2 var3 var4 var5 ... (so on)
          collect clear
          levelsof $yr, local(years)
          foreach y of local years {
              foreach var of local varlist {
              summ `varlist' if $yr == `y', detail
              dtable if year == `y', ///
                  sample( , stat(freq)) ///
                  cont(${vars_to_check}, stat(mean p1 p5 p10 p25 p50 p75 p90 p95 p99)) ///
                  name(yr_`y')
              collect addtag Years[`y']
              local collections `collections' yr_`y'
              }
          }
          
          collect combine comb = `collections'
          collect label levels result frequency "N", modify
          foreach n of numlist 1 5 10 25 50 75 90 95 99 {
              collect remap result = percentile, fortags(result[p`n'])
              collect label levels percentile p`n' "`n'%", modify
          }
          collect label dim percentile "Percentiles", modify
          collect style header percentile, title(label)
          collect style header result, title(hide) level(label)
          collect style cell result[frequency] percentile, nformat(%3.0f)
          collect style cell result[mean], nformat(%5.2f)
          collect style cell percentile#cell_type[column-header], border(bottom, color(black))
          collect layout (Years) (result[frequency mean] percentile[p1 p5 p10 p25 p50 p75 p90 p95 p99])
          collect title "Distribution of variables of interest in each year"
          collect export "sum_stat.xlsx" sheet("`var'") sheetreplace, replace
          and red colors for the modifications. But it seems not working (error: collection yr_2000 already exists and is not empty).
          And maybe the commands that I used above is not really what i intend to do..

          Comment

          Working...
          X