  • Summary statistics by group

    I am trying to get summary statistics for my data by group: the number of observations, the mean, and the standard deviation for the following groups: tall, not tall, obese, not obese. I have been able to do this through the menus (Statistics > Summaries, tables, and tests > Summary and descriptive statistics > Summary statistics) and then using by: with tall, not tall, obese, not obese. I know you can do this with code; I just find it easier this way. The problem is that I get 4 separate tables and I would like my results in one table. Is this a limitation of Stata, or is it just a matter of combining the tables somehow when I copy them into Microsoft Word?

    Thanks in advance for any help
    Jordan

  • #2
    Say your obesity variable is called "status". Then a quick and easy way to get what you want, for a single variable, is:
    Code:
    tab status, sum(othervar)


    • #3
      Your data setup is completely unclear. Let me guess, however, that height and weight are two (or more) different variables. Use -egen- with the group() function to combine them into one variable, and then use -tab- (as in #2), -tabstat-, -table-, etc. to get the statistics you want in one table.
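
      A minimal sketch of this suggestion, assuming two hypothetical 0/1 indicators named tall and obese and an outcome named othervar (none of these names come from the original post):
      Code:
      * combine the two indicators into one grouping variable (variable names are assumptions)
      egen status = group(tall obese), label
      * one table with N, mean and SD for each tall/obese combination
      tabstat othervar, by(status) statistics(n mean sd)
      Note that this gives the four tall-by-obese combinations, not the overlapping groups tall, not tall, obese, not obese.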


      • #4
        Hi,

        I have a very similar question. I wish to view the detailed descriptive statistics (in particular, averages of each percentile) of each group in my dataset.

        The code provided in this thread (tab status, sum(othervar)) gives me the mean and SD of each group, but not the detailed descriptive statistics. Any suggestions on how to get these?

        Best regards,
        Oskar



        • #5
          You don't say how many groups you have, what type of variable defines them, or how many variables you want descriptive statistics for. Given that, I suggest
          Code:
          help levelsof
          and look at the examples, especially the examples using -foreach-
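
          A sketch of the -levelsof- plus -foreach- pattern, assuming a string grouping variable called country and a numeric variable called return_on_invested_capital (placeholder names; drop the double quotes in the if condition when the grouping variable is numeric):
          Code:
          * collect the distinct values of the grouping variable in a local macro
          levelsof country, local(countries)
          * loop over the groups and show detailed statistics for each one
          foreach c of local countries {
              display _newline as text "Country: `c'"
              summarize return_on_invested_capital if country == "`c'", detail
          }
          This prints one detailed table per group rather than a single combined table.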


          • #6
            There are 29 groups (Countries). The variable of interest is return on invested capital. I want detailed descriptive statistics only for the return on invested capital.

            I will take a look at levelsof and foreach.


            • #7
              One way to do it is:
              Code:
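              capture program drop my_summarize
              program define my_summarize
                  // statistics to retrieve from r() after -summarize, detail-
                  local statistics N mean sd min max p1 p5 p10 p25 p50 p75 p90 p95 p99
                  summ return_on_invested_capital, detail
                  // store each statistic as a new variable in the first observation
                  foreach s of local statistics {
                      gen `s' = r(`s') in 1
                  }
                  // reduce this country's data to a single row of results
                  keep in 1
                  keep country `statistics'
                  exit
              end

              // run the program once per country; the stacked results replace the data in memory
              runby my_summarize, by(country) verbose

              browse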
              You will need to install -runby-, by Robert Picard and me, from SSC to use this (ssc install runby).


              • #8
                Wow. Thank you so much Clyde, that was exactly what I was looking for. The world needs more people like you.


                • #9
                  See also tabstat
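
                  For example, a sketch using the variable names from #7, which are assumed to match your data:
                  Code:
                  * one row per country, one column per statistic
                  tabstat return_on_invested_capital, by(country) ///
                      statistics(n mean sd min max p1 p5 p10 p25 p50 p75 p90 p95 p99) ///
                      columns(statistics) nototal
                  The /// line continuation works in a do-file; in the Command window, type everything on one line.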


                  • #10
                    Dear Professor Schechter

                    I, too, find your program below very useful.

                    May I ask whether it is possible to generalise it to more than one variable? That is, can variable_2, etc. be included in the program itself, as in the slightly modified code below?

                    Thanks.

                    Originally posted by Clyde Schechter
                    One way to do it is:
                    Code:
                    capture program drop my_summarize
                    program define my_summarize
                        local statistics N mean sd min max p1 p5 p10 p25 p50 p75 p90 p95 p99
                        summ variable_1, detail
                        foreach s of local statistics {
                            gen `s' = r(`s') in 1
                        }
                        keep in 1
                        keep group `statistics'
                        exit
                    end

                    // the by() variable must match the one kept inside the program
                    runby my_summarize, by(group) verbose

                    browse
                    You will need to install Robert Picard and my -runby- from SSC to use this.


                    • #11
                      #10: -collapse- already offers that functionality.
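
                      A minimal sketch, assuming variables named variable_1, variable_2 and country as in #10 (the names of the new variables are placeholders):
                      Code:
                      * keep the original data so it can be restored afterwards
                      preserve
                      * one row per country, with the chosen statistics for each variable
                      collapse (count) n_1=variable_1 n_2=variable_2 ///
                               (mean) mean_1=variable_1 mean_2=variable_2 ///
                               (sd) sd_1=variable_1 sd_2=variable_2 ///
                               (p50) p50_1=variable_1 p50_2=variable_2, by(country)
                      list
                      restore
                      Other percentiles, such as (p1) or (p99), can be added in the same way.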


                      • #12
                        Oh I see. Thank you Nick!


                        • #13
                          Dear all,
                          I needed to compute the variance of the first 12 observations by id in my panel, and managed to do so with the help of these posts. I get the variances in a table (Nick's solution) or in a new dataset (Clyde's solution).
                          Now I would like to use these per-id variances as the "first observation" of a new variable in the original panel dataset (in long form):
                          - the first 11 observations in every panel should be missing;
                          - the 12th should be the corresponding variance for that id;
                          - from the 13th observation in every panel onward, the new variable should follow a formula. To be more precise:
                          newvar= (1-lambda)*var1[_n-1]+lambda*newvar[_n-1]
                          How could I construct this?

                          Note: what I actually need is to create two variables holding the volatility for every period, one in EWMA form and one in GARCH form (not the series filtered with tssmooth exponential or a GARCH model for a single series). I want to reproduce the equations explained in this link, which are:
                          ewma_variance=(1-lambda)*squaredreturns[_n-1]+lambda*ewma_variance[_n-1]
                          garch_variance=omega+alpha*squaredreturns[_n-1]+lambda*garch_variance[_n-1]
                          If Stata has an automatic way to do this, I would be very grateful to know about it too.

                          Thanks in advance for your help.
                          Last edited by Marta ArespaC; 29 Nov 2021, 03:14.
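
                          One possible sketch of the EWMA recursion, assuming hypothetical variable names id, time and ret (the returns); lambda = 0.94 is only an illustrative value, not one given in the post:
                          Code:
                          * hypothetical names: id, time, ret; lambda is an illustrative value
                          local lambda = 0.94
                          bysort id (time): gen long obs = _n
                          gen double sqret = ret^2
                          * variance of the first 12 observations of each panel
                          bysort id: egen double v12 = sd(cond(obs <= 12, ret, .))
                          replace v12 = v12^2
                          * missing for observations 1 to 11, initial variance at observation 12
                          bysort id (time): gen double ewma_variance = v12 if _n == 12
                          * -replace- works through the observations in order, so [_n-1] is the value just computed
                          bysort id (time): replace ewma_variance = (1 - `lambda')*sqret[_n-1] + `lambda'*ewma_variance[_n-1] if _n > 12
                          The GARCH form would follow the same pattern with omega and alpha added to the last line, provided you supply values for those parameters.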


                          • #14
                            Dear all,
                            I have a panel dataset with T=2, for which I simply want to display the means of the control variables by treatment group (1 or 0) and the p-value for the difference in means. Is there a simple command that allows me to do this?

                            Thank you and best wishes,

                            Louis
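
                            One possible approach is a loop over -ttest-, sketched here with hypothetical names: treated (0/1), period, and controls x1, x2 and x3. The comparison is restricted to one period because the two observations on each unit are not independent:
                            Code:
                            * hypothetical names: treated, period, x1 x2 x3
                            foreach v of varlist x1 x2 x3 {
                                quietly ttest `v' if period == 1, by(treated)
                                * columns: variable, mean when treated == 0, mean when treated == 1, two-sided p-value
                                display %-12s "`v'" %10.3f r(mu_1) %10.3f r(mu_2) %8.4f r(p)
                            }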
