  • Finding Multiple Summary stat

    Hello,

    I would like to get summary statistics (mean, standard deviation, etc.) for Var1 based on the values of Var2.
    My data look like this:

    ID Var1 Var2
    1 10 5
    2 20 10
    3 70 15
    4 100 11
    5 95 4
    .....
    I want summary statistics of Var1 separately for ranges of Var2 such as Var2 < 5, Var2 >= 5 and Var2 < 10, Var2 > 11, etc.
    Running sum Var1 if Var2 < 5 gives the statistics, and I could repeat it for each of the other conditions, but that is very time consuming because I need to do it for many conditions of Var2.
    Is there a way to summarize Var1 for all of these conditions on Var2 at once?

    Thanks,
    Krishna

  • #2
    You could do something along these lines
    Code:
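    * Var2 must be complete, otherwise the bins below are not well defined
    assert !missing(Var2)
    generate int increment = 0
    * r(max) sets the upper end of the loop over bin boundaries 5, 10, 15, ...
    summarize Var2, meanonly
    forvalues i = 5(5)`r(max)' {
        * each boundary at or below Var2 moves the observation up one bin
        quietly replace increment = increment + 1 if Var2 >= `i'
    }
    
    * contents() is the pre-Stata 17 syntax of -table-; in Stata 17 or later, prefix with "version 16:"
    table increment, contents(mean Var1 sd Var1 n Var1) format(%02.0f)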
    It assumes that you want summary statistics for Var1 in increments of five for Var2, that is, [0, 5), [5, 10), [10, 15) . . . You can change the loop increment value as needed. You can also get fancy and create a set of value labels for the increment variable if you want.
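
    The value labels mentioned above could be built along these lines. This is only a sketch: the label name incrementlbl is made up for illustration, the data are the five rows from #1, and increment is rebuilt with floor(Var2 / 5), which matches the loop above only when Var2 is non-negative and the bin width is 5.

    ```stata
    * example data from #1
    clear
    input ID Var1 Var2
    1  10  5
    2  20 10
    3  70 15
    4 100 11
    5  95  4
    end

    * assumption: non-negative Var2 and bins of width 5, so this equals the loop result
    generate int increment = floor(Var2 / 5)

    * attach a label such as "[0, 5)" to each bin code (incrementlbl is a hypothetical name)
    summarize increment, meanonly
    forvalues k = 0/`r(max)' {
        local lo = 5 * `k'
        local hi = `lo' + 5
        label define incrementlbl `k' "[`lo', `hi')", modify
    }
    label values increment incrementlbl

    * count, mean and sd of Var1 within each labelled bin
    tabstat Var1, by(increment) statistics(n mean sd)
    ```

    tabstat is used here instead of table only because its syntax is the same across Stata versions.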



    • #3
      Here are two solutions. The first uses rangestat (from SSC) to calculate statistics for each observation from all observations whose Var2 is less than, or greater than or equal to, the value of Var2 for the current observation. The second loops over all values of Var2.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(ID Var1 Var2)
      1  10  5
      2  20 10
      3  70 15
      4 100 11
      5  95  4
      end
      
      * identify a value that is less than the min for Var2 and the max for Var2
      sum Var2
      gen low = r(min) - 1
      gen high = r(max)
      
      * descriptive stats of all Var1 values where Var2 < Var2[_n]
      rangestat (count) Var1 (mean) Var1 (sd) Var1 (min) Var1 (max) Var1, interval(Var2 low -1)
      rename Var1_* =_low
      
      * descriptive stats of all Var1 values where Var2 >= Var2[_n]
      rangestat (count) Var1 (mean) Var1 (sd) Var1 (min) Var1 (max) Var1, interval(Var2 0 high)
      
      * another approach is to loop over all values of Var2
      levelsof Var2, local(values)
      foreach x of local values {
          dis "stats for Var2 == " as res `x'
          sum Var1 if Var2 < `x'
          sum Var1 if Var2 >= `x'
      }



      • #4
        Thank you both for the replies.

        I tried both pieces of code (from Joseph and Robert), but neither worked the way I wanted.
        For example, the code suggested by Joseph
        reports different results than when I run the commands separately, e.g.
        sum Var1 if Var2 < 5,
        sum Var1 if Var2 >= 5 and Var2 < 15, etc., so I guess the loop did not work properly.
        I want to summarize Var1 on unequal intervals of Var2 values (e.g. < 5, >= 5 to < 15, and >= 15 to < 100), because equal intervals do not give enough observations.

        Thanks,



        • #5
          Krishna: The thread looks confusing to me because in #1 there is much stress on the "many conditions" for var2 that you want to specify. Accordingly Joseph and Robert showed you some systematic techniques.

          In #4 you seem to come back to 3 particular groupings but also keep stressing that these are examples.

          You can't have it both ways: you can specify exactly what you want, or you can ask for general technique. No one can give you precise code for examples you don't specify.

          Perhaps you are getting confused about what basic abbreviations mean:

          e.g. means for example and implies that there are other cases.

          etc. implies also that there are other examples.

          If all you want, at least to start off with, is a three-fold classification:

          Code:
          if var2 < 5 
          
          if var2 >= 5 & var2 < 15 
          
          if var2 >= 15
          then you just need to add those conditions to your commands. Otherwise you'll need to start following FAQ Advice and give a precise data example, precise code that you used, and explain what you need that they don't supply.
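
          As a rough sketch of how the three-fold classification above could be summarized in one pass rather than with three separate commands: build a grouping variable once and hand it to tabstat. The names group and grouplbl are made up for illustration, the data are the five rows from #1, and the variables are spelled Var1 and Var2 as in that example because Stata names are case-sensitive.

          ```stata
          * example data from #1
          clear
          input ID Var1 Var2
          1  10  5
          2  20 10
          3  70 15
          4 100 11
          5  95  4
          end

          * 1: Var2 < 5, 2: 5 <= Var2 < 15, 3: Var2 >= 15; missing Var2 stays missing
          generate byte group = cond(Var2 < 5, 1, cond(Var2 < 15, 2, 3)) if !missing(Var2)

          * hypothetical label name, used only to make the output readable
          label define grouplbl 1 "< 5" 2 "5 to < 15" 3 ">= 15"
          label values group grouplbl

          * count, mean, sd, min and max of Var1 within each group
          tabstat Var1, by(group) statistics(n mean sd min max)
          ```

          Other unequal cut points only require changing the thresholds inside cond() and the matching labels.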

          Please do read http://www.statalist.org/forums/help



          • #6
            Thanks Nick.
