  • table: Summary stats - Adding range, effect size, and p value

    I'm trying to build a table using the new table command. I want the first column of the table to contain each variable's range, e.g., "1-4" or "1-7". This needs to be the theoretical range, so even if the highest actual score is 6.8, it should read 1-7. I call this Step 1.

    I want the intermediate columns to be the count, mean, and SD by gender. I've got this working perfectly (Step 2).

    And then at the end of the table I need the effect size and p-value for the difference between male and female levels. I'm computing these manually as well (Steps 3 and 4). Is there a way to use table so that Steps 1, 3, and 4 are done by the command?


    Code:
    * 1. manually get ranges and put in first column
    
    * 2. get count mean sd
    table (var) (female), ///
        statistic(count a b c) ///
        statistic(mean a b c) ///
        statistic(sd a b c) ///
        nformat(%4.0f count) nformat(%4.2f mean sd) nototals
    collect levelsof result
    collect style header result row stack, level(hide)
    collect layout (var) (female[0 1]#result)
    
    * 3. get effect sizes as eta squared
    * (with a single factor, eta squared equals the model R-squared, e(r2))
    foreach var in a b c {
        quietly anova `var' female
        quietly estat esize
        display e(r2)
    }
    
    * 4. get p values from the model F test
    foreach var in a b c {
        quietly anova `var' female
        local pModel = Ftail(e(df_m), e(df_r), e(F))
        display `pModel'
    }
    Thanks.



  • #2
    Here is an example using simulated data inspired by your original code:

    Code:
    * simulate some data
    set seed 17
    set obs 1000
    gen female = runiformint(0,1)
    label define female 0 "Male" 1 "Female"
    label values female female
    gen a = female + runiformint(1,7) + runiform() if runiform() > .05
    gen b = female/6 + runiformint(1,4) + runiform() if runiform() > .05
    gen c = runiformint(2,6) - runiform() if runiform() > .05
    
    * compute and collect ranges
    collect create ranges
    foreach var in a b c {
        summarize `var'
        local min = floor(r(min))
        local max = ceil(r(max))
        collect get range="`min' - `max'", tag(var[`var'])
    }
    collect label levels result range "Range", modify
    collect layout (var) (result)
    
    * compute and collect statistics
    table (var) (female result), ///
        name(stats) ///
        statistic(count a b c) ///
        statistic(mean a b c) ///
        statistic(sd a b c) ///
        nformat(%4.0f min max) ///
        nformat(%4.0f count) ///
        nformat(%4.2f mean sd) ///
        nototals
    collect label levels result count "N" sd "SD", modify
    collect preview
    
    * compute and collect effects
    collect create fits
    foreach var in a b c {
        collect e(r2) ///
            p = Ftail(e(df_m),e(df_r),e(F)) ///
            , ///
            tag(var[`var']) ///
            : anova `var' female
    }
    collect style cell result[r2], nformat(%4.2f)
    collect style cell result[p], nformat(%6.3f)
    collect label levels result ///
        p "p-value" ///
        r2 "Effect size" ///
        , modify
    collect layout (var) (result)
    
    * combine, arrange, and customize
    collect combine full = ranges stats fits
    collect layout (var) (result[range] female#result result[r2 p])
    collect style header female, title(hide)
    collect preview
    Here is the resulting table:
    Code:
    . collect preview
    
    --------------------------------------------------------------------------
      |  Range          Male               Female        Effect size   p-value
      |            N   Mean     SD     N   Mean     SD                        
    --+-----------------------------------------------------------------------
    a |  1 - 9   469   4.59   2.02   477   5.53   2.00          0.05     0.000
    b |  1 - 6   472   3.02   1.14   472   3.20   1.16          0.01     0.016
    c |  1 - 6   479   3.58   1.47   477   3.60   1.47          0.00     0.811
    --------------------------------------------------------------------------
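
    The Range column above is built from the observed minimum and maximum, so it can differ from a scale's theoretical range (a shows 1 - 9 in the simulated data). A minimal sketch of one alternative, assuming the theoretical ranges are known in advance and typed in by hand; the locals range_a, range_b, and range_c and their values are hypothetical and not part of the original example:

    ```stata
    * hypothetical theoretical ranges, entered by hand (assumed values)
    local range_a "1 - 7"
    local range_b "1 - 4"
    local range_c "1 - 7"

    * collect the fixed strings instead of the summarize results
    collect create ranges, replace
    foreach var in a b c {
        collect get range="`range_`var''", tag(var[`var'])
    }
    collect label levels result range "Range", modify
    collect layout (var) (result)
    ```

    This would stand in for the "compute and collect ranges" block; the remaining steps should not need to change, since they only rely on the collection being named ranges and tagged by var.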



    • #3
      Thanks! This works great. I added collect clear at the top to make it reusable.

      On my machine, this line does not produce an error but does not have any effect either:

      Code:
       
       collect style header female, title(hide)



      • #4
        I have a follow-up question. I have a factor variable called racea, which holds race. I am trying to create a table that uses just a subset of the values of racea. I have tried using the format listed in the help file, but regardless of where I put, say, race[1 3], I get all races.



        • #5
          Regarding #3 above, that line was part of my experimenting to get the final table. I did not realize it was not necessary when I posted.

          Hiding the dimension titles is already in the default style for collections ranges and fits, and is carried forward in the combined style when collection full is created.



          • #6
            Did you specify race[1 3] or racea[1 3]?
            collect does not support variable abbreviations.
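
            One way to check the exact names is to list the dimensions and their levels before writing the layout. A minimal sketch, assuming auto.dta as a stand-in dataset (foreign plays the role of racea here; none of this comes from the original data):

            ```stata
            * illustrative only: auto.dta stands in for the real data
            sysuse auto, clear
            collect clear
            table (var) (foreign), statistic(mean mpg weight) nototals

            * list the dimensions in the current collection, with their exact names
            collect dims

            * list the levels of one dimension, with its name spelled out in full
            collect levelsof foreign

            * the full dimension name selects a subset of levels in the layout
            collect layout (var) (foreign[1]#result)
            ```

            An abbreviated name such as for[1] would not match the foreign dimension, which is the same situation as race[1 3] versus racea[1 3].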



            • #7
              It turns out that was the problem. Thanks for spotting it!



              • #8
                Regarding #5, what would I have to do here to get the dimension title to show, assuming I wanted that? I tried a couple of things, but neither worked.


                Code:
                * end: combine, arrange, and customize
                collect combine full = ranges stats fits
                collect layout (var) (result[range] racea[1 3]#result result[r2 p])
                collect style header racea, title(???)
                collect preview



                • #9
                  Originally posted by Chris Martin
                  Regarding #5, what would I have to do here to get the dimension title to show, assuming I wanted that? I tried a couple of things, but neither worked.


                  Code:
                  * end: combine, arrange, and customize
                  collect combine full = ranges stats fits
                  collect layout (var) (result[range] racea[1 3]#result result[r2 p])
                  collect style header racea, title(???)
                  collect preview
                  @Chris Martin If you want to show the dimension title, you can use the code below:
                  Code:
                  collect combine full = ranges stats fits
                  collect layout (var) (result[range] female#result result[r2 p])
                  collect style header female, title(name)
                  collect preview
                  
                  --------------------------------------------------------------------------
                    |  Range                   female                  Effect size   p-value
                    |                 Male               Female                             
                    |            N   Mean     SD     N   Mean     SD                        
                  --+-----------------------------------------------------------------------
                  a |  1 - 9   469   4.59   2.02   477   5.53   2.00          0.05     0.000
                  b |  1 - 6   472   3.02   1.14   472   3.20   1.16          0.01     0.016
                  c |  1 - 6   479   3.58   1.47   477   3.60   1.47          0.00     0.811
                  --------------------------------------------------------------------------

                  Best
                  Raymond
                  Best regards.

                  Raymond Zhang
                  Stata 17.0,MP



                  • #10
                    You could also tidy up the header by adding a line below Male and Female.
                    Code:
                    . collect style cell cell_type[column-header]#female,border(bottom,pattern(single))
                    . collect preview
                    
                    --------------------------------------------------------------------------
                      |  Range          Male               Female        Effect size   p-value
                      |         ---------------------------------------                       
                      |            N   Mean     SD     N   Mean     SD                        
                    --+-----------------------------------------------------------------------
                    a |  1 - 9   469   4.59   2.02   477   5.53   2.00          0.05     0.000
                    b |  1 - 6   472   3.02   1.14   472   3.20   1.16          0.01     0.016
                    c |  1 - 6   479   3.58   1.47   477   3.60   1.47          0.00     0.811
                    --------------------------------------------------------------------------
                    Best regards.

                    Raymond Zhang
                    Stata 17.0,MP



                    • #11
                      Thanks. Sorry for the rather late reply, but is there a way to insert a break in the line between Male and Female, i.e., a thin column there with no horizontal line?



                      • #12
                        Hello, I want to add a p-value column to this table, showing the p-value from the relevant test for each variable (a t-test for 'age' and chi-square tests for 'gender' and 'course'). What code should I use?

                        Code:
                        -----------------------------------------
                                           Substance Use        
                                        No               Yes    
                        -----------------------------------------
                        Age       22.1     (1.9)   23.6     (2.1)
                                                                
                        Gender                                  
                          Male     413   (42.4%)    126   (81.8%)
                          Female   562   (57.6%)     28   (18.2%)
                        Course                                  
                          MBBS     922   (94.6%)    146   (94.8%)
                          BDS       53    (5.4%)      8    (5.2%)
                        -----------------------------------------
                        The code I used was:
                        Code:
                        table (var) (substanceuse), ///
                            statistic(mean age) ///
                            statistic(sd age) ///
                            statistic(fvfrequency gender curriculum) ///
                            statistic(fvpercent gender curriculum) ///
                            nototals
                        collect recode result mean = column1 sd = column2 fvfrequency = column1 fvpercent = column2
                        collect layout (var) (substanceuse#result[column1 column2])
                        collect style cell var[gender curriculum]#result[column1], nformat(%6.0fc)
                        collect style cell var[gender curriculum]#result[column2], nformat(%6.1f) sformat("(%s%%)")
                        collect style cell var[age]#result[column1 column2], nformat(%6.1f)
                        collect style cell var[age]#result[column2], sformat("(%s)")
                        collect style header result, level(hide)
                        collect style row stack, nobinder spacer
                        collect style cell border_block, border(right, pattern(nil))
                        collect preview
                        Thank you!

                        Stata 17

