Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Collect System: Displaying categorical covariate level percentages across rows/outcome levels, rather than across column/covariate levels

    Hello,

    I have using a similar code as below to create customizable tables using the Collect system. This system has been very handy!

    Code:
    clear
    webuse nhanes2l
    tab highbp
    
    collect clear
    collect layout, clear
        
    * Obtain an initial table with necessary statistics                
    collect: table ///
        (var) ///
        (highbp), ///
            statistic(mean age) ///
                statistic(sd age) ///
                statistic(fvfrequency race) ///
                statistic(fvpercent race)
    
    * The highbp level value labeled "Total" can later be tagged with a p-value interest
    collect label list highbp, all
    
    * Calculate a p-value for association between highbp and age
    ologit highbp c.age
    * Consume that p-value and attach tags to it for the layout command later on to know where it should be placed
    collect get manual_pvalue = e(p), tags(var[age] highbp[.m])
    
    * Calculate a p-value for the association between highbp and race
    tab race highbp, chi2 nolabel // note the value of the first level "White". Will use this value for correct tagging/position placement of the Pvalue in next command
    * Consume that p-value and attach tags to it for the layout command later on to know where it should be placed
    collect get manual_pvalue = r(p), tags(var[1.race] highbp[.m])
    
    * Add a custom formatted footnote for the p-values, corresponding to the p-values obtained in table
    collect stars manual_pvalue 0.01 "***" 0.05 "**" 0.1 "*", ///
        attach(manual_pvalue) shownote
    * Update the formatting for easier readability
    collect style cell result[manual_pvalue], nformat(%5.2f) minimum(.01)
    
    * Recode the levels of the result dimension to allow combined placement of the statistic of interest later in the shared cells in the final table
    collect recode result ///
        fvfrequency = variable_measure ///
        fvpercent = variable_spread ///
        mean = variable_measure ///
        sd = variable_spread //
    
    *Add labels to the mentioned combined levels for better table readability
    collect label levels result variable_measure "Count / Mean" variable_spread "% / (SD)"  
    collect label levels result manual_pvalue "P-Value"  
    
    *Create the final table   
    collect layout ///
        (var) ///
        (highbp#result[variable_measure variable_spread] result[manual_pvalue])
    Above code results in the following table:

    Code:
    -----------------------------------------------------------------------------------------------------
                |                              High blood pressure                                P-Value
                |             0                         1                       Total                    
                |  Count / Mean   % / (SD)   Count / Mean   % / (SD)   Count / Mean   % / (SD)           
    ------------+----------------------------------------------------------------------------------------
    Age (years) |      42.16502   16.77157       54.97281   14.90897       47.57965   17.21483   <0.01***
    Race=White  |          5317   88.98745           3748   85.64899           9065   87.57608   <0.01***
    Race=Black  |           545   9.121339            541   12.36289           1086   10.49174           
    Race=Other  |           113   1.891213             87   1.988117            200    1.93218           
    -----------------------------------------------------------------------------------------------------
    *** p<.01, ** p<.05, * p<.1

    My Question:

    Currently, the resultant table above, reports the percentages of the levels of the Race predictor (i.e. White, Black, Other) across the column. In other words, it shows, for example, "what percentage among all people who do not have High Blood Pressure are White, Black, or Other" . However, I want to report percentage levels of the Race predictor across the row (e.g. "what percentage among all persons who are White do not have High Blood Pressure? And what percentage among all persons who are White have High Blood Pressure?")

    What would be the best approach to report composition percentage of the covariates across levels of the outcome, as opposed to within the levels of the covariate itself (as the code above shows)?

    Thank you so much in advance!

  • #2
    Originally posted by Nathan Yu View Post
    Currently, the resultant table above, reports the percentages of the levels of the Race predictor (i.e. White, Black, Other) across the column. In other words, it shows, for example, "what percentage among all people who do not have High Blood Pressure are White, Black, or Other" . However, I want to report percentage levels of the Race predictor across the row (e.g. "what percentage among all persons who are White do not have High Blood Pressure? And what percentage among all persons who are White have High Blood Pressure?")
    You have a lot of extra details not relevant to your question. What this boils down to is the difference between the commands highlighted in red and blue below.

    Code:
    webuse nhanes2l, clear
    table (var) (highbp), statistic(fvfrequency race) statistic(fvpercent race) nototal
    qui table (var) (race), statistic(fvfrequency highbp) statistic(fvpercent highbp)
    collect layout (race[1 2 3]#result) (var)
    Res.:

    Code:
    . table (var) (highbp), statistic(fvfrequency race) statistic(fvpercent race) nototal
    
    ----------------------------------------------------
                                |   High blood pressure 
                                |          0           1
    ----------------------------+-----------------------
    Race=White                  |                       
      Factor-variable frequency |      5,317       3,748
      Factor-variable percent   |      88.99       85.65
    Race=Black                  |                       
      Factor-variable frequency |        545         541
      Factor-variable percent   |       9.12       12.36
    Race=Other                  |                       
      Factor-variable frequency |        113          87
      Factor-variable percent   |       1.89        1.99
    ----------------------------------------------------
    
    . 
    . qui table (var) (race), statistic(fvfrequency highbp) statistic(fvpercent highbp)
    
    . 
    . collect layout (race[1 2 3]#result) (var)
    
    Collection: Table
          Rows: race[1 2 3]#result
       Columns: var
       Table 1: 10 x 2
    
    ------------------------------------------------------
                                  |   High blood pressure 
                                  |          0           1
    ------------------------------+-----------------------
    Race                          |                       
      White                       |                       
        Factor-variable frequency |      5,317       3,748
        Factor-variable percent   |      58.65       41.35
      Black                       |                       
        Factor-variable frequency |        545         541
        Factor-variable percent   |      50.18       49.82
      Other                       |                       
        Factor-variable frequency |        113          87
        Factor-variable percent   |      56.50       43.50
    ------------------------------------------------------
    
    .

    Comment


    • #3
      Hi Andrew, apologies for the delay and thank you so much for the support. How would this work with multiple categorical and continuous variables? For example, if I also wanted the table to include the variables "race", "bmi", "rural", and "age" altogether as rows in the column?

      As well as including the P-Value calculation as the code in the first post?

      Comment


      • #4
        Age and bmi should be considered continuous variables. How they should be tabulated against levels of high blood pressure is not clear. For categorical variables, you can combine tables as follows:

        Code:
        webuse nhanes2l, clear
        collect clear
        table (var) (race), statistic(fvfrequency highbp) statistic(fvpercent highbp) name(race) nototal
        table (var) (rural), statistic(fvfrequency highbp) statistic(fvpercent highbp) name(rural) nototal
        collect combine all= race rural
        collect label levels result fvfrequency "Freq.", modify
        collect label levels result fvpercent "Pct.", modify
        collect layout (race rural) (var#result)
        Res.:

        Code:
        . collect layout (race rural) (var#result)
        
        Collection: all
              Rows: race rural
           Columns: var#result
           Table 1: 7 x 4
        
        ----------------------------------------
                |       High blood pressure    
                |        0               1      
                |  Freq.    Pct.   Freq.    Pct.
        --------+-------------------------------
        Race    |                              
          White |  5,317   58.65   3,748   41.35
          Black |    545   50.18     541   49.82
          Other |    113   56.50      87   43.50
        Rural   |                              
          Urban |  3,798   58.00   2,750   42.00
          Rural |  2,177   57.24   1,626   42.76
        ----------------------------------------

        Comment


        • #5
          Hi Andrew,

          I would like to have mean and SD for the continuous variables, as in the example in opening post. Afterwards, I will also add P-Values, to tell if the variables are different across levels of highbp.

          I've spent some time on this, but cannot achieve the similar table as the opening post. I am not sure if the following is the correct approach?

          Code:
          webuse nhanes2l, clear
          collect clear
          table (var) (race), statistic(fvfrequency highbp) statistic(fvpercent highbp) name(race) nototal
          table (var) (rural), statistic(fvfrequency highbp) statistic(fvpercent highbp) name(rural) nototal
          table (highbp) (var), statistic(mean bmi) statistic(sd bmi) name(bmi) nototal // There must be a different approach?
          table (highbp) (var), statistic(mean age) statistic(sd age)  name(age) nototal // There must be a different approach?
          collect combine all= race rural bmi age
          collect label levels result fvfrequency "Freq.", modify
          collect label levels result fvpercent "Pct.", modify
          collect label levels result mean "Mean", modify
          collect label levels result sd "SD", modify
          collect layout (race rural highbp) (var#result)
          Which results in:

          Code:
          Collection: all
                Rows: race rural highbp
             Columns: var#result
             Table 1: 10 x 8
          
          ----------------------------------------------------------------------------------------------------
                              |      Age (years)        Body mass index (BMI)         High blood pressure    
                              |      Mean         SD         Mean           SD         0               1      
                              |                                                  Freq.    Pct.   Freq.    Pct.
          --------------------+-------------------------------------------------------------------------------
          Race                |                                                                              
            White             |                                                  5,317   58.65   3,748   41.35
            Black             |                                                    545   50.18     541   49.82
            Other             |                                                    113   56.50      87   43.50
          Rural               |                                                                              
            Urban             |                                                  3,798   58.00   2,750   42.00
            Rural             |                                                  2,177   57.24   1,626   42.76
          High blood pressure |                                                                              
            0                 |  42.16502   16.77157     24.20231     4.100279                                
            1                 |  54.97281   14.90897     27.36081     5.332119                                
          ----------------------------------------------------------------------------------------------------
          Thank you as always for your continued support!
          Last edited by Nathan Yu; 22 Aug 2024, 21:07.

          Comment


          • #6
            Code:
            help collect get

            Code:
            webuse nhanes2l, clear
            collect clear
            table (var) (race), statistic(fvfrequency highbp) statistic(fvpercent highbp) name(race) nototal
            table (var) (rural), statistic(fvfrequency highbp) statistic(fvpercent highbp) name(rural) nototal
            
            foreach var in age bmi{
                qui sum `var' if !highbp
                collect get fvfrequency = "`:display %9.3f r(mean)'", tags(`var'[] var[0.highbp])
                collect get fvpercent = "`:display %9.3f r(sd)'", tags(`var'[] var[0.highbp])
                qui sum `var' if highbp
                collect get fvfrequency = "`:display %9.3f r(mean)'", tags(`var'[] var[1.highbp])
                collect get fvpercent = "`:display %9.3f r(sd)'", tags(`var'[] var[1.highbp])
            }
            collect combine all= race rural
            collect label levels result fvfrequency "Count / Mean", modify
            collect label levels result fvpercent "% / (SD)", modify
            collect layout (age bmi race rural) (var#result)
            Res.:

            Code:
            . collect layout (age bmi race rural) (var#result)
            
            Collection: all
                  Rows: age bmi race rural
               Columns: var#result
               Table 1: 9 x 4
            
            --------------------------------------------------------------------------
                                  |                 High blood pressure              
                                  |             0                         1          
                                  |  Count / Mean   % / (SD)   Count / Mean   % / (SD)
            ----------------------+---------------------------------------------------
            Age (years)           |        42.165     16.772         54.973     14.909
            Body mass index (BMI) |        24.202      4.100         27.361      5.332
            Race                  |                                                  
              White               |         5,317      58.65          3,748      41.35
              Black               |           545      50.18            541      49.82
              Other               |           113      56.50             87      43.50
            Rural                 |                                                  
              Urban               |         3,798      58.00          2,750      42.00
              Rural               |         2,177      57.24          1,626      42.76
            --------------------------------------------------------------------------
            Last edited by Andrew Musau; 23 Aug 2024, 09:37.

            Comment


            • #7
              Thank you so much Andrew for the continued support! Got it, I will individually tag the scalars then and call them with the layout command!

              Comment


              • #8
                Hi Andrew, this is a very clever method, thank you for teaching it to me. However, I wanted to put the SD's in the table in a parenthesis. How can I go about doing so?

                Normally I would use the following code:

                Code:
                collect style cell ///
                                var[age bmi]#result[fvpercent], ///
                                sformat("(%s)")
                However, I get the following error:

                Code:
                . collect style cell ///
                >                 var[age bmi]#result[fvpercent], ///
                >                 sformat("(%s)")
                (level age of dimension var not found)
                (level bmi of dimension var not found)
                How can I reformat the cells given the method you suggested?

                Apologies if it sounds ambiguous, but I hope to change both the nformat() and sformat() of these collected values after they have already been stored under the result[fvpercent] dimension/level.

                Comment


                • #9
                  Code:
                  collect style cell result[fvpercent] , sformat("(%s)")
                  Res.:

                  Code:
                  . collect preview
                  
                  --------------------------------------------------------------------------
                                        |                 High blood pressure               
                                        |             0                         1           
                                        |  Count / Mean   % / (SD)   Count / Mean   % / (SD)
                  ----------------------+---------------------------------------------------
                  Age (years)           |        42.165   (16.772)         54.973   (14.909)
                  Body mass index (BMI) |        24.202    (4.100)         27.361    (5.332)
                  Race                  |                                                   
                    White               |         5,317    (58.65)          3,748    (41.35)
                    Black               |           545    (50.18)            541    (49.82)
                    Other               |           113    (56.50)             87    (43.50)
                  Rural                 |                                                   
                    Urban               |         3,798    (58.00)          2,750    (42.00)
                    Rural               |         2,177    (57.24)          1,626    (42.76)
                  --------------------------------------------------------------------------
                  
                  .

                  Comment


                  • #10
                    You are actually mixing up standard deviations and percentages in one column, so #9 is not a solution to your problem. Just insert the parentheses in the collect get command lines.

                    Code:
                    webuse nhanes2l, clear
                    collect clear
                    table (var) (race), statistic(fvfrequency highbp) statistic(fvpercent highbp) name(race) nototal
                    table (var) (rural), statistic(fvfrequency highbp) statistic(fvpercent highbp) name(rural) nototal
                    
                    foreach var in age bmi{
                        qui sum `var' if !highbp
                        collect get fvfrequency = "`:display %4.3f r(mean)'", tags(`var'[] var[0.highbp])
                        collect get fvpercent = "(`:display %4.3f r(sd)')", tags(`var'[] var[0.highbp])
                        qui sum `var' if highbp
                        collect get fvfrequency = "`:display %4.3f r(mean)'", tags(`var'[] var[1.highbp])
                        collect get fvpercent = "(`:display %4.3f r(sd)')", tags(`var'[] var[1.highbp])
                    }
                    collect combine all= race rural
                    collect label levels result fvfrequency "Count / Mean", modify
                    collect label levels result fvpercent "% / (SD)", modify
                    collect layout (age bmi race rural) (var#result)
                    Res.:

                    Code:
                    . collect layout (age bmi race rural) (var#result)
                    
                    Collection: all
                          Rows: age bmi race rural
                       Columns: var#result
                       Table 1: 9 x 4
                    
                    --------------------------------------------------------------------------
                                          |                 High blood pressure               
                                          |             0                         1           
                                          |  Count / Mean   % / (SD)   Count / Mean   % / (SD)
                    ----------------------+---------------------------------------------------
                    Age (years)           |        42.165   (16.772)         54.973   (14.909)
                    Body mass index (BMI) |        24.202    (4.100)         27.361    (5.332)
                    Race                  |                                                   
                      White               |         5,317      58.65          3,748      41.35
                      Black               |           545      50.18            541      49.82
                      Other               |           113      56.50             87      43.50
                    Rural                 |                                                   
                      Urban               |         3,798      58.00          2,750      42.00
                      Rural               |         2,177      57.24          1,626      42.76
                    --------------------------------------------------------------------------

                    Comment


                    • #11
                      Hi Andrew,

                      Thank you so much for this -- excellent! I had originally had trouble attempting exactly this, however from your code I realized that that I needed to change the number-width of the format (from %9.3f to %4.3f) that is being applied to the numbers being collected to make sure spacing is correctly aligned:

                      Code:
                      collect get fvpercent = "(`:display %9.3f r(sd)')", tags(`var'[] var[0.highbp])
                      vs the following:

                      Code:
                      collect get fvpercent = "(`:display %4.3f r(sd)')", tags(`var'[] var[0.highbp])

                      From curiosity, would the display command in 'collect get fvpercent = "(`:display %4.3f r(sd)')", tags(`var'[] var[0.highbp])' be able to also take other additional formatting options, such as below sformat option below that adds a percentage sign to the collected number?

                      Code:
                      collect style cell result[fvpercent], sformat("%s%%")
                      Thank you as always, and please do not spend time on this if no obvious method comes to your mind!
                      Last edited by Nathan Yu; 24 Sep 2024, 18:23.

                      Comment


                      • #12
                        Same way you inserted the parentheses, you can insert other characters.

                        Code:
                        webuse nhanes2l, clear
                        collect clear
                        table (var) (race), statistic(fvfrequency highbp) statistic(fvpercent highbp) name(race) nototal
                        table (var) (rural), statistic(fvfrequency highbp) statistic(fvpercent highbp) name(rural) nototal
                        
                        foreach var in age bmi{
                            qui sum `var' if !highbp
                            collect get fvfrequency = "`:display %4.3f r(mean)'", tags(`var'[] var[0.highbp])
                            collect get fvpercent = "(`:display %4.3f r(sd)'%)", tags(`var'[] var[0.highbp])
                            qui sum `var' if highbp
                            collect get fvfrequency = "`:display %4.3f r(mean)'", tags(`var'[] var[1.highbp])
                            collect get fvpercent = "(`:display %4.3f r(sd)'%)", tags(`var'[] var[1.highbp])
                        }
                        collect combine all= race rural
                        collect label levels result fvfrequency "Count / Mean", modify
                        collect label levels result fvpercent "% / (SD)", modify
                        collect layout (age bmi race rural) (var#result)
                        Res.:

                        Code:
                        . collect layout (age bmi race rural) (var#result)
                        
                        Collection: all
                              Rows: age bmi race rural
                           Columns: var#result
                           Table 1: 9 x 4
                        
                        ----------------------------------------------------------------------------
                                              |                  High blood pressure                
                                              |              0                          1           
                                              |  Count / Mean    % / (SD)   Count / Mean    % / (SD)
                        ----------------------+-----------------------------------------------------
                        Age (years)           |        42.165   (16.772%)         54.973   (14.909%)
                        Body mass index (BMI) |        24.202    (4.100%)         27.361    (5.332%)
                        Race                  |                                                     
                          White               |         5,317       58.65          3,748       41.35
                          Black               |           545       50.18            541       49.82
                          Other               |           113       56.50             87       43.50
                        Rural                 |                                                     
                          Urban               |         3,798       58.00          2,750       42.00
                          Rural               |         2,177       57.24          1,626       42.76
                        ----------------------------------------------------------------------------

                        Comment


                        • #13
                          Hello Andrew, my apologies for the late reply. This is excellent, thank you very much for this!

                          Comment

                          Working...
                          X