Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Create a two-way table for categorical variables

    I generated 4 categorical variables with 6 categories each (1-6). The 6 categories are similar across variables but the number of observations differ between them. I want to create a table that summarizes the counts of all 4 variables (rows) with their 6 categories (columns) and also shows zero if no observations.
    I used the table command but it shows no observations when I select 2 or more variables to tabulate.

  • #2
    Something like this? https://www.statalist.org/forums/for...across-columns

    Comment


    • #3
      Thank you so much, Andrew! this is helpful. I tried the code using 2 variables and it worked. However, one of the "Timepoints" is missing and the order of the labels is not in order. See my code below.
      One more question: When adding the remaining 2 variables, should I add the labels (lab3 & lab4) for the 6 categories?

      label values Ultrapos_Cultpos lab1
      label def lab1 1 "Baseline", modify
      label def lab1 2 "M3", modify
      label def lab1 3 "M6", modify
      label def lab1 4 "M3-9", modify
      label def lab1 5 "M12", modify
      label def lab1 6 "M12+", modify
      label values Ultrapos_Cultneg lab2
      label def lab2 1 "Baseline", modify
      label def lab2 2 "M3", modify
      label def lab2 3 "M6", modify
      label def lab2 4 "M3-9", modify
      label def lab2 5 "M12", modify
      label def lab2 6 "M12+", modify
      frame put *, into(table)
      frame table{
      decode Ultrapos_Cultpos, gen(TimepointsUltrapos_Cultpos)
      decode Ultrapos_Cultneg, gen(TimepointsUltrapos_Cultneg)
      keep Timepoints*
      gen long obsno=_n
      reshape long Timepoints, i(obsno) string
      lab define Timepoints 1 "Baseline" 2 "M3" 3 "M6" 4 "M3-9" 5 "M12" 6 "M12+"
      lab define testingmethod 1 "Ultrapos_Cultpos" 2 "Ultrapos_Cultneg"
      encode _j, gen(testingmethod) label(testingmethod)
      *encode Timepoints, gen(Timepoints) label(Timepoints)
      dtable i.testingmethod, by(Timepoints)
      }
      frame drop table

      See the output table below


      Timepoints
      Baseline M12 M12+ M3-9 M6 Total

      N 16 (28.1%) 10 (17.5%) 15 (26.3%) 7 (12.3%) 9 (15.8%) 57 (100.0%)
      testingmethod
      Ultrapos_Cultpos 0 (0.0%) 0 (0.0%) 10 (66.7%) 0 (0.0%) 0 (0.0%) 10 (17.5%)
      Ultrapos_Cultneg 16 (100.0%) 10 (100.0%) 5 (33.3%) 7 (100.0%) 9 (100.0%) 47 (82.5%)

      Comment


      • #4
        I don't have an example to test, but I would unify the values to begin with (before reshape). For example, if the category "X" takes the values 1 in variable "A" and takes the values 3 in variable "B", then recode the values to 3 in "A" or to 1 in "B". Then after reshaping, you can define one unified set of labels. If this is not useful, present a data example which replicates your problem.

        Comment


        • #5
          See also https://www.statalist.org/forums/for...updated-on-ssc

          As the thread reveals, groups is available through the Stata Journal.

          Comment


          • #6
            Thank you Nick Cox. I am using Stata 18, which says "command groups is unrecognized".

            Comment


            • #7
              As already mentioned, groups is available from the Stata Journal. It is not an official command, so you must install it.

              As its author, I was slightly mischievous in using as a name a common English word that might evoke several hits on any search. StataCorp have on occasion laid claim to all words in the English language as possible command names.

              So, a direct if not quite predictable way to install it is through typing this search command:

              Code:
              . search st0496, entry
              
              Search of official help files, FAQs, Examples, and Stata Journals
              
              SJ-18-1 st0496_1  . . . . . . . . . . . . . . . . . Software update for groups
                      (help groups if installed)  . . . . . . . . . . . . . . . .  N. J. Cox
                      Q1/18   SJ 18(1):291
                      groups exited with an error message if weights were specified;
                      this has been corrected
              
              SJ-17-3 st0496  . . . . .  Speaking Stata: Tables as lists: The groups command
                      (help groups if installed)  . . . . . . . . . . . . . . . .  N. J. Cox
                      Q3/17   SJ 17(3):760--773
                      presents command for listing group frequencies and percents and
                      cumulations thereof; for various subsetting and ordering by
                      frequencies, percents, and so on; for reordering of columns;
                      and for saving tabulated data to new datasets
              Hence at the time of writing the latest public version can be downloaded by first running that command and then clicking on st0496_1 and so on.

              Comment


              • #8
                PID Baseline_smear Baseline_Xprt Baseline_culture M3_smear M3_Xprt M3_culture M6_smear M6_Xprt M6_culture M3-9_smear M3-9_Xprt M3-9_culture
                1 0 1 1 0 0 0 0 0 0 0 0 0
                2 0 1 0 0 0 0 0 0 0 1 0 0
                3 0 0 0 1 0 0 0 0 0 0 0 0
                4 1 1 1 0 0 0 0 0 0 0 0 0
                5 0 0 0 0 1 1 0 0 0 0 0 0
                6 1 0 0 0 0 0 0 0 0 0 0 0
                7 1 1 0 0 0 0 0 0 0 0 0 0
                8 0 1 1 0 0 0 0 0 0 0 0 0
                9 0 1 0 0 0 0 0 0 0 0 0 0
                10 0 0 0 1 1 0 0 0 0 0 1 0
                11 0 0 0 0 0 0 0 1 1 0 0 0
                12 0 0 0 0 1 1 0 0 0 0 0 0
                13 0 1 0 0 0 0 0 0 0 0 0 0
                14 0 1 0 0 0 0 0 0 0 0 0 0
                15 0 1 0 0 0 0 0 0 0 0 0 0
                16 0 1 0 0 0 0 0 0 0 0 0 0
                17 1 1 1 0 0 0 0 0 0 0 0 0
                18 0 0 0 0 0 0 0 1 0 1 0 0
                19 0 0 0 0 0 0 0 0 0 0 1 1
                20 1 1 1 0 0 0 0 0 0 0 0 0
                @Andrew Musau Nick Cox I am not a statistician nor an expert in state, so I'd appreciate it if both of you, using the example above, show me how to get the table using your respective approaches. Below is an explanation of the example:
                We are testing TB patients with 3 methods smear, Xpert, and culture at 6 time points (categories) Baseline, M3, M6, and M3-9 (just gave 4 in the example for simplicity). Want to get the frequency of different testing method combinations? Xpert positive_culture positive (v1), Xpert positive_culture negative (v2), and smear positive only (v3). The time points are similar for all patients.

                Nick Cox, I managed to install the groups command, thanks.

                Comment


                • #9
                  The example data -- whether alluded to or exemplified -- seems to keep changing. Why not try the command yourself and see if it helps?

                  Comment


                  • #10
                    As Nick notes, your data description is not consistent with #1. Nevertheless, maybe the following helps.

                    Code:
                    * Example generated by -dataex-. For more info, type help dataex
                    clear
                    input byte(pid baseline_smear baseline_xprt baseline_culture m3_smear m3_xprt m3_culture m6_smear m6_xprt m6_culture m39_smear m39_xprt m39_culture)
                     1 0 1 1 0 0 0 0 0 0 0 0 0
                     2 0 1 0 0 0 0 0 0 0 1 0 0
                     3 0 0 0 1 0 0 0 0 0 0 0 0
                     4 1 1 1 0 0 0 0 0 0 0 0 0
                     5 0 0 0 0 1 1 0 0 0 0 0 0
                     6 1 0 0 0 0 0 0 0 0 0 0 0
                     7 1 1 0 0 0 0 0 0 0 0 0 0
                     8 0 1 1 0 0 0 0 0 0 0 0 0
                     9 0 1 0 0 0 0 0 0 0 0 0 0
                    10 0 0 0 1 1 0 0 0 0 0 1 0
                    11 0 0 0 0 0 0 0 1 1 0 0 0
                    12 0 0 0 0 1 1 0 0 0 0 0 0
                    13 0 1 0 0 0 0 0 0 0 0 0 0
                    14 0 1 0 0 0 0 0 0 0 0 0 0
                    15 0 1 0 0 0 0 0 0 0 0 0 0
                    16 0 1 0 0 0 0 0 0 0 0 0 0
                    17 1 1 1 0 0 0 0 0 0 0 0 0
                    18 0 0 0 0 0 0 0 1 0 1 0 0
                    19 0 0 0 0 0 0 0 0 0 0 1 1
                    20 1 1 1 0 0 0 0 0 0 0 0 0
                    end
                    
                    reshape long baseline m3 m6 m39, i(pid) j(Category) string
                    rename (baseline m3 m6 m39) var=
                    reshape long var, i(pid Category) j(which) string
                    keep if var
                    replace Category= proper(substr(Category, 2, .))
                    encode which, g(Condition)
                    dtable i.Condition, by(Category)
                    You can work on assigning meaningful labels to your categories.

                    Res.:

                    Code:
                     
                    . dtable i.Condition, by(Category)
                    
                    -----------------------------------------------------
                                                Category                 
                                Culture    Smear      Xprt       Total   
                    -----------------------------------------------------
                    N          9 (24.3%) 9 (24.3%) 19 (51.4%) 37 (100.0%)
                    Condition                                            
                      baseline 5 (55.6%) 5 (55.6%) 12 (63.2%)  22 (59.5%)
                      m3       2 (22.2%) 2 (22.2%)  3 (15.8%)   7 (18.9%)
                      m39      1 (11.1%) 2 (22.2%)  2 (10.5%)   5 (13.5%)
                      m6       1 (11.1%)  0 (0.0%)  2 (10.5%)    3 (8.1%)
                    -----------------------------------------------------

                    Comment

                    Working...
                    X