Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating percentages in categorical data

    I have MORG data where I am interested to see the percentage of Blacks across regions in the US (I am using 1979-2004 cross sections that I merged together into one dataset). The categorical variable (byte) for region is called 'region' and contains 9 US regions (numbered 1 to 9). I also have a variable for race, containing 21 different races. I am only interested in the percentage of Blacks in the sample, race == 2.

    I tried using the command:

    ​​​​​​tab region race, row

    but have too many values. Do I need to construct some kind of formula to obtain the percentages? What command can I use?

  • #2
    If you have only 9 regions and 21 races, that is well within the capability of -tab-. You should not be getting that error message. Please use -dataex- to show example data that exhibits your difficulty along with the exact code that leads to this error.

    That said, you can also get the proportions of blacks (which you can then multiply by 100 to get percentages) with
    Code:
    by region, sort: sum 2.race

    Comment


    • #3
      Maybe the problem is that I constructed the 9 regions from all the US states? (ie recoded and generated this new variable?)

      Comment


      • #4
        No, that should not cause a problem, unless you made a mistake. Again, as Clyde suggested, you should show some sample data using the dataex command. At a minimum
        Code:
        tab region
        tab race
        and copy the results and paste them into a code block in your next post.

        To assure maximum readability of results that you post, please copy them from the Results window or your log file into a code block in the Forum editor using code delimiters [CODE] and [/CODE], as explained in section 12 of the Statalist FAQ linked to at the top of the page. For example, the following:

        [CODE]
        . sysuse auto, clear
        (1978 Automobile Data)

        . describe make price

        storage display value
        variable name type format label variable label
        -----------------------------------------------------------------
        make str18 %-18s Make and Model
        price int %8.0gc Price
        [/CODE]

        will be presented in the post as the following:
        Code:
        . sysuse auto, clear
        (1978 Automobile Data)
        
        . describe make price
        
                      storage   display    value
        variable name   type    format     label      variable label
        -----------------------------------------------------------------
        make            str18   %-18s                 Make and Model
        price           int     %8.0gc                Price

        Comment


        • #5
          Here is what I get when I do tab

          Code:
           tab region
          
             RECODE of state |
                     (State) |      Freq.     Percent        Cum.
          -------------------+-----------------------------------
                 New England |  1,165,378        9.07        9.07
             Middle Atlantic |  1,657,581       12.90       21.97
          East North Central |  1,802,847       14.03       36.00
          West North Central |  1,326,760       10.33       46.33
              South Atlantic |  2,129,628       16.58       62.91
          East South Central |    683,432        5.32       68.23
          West South Central |  1,083,562        8.43       76.66
                    Mountain |  1,307,749       10.18       86.84
                     Pacific |  1,690,879       13.16      100.00
          -------------------+-----------------------------------
                       Total | 12,847,816      100.00
          
          . tab race
          
                 Race |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                White | 11,684,436       85.19       85.19
                Black |  1,332,280        9.71       94.90
                Other |    249,495        1.82       96.72
                    4 |    345,650        2.52       99.24
                    5 |     30,973        0.23       99.47
                    6 |     11,678        0.09       99.55
                    7 |     32,814        0.24       99.79
                    8 |      9,829        0.07       99.87
                    9 |      3,530        0.03       99.89
                   10 |      3,682        0.03       99.92
                   11 |        824        0.01       99.92
                   12 |        229        0.00       99.93
                   13 |        200        0.00       99.93
                   14 |      2,501        0.02       99.95
                   15 |      2,279        0.02       99.96
                   16 |        906        0.01       99.97
                   17 |        311        0.00       99.97
                   18 |      2,099        0.02       99.99
                   19 |        128        0.00       99.99
                   20 |        602        0.00       99.99
                   21 |      1,013        0.01      100.00
                   22 |         19        0.00      100.00
                   23 |         52        0.00      100.00
                   24 |         14        0.00      100.00
                   25 |         21        0.00      100.00
                   26 |         71        0.00      100.00
          ------------+-----------------------------------
                Total | 13,715,636      100.00
          ​​​​​​

          Comment


          • #6
            So you have there a 26 x 9 or 9 x 26 table depending on taste and convenience. That's more than advertised, but nevertheless this test implies no problem either way, except of accessibility.

            Code:
            clear 
            set obs 26 
            set seed 2803 
            gen race = word("`c(alpha)'", _n) 
            gen region = runiformint(1, 9)
            tab region race
            tab race region
            Neverthless, something like

            Code:
            egen pcblack = mean(100 * (race == 2)), by(region) 
            tabdisp region, c(pcblack)
            should also work.

            Comment


            • #7
              At the moment, I cannot come up with a hypothesis for why your command fails. The following simulation of your work - a 10 by 26 tab, including missing values for almost a million observations of region - runs with no errors, and I don't think the lack of value labels should make a difference.
              Code:
              // generate pretend data
              clear
              set obs 12847816
              generate region = runiformint(1,10)
              set obs 13715636
              generate race = runiformint(1,26)
              // try tab
              tab region race, row missing
              Perhaps we are misunderstanding the problem reported in post #1. Can you run the following and post the tab commands and the output from your Results window, as you did in post #5?
              Code:
              tab region, missing nolabel
              tab race, missing nolabel
              tab region race, row missing nolabel

              Comment


              • #8
                For some reason it just keeps telling me there are too many values. I can do 'race region, row' but those percentages are not the ones I am looking for. I then tried the command suggested by Nick Cox, and that seems to work, but I don't quite understand why the percentages are slightly different when I use the command 'tab race if region == 1" and so forth for other regions.

                Code:
                ​​​​​​ tab region, missing nolabel
                
                RECODE of |
                state |
                (State) | Freq. Percent Cum.
                ------------+-----------------------------------
                1 | 1,165,378 8.50 8.50
                2 | 1,657,581 12.09 20.58
                3 | 1,802,847 13.14 33.73
                4 | 1,326,760 9.67 43.40
                5 | 2,129,628 15.53 58.93
                6 | 683,432 4.98 63.91
                7 | 1,083,562 7.90 71.81
                8 | 1,307,749 9.53 81.34
                9 | 1,690,879 12.33 93.67
                . | 867,820 6.33 100.00
                ------------+-----------------------------------
                Total | 13,715,636 100.00
                
                . tab race, missing nolabel
                
                Race | Freq. Percent Cum.
                ------------+-----------------------------------
                1 | 11,684,436 85.19 85.19
                2 | 1,332,280 9.71 94.90
                3 | 249,495 1.82 96.72
                4 | 345,650 2.52 99.24
                5 | 30,973 0.23 99.47
                6 | 11,678 0.09 99.55
                7 | 32,814 0.24 99.79
                8 | 9,829 0.07 99.87
                9 | 3,530 0.03 99.89
                10 | 3,682 0.03 99.92
                11 | 824 0.01 99.92
                12 | 229 0.00 99.93
                13 | 200 0.00 99.93
                14 | 2,501 0.02 99.95
                15 | 2,279 0.02 99.96
                16 | 906 0.01 99.97
                17 | 311 0.00 99.97
                18 | 2,099 0.02 99.99
                19 | 128 0.00 99.99
                20 | 602 0.00 99.99
                21 | 1,013 0.01 100.00
                22 | 19 0.00 100.00
                23 | 52 0.00 100.00
                24 | 14 0.00 100.00
                25 | 21 0.00 100.00
                26 | 71 0.00 100.00
                ------------+-----------------------------------
                Total | 13,715,636 100.00
                
                . tab region race, row missing nolabel
                too many values
                r(134);
                
                tab race region, row missing nolabel
                
                +----------------+
                | Key |
                |----------------|
                | frequency |
                | row percentage |
                +----------------+
                
                | RECODE of state (State)
                Race | 1 2 3 4 5 6 7 8 9 . | Total
                -----------+--------------------------------------------------------------------------------------------------------------+----------
                1 | 1,103,451 1,406,528 1,583,676 1,239,762 1,647,275 548,473 906,253 1,215,996 1,329,508 703,514 |11,684,436
                | 9.44 12.04 13.55 10.61 14.10 4.69 7.76 10.41 11.38 6.02 | 100.00
                -----------+--------------------------------------------------------------------------------------------------------------+----------
                2 | 34,836 187,308 179,112 41,601 427,884 127,696 136,334 26,603 79,444 91,462 | 1,332,280
                | 2.61 14.06 13.44 3.12 32.12 9.58 10.23 2.00 5.96 6.87 | 100.00
                -----------+--------------------------------------------------------------------------------------------------------------+----------
                3 | 5,586 15,000 13,301 26,696 15,606 2,119 18,682 39,980 101,780 10,745 | 249,495
                | 2.24 6.01 5.33 10.70 6.26 0.85 7.49 16.02 40.79 4.31 | 100.00
                -----------+--------------------------------------------------------------------------------------------------------------+----------
                4 | 15,663 42,041 20,722 12,524 29,941 3,181 14,478 16,565 146,054 44,481 | 345,650
                | 4.53 12.16 6.00 3.62 8.66 0.92 4.19 4.79 42.25 12.87 | 100.00
                -----------+--------------------------------------------------------------------------------------------------------------+----------
                5 | 1,399 3,610 1,885 875 2,495 254 1,351 2,795 12,351 3,958 | 30,973
                | 4.52 11.66 6.09 2.83 8.06 0.82 4.36 9.02 39.88 12.78 | 100.00
                -----------+--------------------------------------------------------------------------------------------------------------+----------
                6 | 962 1,105 1,109 896 1,600 235 611 771 1,392 2,997 | 11,678
                | 8.24 9.46 9.50 7.67 13.70 2.01 5.23 6.60 11.92 25.66 | 100.00
                -----------+--------------------------------------------------------------------------------------------------------------+----------
                7 | 2,609 1,063 1,974 3,386 2,504 1,131 4,931 3,649 6,509 5,058 | 32,814
                | 7.95 3.24 6.02 10.32 7.63 3.45 15.03 11.12 19.84 15.41 | 100.00
                -----------+--------------------------------------------------------------------------------------------------------------+----------
                8 | 413 321 413 435 832 104 257 711 4,306 2,037 | 9,829
                | 4.20 3.27 4.20 4.43 8.46 1.06 2.61 7.23 43.81 20.72 | 100.00
                -----------+--------------------------------------------------------------------------------------------------------------+----------
                9 | 29 59 65 88 72 17 69 198 2,257 676 | 3,530
                | 0.82 1.67 1.84 2.49 2.04 0.48 1.95 5.61 63.94 19.15 | 100.00
                -----------+--------------------------------------------------------------------------------------------------------------+----------
                10 | 203 266 304 233 727 156 401 176 438 778 | 3,682
                | 5.51 7.22 8.26 6.33 19.74 4.24 10.89 4.78 11.90 21.13 | 100.00
                -----------+--------------------------------------------------------------------------------------------------------------+----------
                11 | 31 44 25 28 152 5 24 45 268 202 | 824
                | 3.76 5.34 3.03 3.40 18.45 0.61 2.91 5.46 32.52 24.51 | 100.00
                -----------+--------------------------------------------------------------------------------------------------------------+----------
                12 | 6 6 11 4 25 0 8 13 97 59 | 229
                | 2.62 2.62 4.80 1.75 10.92 0.00 3.49 5.68 42.36 25.76 | 100.00
                -----------+--------------------------------------------------------------------------------------------------------------+----------
                13 | 3 7 14 5 13 0 3 22 85 48 | 200
                | 1.50 3.50 7.00 2.50 6.50 0.00 1.50 11.00 42.50 24.00 | 100.00
                -----------+--------------------------------------------------------------------------------------------------------------+----------
                14 | 20 11 17 19 25 9 12 44 2,328 16 | 2,501
                | 0.80 0.44 0.68 0.76 1.00 0.36 0.48 1.76 93.08 0.64 | 100.00
                -----------+--------------------------------------------------------------------------------------------------------------+----------
                15 | 107 160 140 113 294 35 100 77 663 590 | 2,279
                | 4.70 7.02 6.14 4.96 12.90 1.54 4.39 3.38 29.09 25.89 | 100.00
                -----------+--------------------------------------------------------------------------------------------------------------+----------
                16 | 31 37 31 28 91 6 23 40 109 510 | 906
                | 3.42 4.08 3.42 3.09 10.04 0.66 2.54 4.42 12.03 56.29 | 100.00
                -----------+--------------------------------------------------------------------------------------------------------------+----------
                17 | 3 2 13 28 25 3 6 18 170 43 | 311
                | 0.96 0.64 4.18 9.00 8.04 0.96 1.93 5.79 54.66 13.83 | 100.00
                -----------+--------------------------------------------------------------------------------------------------------------+----------
                18 | 8 1 3 8 8 1 3 15 2,041 11 | 2,099
                | 0.38 0.05 0.14 0.38 0.38 0.05 0.14 0.71 97.24 0.52 | 100.00
                -----------+--------------------------------------------------------------------------------------------------------------+----------
                19 | 4 0 11 7 12 4 1 5 38 46 | 128
                | 3.12 0.00 8.59 5.47 9.38 3.12 0.78 3.91 29.69 35.94 | 100.00
                -----------+--------------------------------------------------------------------------------------------------------------+----------
                20 | 5 9 15 10 25 2 11 15 489 21 | 602
                | 0.83 1.50 2.49 1.66 4.15 0.33 1.83 2.49 81.23 3.49 | 100.00
                -----------+--------------------------------------------------------------------------------------------------------------+----------
                21 | 9 1 6 8 13 1 2 8 508 457 | 1,013
                | 0.89 0.10 0.59 0.79 1.28 0.10 0.20 0.79 50.15 45.11 | 100.00
                -----------+--------------------------------------------------------------------------------------------------------------+----------
                22 | 0 2 0 1 0 0 0 0 4 12 | 19
                | 0.00 10.53 0.00 5.26 0.00 0.00 0.00 0.00 21.05 63.16 | 100.00
                -----------+--------------------------------------------------------------------------------------------------------------+----------
                23 | 0 0 0 5 3 0 0 3 9 32 | 52
                | 0.00 0.00 0.00 9.62 5.77 0.00 0.00 5.77 17.31 61.54 | 100.00
                -----------+--------------------------------------------------------------------------------------------------------------+----------
                24 | 0 0 0 0 1 0 0 0 2 11 | 14
                | 0.00 0.00 0.00 0.00 7.14 0.00 0.00 0.00 14.29 78.57 | 100.00
                -----------+--------------------------------------------------------------------------------------------------------------+----------
                25 | 0 0 0 0 1 0 0 0 6 14 | 21
                | 0.00 0.00 0.00 0.00 4.76 0.00 0.00 0.00 28.57 66.67 | 100.00
                -----------+--------------------------------------------------------------------------------------------------------------+----------
                26 | 0 0 0 0 4 0 2 0 23 42 | 71
                | 0.00 0.00 0.00 0.00 5.63 0.00 2.82 0.00 32.39 59.15 | 100.00
                -----------+--------------------------------------------------------------------------------------------------------------+----------
                Total | 1,165,378 1,657,581 1,802,847 1,326,760 2,129,628 683,432 1,083,562 1,307,749 1,690,879 867,820 |13,715,636
                | 8.50 12.09 13.14 9.67 15.53 4.98 7.90 9.53 12.33 6.33 | 100.00
                
                egen pcblack = mean(100 * (race == 2)), by(region)
                
                *I want to see up to 2004
                
                preserve
                
                drop if year > 2004
                
                tabdisp region, c(pcblack)
                
                -------------------------------
                RECODE of state |
                (State) | pcblack
                -------------------+-----------
                New England | 2.989245
                Middle Atlantic | 11.30008
                East North Central | 9.934953
                West North Central | 3.135533
                South Atlantic | 20.09196
                East South Central | 18.68452
                West South Central | 12.58202
                Mountain | 2.034259
                Pacific | 4.698385
                -------------------------------
                
                . tab race if region == 1
                
                Race | Freq. Percent Cum.
                ------------+-----------------------------------
                White | 803,480 95.60 95.60
                Black | 23,079 2.75 98.35
                Other | 4,625 0.55 98.90
                4 | 7,326 0.87 99.77
                5 | 1,142 0.14 99.91
                6 | 117 0.01 99.92
                7 | 531 0.06 99.98
                8 | 71 0.01 99.99
                9 | 3 0.00 99.99
                10 | 37 0.00 100.00
                11 | 1 0.00 100.00
                12 | 2 0.00 100.00
                15 | 19 0.00 100.00
                16 | 2 0.00 100.00
                18 | 1 0.00 100.00
                20 | 1 0.00 100.00
                21 | 3 0.00 100.00
                ------------+-----------------------------------
                Total | 840,440 100.00
                
                restore
                
                ​​​​​​
                Last edited by Thea Black; 20 Mar 2020, 11:38.

                Comment


                • #9
                  The output of the command help rc 134 tells us
                  Code:
                  [P]     error . . . . . . . . . . . . . . . . . . . . . . . .  Return code 134
                          too many values
                          1) You attempted to encode a string variable that takes on
                          more than 65,536 unique values.  2) You attempted to tabulate
                          a variable or pair of variables that take on too many values.
                          If you specified two variables, try interchanging them.
                          3) You issued a graph command using the by option.  The
                          by-variable takes on too many different values to construct
                          a readable chart.
                  Try interchanging the rows and columns? Well, the output of the command help tabulate twoway tells us
                  Code:
                  Limits
                  
                      Two-way tables may have a maximum of 1,200 rows and 80 columns (Stata/MP and Stata/SE)
                      or 300 rows and 20 columns (Stata/IC).  If larger tables are needed, see [R] table.
                  So I'm guessing you're using Stata/IC, which is why it works with 10 columns but not with 26.

                  Added in edit.
                  I can do 'race region, row' but those percentages are not the ones I am looking for.
                  Of course not, for that you want
                  Code:
                  tab race region, column missing
                  Last edited by William Lisowski; 20 Mar 2020, 12:28.

                  Comment

                  Working...
                  X