  • Combine Frequencies of Categorical Variables

    Hi everyone! I have 3 variables denoted below. The end goal for me is to have one variable that combines the results as such:
    Less than 10000 (numeric value 1) = 240 (25+114+101)
    10000-19999 (numeric value 2) = 140 (6+82+52)
    etc.

    I tried using the stack function but that ended up removing all the other variables. Does anyone have any ideas on how to do this? Thank you in advance!

    input double(A19_1 A19_2 A19_3)
    6 . 8
    2 1 1
    1 1 2
    4 4 4
    1 10 5
    . . .
    1 3 1
    1 1 1
    10 9 .
    . . .
    . 7 .
    . 3 9
    . 2 10
    1 1 1
    1 . .
    . 5 .
    . . 7
    . 2 2
    1 1 6
    1 1 4
    . . .
    . . 6
    . 3 3
    2 3 2
    . . 4
    . . .
    . 1 .
    . . .
    . . .
    . . .
    8 7 .
    . . 6
    . . .
    . . 7
    2 . 1
    . . 10
    . . .
    6 . 2
    . 5 4
    . 2 7
    . 1 .
    . . .
    . 4 .
    . 1 .
    . . 6
    . 4 3
    . . .
    . . .
    . . 2
    . . 3
    . . .
    . . 6
    . . 2
    . . .
    1 2 3
    . . 2
    . . 3
    . . .
    . . .
    . . .
    . 2 .
    . 4 3
    . . 2
    . . 4
    . . .
    . 1 2
    . . 5
    . 3 .
    . . 3
    . 3 2
    . 2 .
    . 2 .
    . . .
    . . 10
    . 5 10
    . 3 7
    . . 3
    . 1 7
    . . 1
    . 3 6
    . 3 3
    . . .
    3 2 3
    . 1 2
    . 2 2
    . 3 4
    . 1 1
    . . 3
    . 2 7
    2 3 9
    . 3 4
    . 1 2
    . 4 10
    . 3 3
    . 2 2
    . 10 10
    . 3 .
    . 7 4
    . 2 .
    . 2 4
    end
    label values A19_1 A19_1
    label def A19_1 1 "less than 10,000", modify
    label def A19_1 2 "10,000 - 19,999", modify
    label def A19_1 3 "20,000 - 29,999", modify
    label def A19_1 4 "30,000 - 39,999", modify
    label def A19_1 6 "50,000 – 59,999", modify
    label def A19_1 8 "70,000 – 79,999", modify
    label def A19_1 10 "90,000 or more", modify
    label values A19_2 A19_2
    label def A19_2 1 "less than 10,000", modify
    label def A19_2 2 "10,000 - 19,999", modify
    label def A19_2 3 "20,000 - 29,999", modify
    label def A19_2 4 "30,000 - 39,999", modify
    label def A19_2 5 "40,000 – 49,999", modify
    label def A19_2 7 "60,000 – 69,999", modify
    label def A19_2 9 "80,000 – 89,999", modify
    label def A19_2 10 "90,000 or more", modify
    label values A19_3 A19_3
    label def A19_3 1 "less than 10,000", modify
    label def A19_3 2 "10,000 - 19,999", modify
    label def A19_3 3 "20,000 - 29,999", modify
    label def A19_3 4 "30,000 - 39,999", modify
    label def A19_3 5 "40,000 – 49,999", modify
    label def A19_3 6 "50,000 – 59,999", modify
    label def A19_3 7 "60,000 – 69,999", modify
    label def A19_3 8 "70,000 – 79,999", modify
    label def A19_3 9 "80,000 – 89,999", modify
    label def A19_3 10 "90,000 or more", modify

  • #2
    You can apply the same value labels to one or more variables as is explained at

    Code:
    help label
    Otherwise I don't fully understand what the question is here. Even when all three variables are non-missing, the categories don't unequivocally imply a category for the sum. For example, 1 1 1 could mean that the sum was under 10,000, or equally that it was almost 30,000. So the correct answer for combining 1 1 1 could be any of 1, 2 and 3.

    stack is a command, not a function. You don't need doubles to hold integers up to 10 and missing.
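
    As a small illustration of the storage point, compress should demote the three variables from double to byte, since they hold only integers from 1 to 10 and missing. This is a sketch that assumes the example data above are in memory.

    ```stata
    * Let Stata choose the smallest storage type that holds the data without loss
    compress A19_1 A19_2 A19_3

    * Check the resulting storage types
    describe A19_1 A19_2 A19_3
    ```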

    That said, you may just want a combined table and tabm from tab_chi on SSC will do that.


    Code:
    . tabm A19_*, transpose
    
                      |             variable
               values |     A19_1      A19_2      A19_3 |     Total
    ------------------+---------------------------------+----------
     less than 10,000 |         9         14          7 |        30 
      10,000 - 19,999 |         4         13         14 |        31 
      20,000 - 29,999 |         1         14         12 |        27 
      30,000 - 39,999 |         1          5          9 |        15 
                    5 |         0          3          2 |         5 
      50,000 – 59,999 |         2          0          6 |         8 
                    7 |         0          3          6 |         9 
      70,000 – 79,999 |         1          0          1 |         2 
                    9 |         0          1          2 |         3 
       90,000 or more |         1          2          6 |         9 
    ------------------+---------------------------------+----------
                Total |        19         55         65 |       139
    Some of the value labels need to be fixed and I have no idea where your frequencies 240 140 ... come from unless it's a larger dataset you aren't showing us.
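
    One possible fix for the unlabelled rows 5, 7 and 9 in the table above, sketched on the assumption that the label set A19_3 (which defines all of 1 to 10 in the posted example) is the complete one, is to attach that single set of value labels to all three variables before tabulating:

    ```stata
    * Install tabm (part of the tab_chi package) once, if it is not yet installed
    ssc install tab_chi

    * Attach one complete set of value labels to all three variables.
    * Assumption: A19_3 is the complete label set, as in the posted example.
    label values A19_1 A19_2 A19_3 A19_3

    * Redo the combined table; every row should now carry a label
    tabm A19_*, transpose
    ```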



    • #3
      Hi Dr. Cox. Thank you for your reply! Each of the three variables represents income earned from a specific type of crop. I want to create one variable that represents income earned from any kind of crop (combining all observations from the three variables). When I then use the "tab" function on that variable, I should get the cumulative frequency from combining all of the variables.

      Regarding your last question, I only provided about 100 observations. They are part of a larger dataset.
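
      One way such a combined variable might be built without dropping the other variables is reshape long rather than stack. The following is only a sketch: the names id and crop are hypothetical, it assumes no variables with those names already exist, and it reuses the A19_3 value labels on the assumption that they are the complete set. The result has one observation per respondent and crop type, with all other variables carried along.

      ```stata
      * Hypothetical identifier, one value per original observation
      generate long id = _n

      * Stack A19_1, A19_2, A19_3 into a single variable A19_;
      * crop (hypothetical name) records which of the three each value came from
      reshape long A19_, i(id) j(crop)

      * Assumption: the A19_3 label set covers every category 1 to 10
      label values A19_ A19_3

      * Frequencies pooled across the three crop types
      tabulate A19_
      ```

      Running reshape wide afterwards should return the data to the original layout.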
