Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating new variable based on percentages in crosstab

    Hello,

    I am struggeling with the following problem.

    I created a crosstabulation of two variables with percentages being shown.

    The code is simple:

    Code:
    tab land abmus, row
    Code:
     
    Bundesland 0 Zustimmung 1 Ablehnung 2 Unklar Total
    Schleswig-Holstein 35 / 42.17 36 / 43.37 12 / 14.46 83 / 100.00
    Hamburg 24 / 40.68 24 / 40.68 11 / 18.64 59 / 100.00
    Niedersachsen 105 / 37.50 129 / 46.07 46 /16.43 280 / 100.00
    ... ... ... ... ...
    What I want to do next is to create a new variable, which uses the percentages of the column "1 Ablehnung" as values.

    Thanks in advance!
    Last edited by Volker Kraft; 31 Mar 2022, 07:24.

  • #2
    Code:
    . sysuse auto, clear
    (1978 automobile data)
    
    . egen pc_foreign = mean(100 * (foreign == 1)), by(rep78)
    
    . tab rep78 foreign, row
    
    +----------------+
    | Key            |
    |----------------|
    |   frequency    |
    | row percentage |
    +----------------+
    
        Repair |
        record |      Car origin
          1978 |  Domestic    Foreign |     Total
    -----------+----------------------+----------
             1 |         2          0 |         2 
               |    100.00       0.00 |    100.00 
    -----------+----------------------+----------
             2 |         8          0 |         8 
               |    100.00       0.00 |    100.00 
    -----------+----------------------+----------
             3 |        27          3 |        30 
               |     90.00      10.00 |    100.00 
    -----------+----------------------+----------
             4 |         9          9 |        18 
               |     50.00      50.00 |    100.00 
    -----------+----------------------+----------
             5 |         2          9 |        11 
               |     18.18      81.82 |    100.00 
    -----------+----------------------+----------
         Total |        48         21 |        69 
               |     69.57      30.43 |    100.00 
    
    .  tabdisp rep78, c(pc_foreign)
    
    ----------------------
    Repair    |
    record    |
    1978      | pc_foreign
    ----------+-----------
            1 |          0
            2 |          0
            3 |         10
            4 |         50
            5 |   81.81818
            . |         20
    ----------------------

    Comment


    • #3
      Thanks a lot Nick!

      Comment


      • #4
        I have an additional question to this topic and it is a little complicated.
        Now that I have a new variable with the percentages according to Nick's reply, I want to create a two way scatter with another variable.
        The problem is: the new variable is not part of my data yet.

        The percentages in my example display the frequency of anti-muslim-behaviour in a certain area. The new data contains information about what the percentage of muslims is in the population of the certain area, but, as said before, can only be found in a secondary source.

        My data contains about 3,500 observations but there are only 16 different states, so the new data has only 16 different values which need to be added.

        What would be the best way, to get this managed?

        I hope, this is explained clear enough for you to understand.

        Comment


        • #5
          It's a merge 1:m or merge m:1 depending on which way round you do it.

          Comment


          • #6
            Hello.
            I tried the same codes on my data but the percentages in the crosstabulation does not match the generated data. What is wrong with my code?

            Code:
            egen pc_cgtsmok = mean(100 * (cgtsmok == 1)), by(cntry)
            
            . sum pc_cgtsmok
            
                Variable |        Obs        Mean    Std. dev.       Min        Max
            -------------+---------------------------------------------------------
              pc_cgtsmok |     22,190    12.16314    4.514537    3.81451   19.76967
            
            . tab cntry cgtsmok, row
            
            +----------------+
            | Key            |
            |----------------|
            |   frequency    |
            | row percentage |
            +----------------+
            
                       |                    Cigarette smoking behaviour
               Country | I smoke d  I smoke d  I smoke b  I don’t s  I have on  I have ne |     Total
            -----------+------------------------------------------------------------------+----------
                    AT |       401         95         82        528        190      1,057 |     2,353 
                       |     17.04       4.04       3.48      22.44       8.07      44.92 |    100.00 
            -----------+------------------------------------------------------------------+----------
                    CH |       147         85         78        277        178        617 |     1,382 
                       |     10.64       6.15       5.64      20.04      12.88      44.65 |    100.00 
            -----------+------------------------------------------------------------------+----------
                    DE |       315        136        107        581        288        991 |     2,418 
                       |     13.03       5.62       4.43      24.03      11.91      40.98 |    100.00 
            -----------+------------------------------------------------------------------+----------
                    FI |        95         63         72        473        292        567 |     1,562 
                       |      6.08       4.03       4.61      30.28      18.69      36.30 |    100.00 
            -----------+------------------------------------------------------------------+----------
                    GB |       142         84         54        412        122        864 |     1,678 
                       |      8.46       5.01       3.22      24.55       7.27      51.49 |    100.00 
            -----------+------------------------------------------------------------------+----------
                    HR |       309        102         48        243        100        756 |     1,558 
                       |     19.83       6.55       3.08      15.60       6.42      48.52 |    100.00 
            -----------+------------------------------------------------------------------+----------
                    HU |       351        132         57        228        134      1,201 |     2,103 
                       |     16.69       6.28       2.71      10.84       6.37      57.11 |    100.00 
            -----------+------------------------------------------------------------------+----------
                    IE |       217         79         49        441         99      1,127 |     2,012 
                       |     10.79       3.93       2.44      21.92       4.92      56.01 |    100.00 
            -----------+------------------------------------------------------------------+----------
                    LT |       226        129         91        215        221        481 |     1,363 
                       |     16.58       9.46       6.68      15.77      16.21      35.29 |    100.00 
            -----------+------------------------------------------------------------------+----------
                    NL |       110         80         64        538        206        696 |     1,694 
                       |      6.49       4.72       3.78      31.76      12.16      41.09 |    100.00 
            -----------+------------------------------------------------------------------+----------
                    NO |        51         56         37        403        247        543 |     1,337 
                       |      3.81       4.19       2.77      30.14      18.47      40.61 |    100.00 
            -----------+------------------------------------------------------------------+----------
                    SI |       153         71         60        238        123        601 |     1,246 
                       |     12.28       5.70       4.82      19.10       9.87      48.23 |    100.00 
            -----------+------------------------------------------------------------------+----------
                    SK |       182        130         93        261         93        678 |     1,437 
                       |     12.67       9.05       6.47      18.16       6.47      47.18 |    100.00 
            -----------+------------------------------------------------------------------+----------
                 Total |     2,699      1,242        892      4,838      2,293     10,179 |    22,143 
                       |     12.19       5.61       4.03      21.85      10.36      45.97 |    100.00

            Comment


            • #7
              Code:
              tabdisp cntry, c(pc_cgtsmok)
              
              ----------------------
                Country | pc_cgtsmok
              ----------+-----------
                     AT |   17.03483
                     CH |   10.62139
                     DE |   13.01653
                     FI |   6.078055
                     GB |   8.432304
                     HR |   19.76967
                     HU |   16.57224
                     IE |   10.75855
                     LT |   16.55678
                     NL |   6.489676
                     NO |    3.81451
                     SI |   12.25961
                     SK |   12.62136
              ----------------------

              Comment


              • #8
                I am not clear what is contradictory here, but

                Code:
                 mean(100 * (cgtsmok == 1)), by(cntry)
                will not ignore missing values on cgtsmok but the tabulate calculation will ignore missing values. The reason is that the expression

                Code:
                cgtsmok == 1
                evaluates to 1 if cgtsmok is 1 and to 0 for any other value of cgtsmok (including any missing values).

                You have 47 missing values (22190 - 22143).

                Code:
                mean(cond(missing(cgtsmok), ., 100 * (cgtsmok == 1))), by(cntry)
                would be one way to ignore the missings. (Code not tested.)
                Last edited by Nick Cox; 14 Oct 2024, 06:51.

                Comment

                Working...
                X