
  • How to calculate percentage in Stata

    The variable a13 is the answer to the question "Are you currently in school?": 0 is no, 1 is yes, 9 is unknown, and . is missing. The waves run from 1989 to 2011.
    I want to calculate school enrollment for each year, which means I need to know the number of yes answers and the number of all the people who answered this question, so I use count:
    Code:
    count if (age>6)&(a13==1)&(wave==2011)
    1,535
    count if (a13==1)|(a13==0)|(a13==9)|(a13==.)&(wave==2011)
    50,069
    Then I want to get the percentage, so I use gen:
    Code:
    gen per={count if (age>6)&(a13==1)&(wave==2011)}/{count if (a13==1)&(a13==0)&(a13==9)&(a13==.)&(wave==2011)}
    But it doesn't work...
    I'm confused about why this expression isn't right. Is there an easier way to calculate this percentage?

  • #2
    Another variable, a11, records educational achievement. I recoded it:
    Code:
    recode a11  (11=1)(12=2)(13=3)(14=4)(15=5)(16=6)(21=7)(22=8)(23=9) ///
    (24=10)(25=11)(26=12)(27 31=13)(28 32=14)(29 33=15)(34=16)(35=17)(36=18)
    The same question: I need the percentage of people with a11>=23 and age>=16 in each year.



    • #3
      You cannot use the count command as part of an expression. Try
      Code:
      assert !mi(age)
      quietly count if age > 6 & a13 == 1 & wave == 2011
      local numerator = r(N)
      quietly count if inlist(a13, 1, 0, 9, .) & wave == 2011
      generate double per = 100 * `numerator' / r(N)



      • #4
        But it doesn't work...
        I'm confused about why this expression isn't right. Is there an easier way to calculate this percentage?
        It doesn't work because what you have on the right-hand side of the equals sign is not a mathematical expression; it's a pair of commands, wrapped in braces and separated by a / character. And even if it did work, there is no reason to generate a new variable when all you want is a single number, not a value for each observation in the data set.

        If I wanted to build off of your approach, the first iteration would be:
        Code:
        count if (age>6)&(a13==1)&(wave==2011)
        local numerator = r(N)
        count if (a13==1)|(a13==0)|(a13==9)|(a13==.)&(wave==2011)
        local denominator = r(N)
        display "Percentage = " %3.2f  =100*`numerator'/`denominator'
        Now, the next iteration is to improve that. The second command is actually improperly stated because & binds before |. What you really mean is:
        Code:
        count if ((a13==1)|(a13==0)|(a13==9)|(a13==.)) & (wave==2011)
        But it gets better than that: the expression within parentheses is always true, because 1, 0, 9, and . (missing) are the only possible values for a13. So that entire part of the expression does nothing, and the command becomes
        Code:
        count if wave == 2011
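
To check the two points above on a small scale, here is a sketch using five made-up observations. The data are invented purely for illustration; only the variable names a13 and wave come from this thread. It compares the count with and without the outer parentheses, and then runs the simplified command.

```stata
clear
input byte a13 int wave
1 2009
0 2009
9 2011
. 2011
1 2011
end

// Without outer parentheses, & binds first, so wave==2011 restricts
// only the a13==. term; all five observations are counted
count if (a13==1)|(a13==0)|(a13==9)|(a13==.)&(wave==2011)

// With outer parentheses, wave==2011 applies to every term;
// only the three 2011 observations are counted
count if ((a13==1)|(a13==0)|(a13==9)|(a13==.)) & (wave==2011)

// The a13 conditions exhaust all values in these data, so this is the same
count if wave == 2011
```

On this toy data the first command reports 5 and the other two report 3, which is why the unparenthesized count in #1 is larger than intended.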
        But we can do better. It is seldom the case in Stata that you need to separately calculate the numerator and denominator.

        So we can do this:
        Code:
        gen byte in_school = (a13 == 1) & (age > 6) & !missing(age)
        tab in_school if wave == 2011
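
If a single number is wanted rather than a table, one possible sketch relies on the fact that the mean of a 0/1 indicator is the proportion of 1s. The five observations below are invented for illustration; the variable names are the ones used in this thread. The last command is one way to get the percentage for every wave at once.

```stata
clear
input byte a13 byte age int wave
1 10 2011
0 12 2011
1  5 2011
9 15 2011
1  8 2009
end

gen byte in_school = (a13 == 1) & (age > 6) & !missing(age)

// Mean of the indicator = proportion in school among wave 2011 observations
summarize in_school if wave == 2011, meanonly
display "Percentage = " %5.2f 100*r(mean)

// Row percentages give the same figure for each wave in one table
tabulate wave in_school, row
```

Here one of the four wave 2011 observations qualifies, so the display line shows 25.00.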
        Finally, some advice about using 9 as a code for unknown. It will get you in trouble. At some point you'll forget that 9 is just a code, and you'll do some kind of calculation with that variable, and Stata will treat that 9 as actually being the number 9. So before you do anything else, you should replace those 9's with a missing value. If you want to maintain the distinction between unknown and missing, then use one of the extended missing values. So
        Code:
        replace a13 = .u if a13 == 9
        Now those "unknown" values will not be treated as if they were numerical 9's if you do some calculations with a13.
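
As a small illustration of how the extended missing value behaves, here is a sketch on five invented observations. The label name a13lbl is hypothetical and the value label is optional; it only makes the tables easier to read.

```stata
clear
input byte a13
1
0
9
.
9
end

replace a13 = .u if a13 == 9

// Optional: label the codes, including the extended missing value
label define a13lbl 0 "no" 1 "yes" .u "unknown"
label values a13 a13lbl

// Calculations now use only the real 0/1 answers
summarize a13

// missing() catches both . and .u, while == .u isolates the unknowns
count if missing(a13)
count if a13 == .u

tabulate a13, missing
```

With these data summarize uses 2 observations with a mean of 0.5, the first count is 3, and the second is 2, so the unknown and missing categories stay distinguishable without contaminating any arithmetic.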

        For #2,

        Code:
        gen byte condition = (a11 >= 23) & (age >= 16) if !missing(a11, age)
        tab condition wave, col
        You should invest some time in learning the basic Stata commands that are fundamental to all data management and analysis. Read the Getting Started [GS] and User's Guide [U] sections of the PDF manuals that came installed with your Stata. They will lead you through all the basics and introduce you to the most important commands. It will take a while, and you won't remember everything. But you will come away from it having at least a passing familiarity with all of the crucial commands, and while you will need to refer back to the help files and the manuals for details of syntax and other finer points, you will generally know what commands you need to use to perform the common tasks. The time you invest will be amply repaid in quicker solution to your problems, and less time posting on Statalist and waiting for somebody to respond!

        Added: Crossed with Joseph Coveney's response.




        • #5
          Hello, how can I find the percentage of the total of the score?



          • #6
            Haroon Rasheed I don't think #5 allows more precise advice than

            1. Work out the total, either over observations or over variables as desired

            2. The percent you want is likely to be 100 * score / total

            For a better answer, follow https://www.statalist.org/forums/help#stata and give a data example showing variables and the structure of your data.
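
In the absence of a data example, the two steps above might look roughly like the following sketch. Everything here is an assumption for illustration: the variable names score, q1, q2, q3 and the values are invented, and both readings of "total" (over observations and over variables) are shown.

```stata
clear
input double score q1 q2 q3
10 1 2 3
20 2 2 2
30 0 5 5
40 4 4 2
end

// Total over observations: each score as a percent of the column total
egen double score_total = total(score)
generate double pct_of_total = 100 * score / score_total

// Total over variables: q1 as a percent of each observation's row total
egen double row_total = rowtotal(q1 q2 q3)
generate double pct_q1 = 100 * q1 / row_total

list, noobs
```

With these values score_total is 100 in every observation, so pct_of_total simply reproduces 10, 20, 30, and 40.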



            • #7
              I have a variable in the dataset for the question "Do you have clean drinking water?", coded 1 for yes and 0 for no. The total number of cases is 1800, and 190 people answered no (0). The next question was asked only of the 190 respondents who answered no (0). It has four options: (a) can't afford it, (b) drinking water facility too far, (c) can't safe enough clean water, and (d) no government service. 90 of the 190 chose option (d). The tab command gives me a percentage out of the 190 respondents, but I want the percentage of option (d) out of the total number of cases (1800) from the first question. Could you let me know the syntax?



              • #8
                Hello Haroon. If I follow, you can just add the missing option to your -tabulate- command for the next question. Alternatively, you could install Ben Jann's -fre- command (SSC), which gives you both percentages in the same table.

                Code:
                . // Generate some data
                . clear
                
                . set obs 1800
                number of observations (_N) was 0, now 1,800
                
                . generate byte cleanwater = _n > 190
                
                . generate byte nextquest = mod(_n,4) + 1 if cleanwater
                (190 missing values generated)
                
                .
                . tabulate cleanwater
                
                 cleanwater |      Freq.     Percent        Cum.
                ------------+-----------------------------------
                          0 |        190       10.56       10.56
                          1 |      1,610       89.44      100.00
                ------------+-----------------------------------
                      Total |      1,800      100.00
                
                . tabulate nextquest
                
                  nextquest |      Freq.     Percent        Cum.
                ------------+-----------------------------------
                          1 |        403       25.03       25.03
                          2 |        402       24.97       50.00
                          3 |        402       24.97       74.97
                          4 |        403       25.03      100.00
                ------------+-----------------------------------
                      Total |      1,610      100.00
                
                . // Use -tabulate- with the missing option
                . tabulate nextquest, missing
                
                  nextquest |      Freq.     Percent        Cum.
                ------------+-----------------------------------
                          1 |        403       22.39       22.39
                          2 |        402       22.33       44.72
                          3 |        402       22.33       67.06
                          4 |        403       22.39       89.44
                          . |        190       10.56      100.00
                ------------+-----------------------------------
                      Total |      1,800      100.00
                
                . // Alternatively, install Ben Jann's -fre- command.
                . // It shows both percentages in one table.
                . // ssc install fre // Uncomment this line to install -fre-  
                . fre cleanwater nextquest
                
                cleanwater
                -----------------------------------------------------------
                              |      Freq.    Percent      Valid       Cum.
                --------------+--------------------------------------------
                Valid   0     |        190      10.56      10.56      10.56
                        1     |       1610      89.44      89.44     100.00
                        Total |       1800     100.00     100.00           
                -----------------------------------------------------------
                
                nextquest
                -----------------------------------------------------------
                              |      Freq.    Percent      Valid       Cum.
                --------------+--------------------------------------------
                Valid   1     |        403      22.39      25.03      25.03
                        2     |        402      22.33      24.97      50.00
                        3     |        402      22.33      24.97      74.97
                        4     |        403      22.39      25.03     100.00
                        Total |       1610      89.44     100.00           
                Missing .     |        190      10.56                      
                Total         |       1800     100.00                      
                -----------------------------------------------------------

                Code without output:
                Code:
                // Generate some data
                clear
                set obs 1800
                generate byte cleanwater = _n > 190
                generate byte nextquest = mod(_n,4) + 1 if cleanwater
                
                tabulate cleanwater
                tabulate nextquest
                // Use -tabulate- with the missing option
                tabulate nextquest, missing
                // Alternatively, install Ben Jann's -fre- command.
                // It shows both percentages in one table.
                // ssc install fre // Uncomment this line to install -fre-  
                fre cleanwater nextquest
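
If only the single figure is needed, a sketch along the following lines divides a count by _N, the total number of observations. It rebuilds the same simulated data as above, in which the value 4 merely stands in for option (d); with real data, substitute the actual variable name and code.

```stata
// Rebuild the simulated data
clear
set obs 1800
generate byte cleanwater = _n > 190
generate byte nextquest = mod(_n,4) + 1 if cleanwater

// Percentage choosing category 4 out of all 1,800 cases
quietly count if nextquest == 4
display "Percent of all cases = " %5.2f 100 * r(N) / _N
```

For the simulated data this shows 22.39, the same value as the Percent column for category 4 in the -fre- table.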

                --
                Bruce Weaver
                Version: Stata/MP 18.5 (Windows)
