  • Descriptive statistics categorical variables

    Hi there,

    I had a question about the descriptive statistics of the categorical variable education level, with 3 categories: low, middle, and high.

    I see that the mean of each of the 3 categories is the proportion of the sample that falls within that category. For example, 12.45% of the sample has a low education level.

    What does the standard deviation in this situation tell me? And does it add any value to the table?

    I think it's redundant, since the minimum is 0 and the maximum is 1, and each value can only be 0 or 1. But I could be wrong, of course.

    Thank you in advance (:

    [Attached screenshot: Schermafbeelding 2020-08-03 om 22.00.53.png]

  • #2
    You might be confusing the categorical variable education (a single variable, which might take the values 1, 2, 3 for the three categories, or the string values Low, Middle, High) with the three separate dummy variables, e.g., low = 1 if education==Low, 0 otherwise.

    Calculating summary statistics for the categorical variable does not make sense, and in fact will be impossible if it is a string variable.

    For the three dummies both the mean and the standard deviation are informative.
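To make the distinction concrete, here is a minimal sketch with made-up data. The ten observations and the coding 1/2/3 are assumptions for illustration only; the dummy names low, middle, high follow the wording above.

```stata
clear
* hypothetical categorical variable: 1 = Low, 2 = Middle, 3 = High
input byte education
1
1
2
2
2
3
3
3
3
3
end

* one 0/1 dummy per category; missing education stays missing
generate byte low    = education == 1 if !missing(education)
generate byte middle = education == 2 if !missing(education)
generate byte high   = education == 3 if !missing(education)

* the mean of each dummy is the share of the sample in that category
summarize low middle high
```

Summarizing education itself would average the codes 1, 2, 3, which has no useful interpretation; summarizing the dummies gives the category shares.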



    • #3
      Joro Kolev

      I'm sorry, you are correct. 1 = Low, 2 = Middle, 3 = High. The original variable was a categorical variable, but I generated 3 dummy variables (low, middle, high). Then I calculated the mean and SD of the 3 dummy variables. That is what is shown in the table. I'm sorry, I should have been clearer.

      What would a standard deviation of 0.3308 tell me about the Low dummy variable?



      • #4
        Otherwise, there is a deterministic relation between the mean of a dummy and its standard deviation: if mean(x)=p, then std(x)=sqrt(p*(1-p)). Here:
        Code:
             +------------------+
             | meanx        sdx |
             |------------------|
          1. |    .1         .3 |
          2. |    .2         .4 |
          3. |    .3   .4582576 |
          4. |    .4   .4898979 |
          5. |    .5         .5 |
          6. |    .6   .4898979 |
          7. |    .7   .4582576 |
          8. |    .8         .4 |
          9. |    .9         .3 |
         10. |     1          0 |
             +------------------+
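
A table like the one above can be reproduced in a few lines. This is only a sketch and not necessarily the commands used for this post; the variable names meanx and sdx are taken from the listing. The formula holds because a 0/1 variable satisfies x^2 = x, so its variance is p - p^2 = p*(1-p).

```stata
clear
set obs 10
* candidate means p = .1, .2, ..., 1
generate meanx = _n/10
* implied standard deviation of a dummy with that mean
generate sdx = sqrt(meanx*(1 - meanx))
list
```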



        • #5
          Joro Kolev

          Thank you so much for the help and explanation! Have a great day (:



          • #6
            On the one hand, the sd of the low dummy does not tell you anything new on top of the mean, because in principle you can calculate it from the mean, as I did in the table in #4. Here:
            Code:
            . dis sqrt(.1245*(1-.1245))
            .3301511
            On the other hand, your readers should not have to sit down with a hand calculator and compute derived statistics from your statistics. I think it is customary to report min, mean, sd, and max for dummies, just as we do for continuous variables.

            The standard deviation of a dummy has the usual interpretation: the higher the standard deviation, the more variable the dummy is. As you can see from my table above, the standard deviation is maximised when the dummy has a mean of 0.5.
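
One possible reason the .3301511 computed above is not exactly the 0.3308 in the table: summarize divides by n-1 rather than n, so the sd it reports equals sqrt(p*(1-p)*n/(n-1)). Rounding of the mean to 12.45% may also contribute. A minimal sketch with a made-up dummy (200 observations, 25 of them equal to 1; these numbers are assumptions for illustration, not the data from this thread):

```stata
clear
set obs 200
* hypothetical dummy: the first 25 observations are 1, the rest 0
generate byte low = _n <= 25

quietly summarize low
scalar p = r(mean)
scalar n = r(N)
scalar s = r(sd)

display "sd from summarize:        " s
display "sqrt(p*(1-p)):            " sqrt(p*(1 - p))
display "sqrt(p*(1-p)*n/(n-1)):    " sqrt(p*(1 - p)*n/(n - 1))
```

The first and third numbers should agree, and the second should be slightly smaller; the gap shrinks as n grows.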


            Originally posted by Sandra Bloem:
            Joro Kolev

            I'm sorry, you are correct. 1 = Low, 2 = Middle, 3 = High. The original variable was a categorical variable, but I generated 3 dummy variables (low, middle, high). Then I calculated the mean and SD of the 3 dummy variables. That is what is shown in the table. I'm sorry, I should have been clearer.

            What would a standard deviation of 0.3308 tell me about the Low dummy variable?


            • #7
              You are welcome, Sandra!

              But I have just started rolling, and I want to say more :-).

              Another benefit of reporting min, mean, sd, and max for dummies is that the reader can tell they are dummies from one look at your table of summary statistics for all your variables. The reader does not need to search through the text of your paper to figure out which variable is of which type.
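
A quick way to see this, sketched on the auto dataset that ships with Stata rather than on the data from this thread: foreign is a 0/1 dummy, while price and mpg are continuous. The choice of variables is mine, for illustration only.

```stata
sysuse auto, clear
* one row per variable, with min, mean, sd, and max as columns
tabstat foreign price mpg, statistics(min mean sd max) columns(statistics)
```

In the resulting table the row with min 0 and max 1 stands out as the dummy, and its mean reads directly as a proportion.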

              Now I think I am done :-)

              Originally posted by Sandra Bloem:
              Joro Kolev

              Thank you so much for the help and explanation! Have a great day (:


              • #8
                Joro Kolev

                Haha, the more information the better (: Thank you so much for taking the time to help me, I appreciate it a lot.
