  • How to include frequency, means, standard deviation and other statistics for some variables (like categorical variables)?

    Dear friends,

    Hi! I would like to create summary statistics for some variables, including their frequencies, means, and standard deviations. I used tab1 for one-way tabulation:
    Code:
     sysuse auto, clear
      tab1 rep78 foreign

    But I can't get other statistics this way. I know about "tabstat ..., by(group)", but that groups the results by category. I hope I might do something like this:

    Code:
       tab1 rep78 foreign, means std max min
    Thank you very much!
    Last edited by Bright Tree; 25 May 2020, 14:25.

  • #2
    If you wish to summarize a continuous variable according to a categorical variable, just type: tabulate catvar, summarize(continuousvar).
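
    As a minimal sketch with the auto data already used in this thread (price is chosen here only as an illustrative continuous variable, and rep78 as the categorical one):

    ```stata
    sysuse auto, clear
    * mean, SD and frequency of price within each category of rep78
    tabulate rep78, summarize(price)
    ```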

    That said, I fail to understand the reason for computing means and SDs of categorical variables.
    Best regards,

    Marcos


    • #3
      Dear Professor, thank you so much for your great help! I would like to display the summary statistics for the variables as well as the frequency for categorical variables.
      Code:
      sysuse auto, clear
      tab1 rep78 foreign                 // frequency
      tabstat rep78 foreign, stat(mean)  // mean
      // Could I combine them?
      Last edited by Bright Tree; 25 May 2020, 17:51.


      • #4
        In #3, there is still the demand for a mean value. As previously remarked, mean and SD are not what we should expect when tabulating categorical variables. That said, only when we have binary variables (but that is not your example) will the mean convey the proportion of data in the ‘1’ category.
        Best regards,

        Marcos


        • #5
          Dear Professor, well! Thank you!


          • #6
            The mean of an indicator (0,1) (some say dummy variable) is perfectly intelligible as the proportion that is the state coded 1. So in the auto data, the mean of foreign is the fraction of foreign cars.
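
            A quick way to check this in the auto data is to compare the mean with the one-way table. This is only a sketch; the figures in the comments follow from the 22 foreign cars among the 74 in the dataset.

            ```stata
            sysuse auto, clear
            * the mean of the 0/1 indicator is the proportion coded 1 (22/74, about 0.297)
            summarize foreign
            * the Percent column for Foreign shows the same fraction times 100 (about 29.73)
            tabulate foreign
            ```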

            The mean of an ordinal (ordered, graded) variable is what it is. In any university I know about, people in some departments or schools (especially psychology or sociology) explain that you should not take means of anything ordinal -- because the measurement scale does not justify averaging -- regardless of the fact that university policy is to do precisely that in summarizing students' grades. (If I grade one submission 80%, there is nothing that makes such work precisely twice as good as one graded 40%; the percent marks are just ordered conventionally, even if the convention permits any distinct integer to be reported.)

            The mean of a nominal variable with arbitrary codes is, broadly speaking, nonsense. If "frog", "newt", "toad" are coded 1, 2, 3 or 3, 2, 1 or whatever, different means are (almost) inevitable, depending capriciously on the coding used -- and such means are (usually) meaningless (pun intended, as typically). (The exceptions are special cases: if all beasts are frogs, then the same code is recorded again and again and the "mean" will echo the data.)
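
            To make the dependence on coding concrete, here is a small sketch with made-up data: four hypothetical beasts and two arbitrary codings. The variable names beast, code_a and code_b are invented for illustration.

            ```stata
            clear
            input str4 beast
            "frog"
            "frog"
            "newt"
            "toad"
            end

            * coding A: frog = 1, newt = 2, toad = 3
            generate code_a = 1*(beast == "frog") + 2*(beast == "newt") + 3*(beast == "toad")
            * coding B: frog = 3, newt = 2, toad = 1
            generate code_b = 3*(beast == "frog") + 2*(beast == "newt") + 1*(beast == "toad")

            * same animals, different "means": 1.75 under coding A, 2.25 under coding B
            summarize code_a code_b
            ```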

            That said, I see some point in the combined table being sought by Bright Tree: it is programmable in Stata, but I don't think anyone has written a command to do it.
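
            Short of a dedicated command, one rough workaround is to loop over the variables and show a frequency table followed by summary statistics for each. This is only a sketch that prints separate pieces of output one after another; it does not produce the single combined table being asked for.

            ```stata
            sysuse auto, clear
            foreach v of varlist rep78 foreign {
                display _newline as text "Variable: `v'"
                * frequencies and percentages
                tabulate `v'
                * count, mean, SD, minimum and maximum of the numeric codes
                tabstat `v', statistics(n mean sd min max)
            }
            ```

            The caveats above still apply: the mean of foreign is a proportion, while the mean of rep78 should be read with the usual caution for ordinal scales.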


            • #7
              Dear Professor Cox, thank you so much for your advice!
