How to include frequency, means, standard deviation and other statistics for some variables (like categorical variables)?

Bright Tree

Join Date: Mar 2020

Posts: 85
#1

25 May 2020, 14:22

Dear friends,

Hi! I would like to create summary statistics for some variables, including their frequencies, means, and standard deviations. I used tab1 for one-way tabulations:

Code:

sysuse auto, clear
tab1 rep78 foreign

But I can't get the other statistics that way. I know about "tabstat ..., by(group)", but that reports the statistics separately for each group. I was hoping to do something like this:

Code:

tab1 rep78 foreign, means std max min

Thank you very much!

Last edited by Bright Tree; 25 May 2020, 14:25.
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

25 May 2020, 15:36

If you wish to summarize a continuous variable according to a categorical variable, just type: tabulate catvar, summarize(continuousvar).
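A minimal sketch of that command using the auto data from #1. The choice of mpg and price as the continuous variables is only for illustration:

```stata
sysuse auto, clear

* mean, standard deviation, and frequency of mpg within each level of rep78
tabulate rep78, summarize(mpg)

* same idea for price by foreign, suppressing the standard deviation column
tabulate foreign, summarize(price) nostandard
```
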

That said, I fail to understand the reason for reporting means and SDs of categorical variables.

Best regards,

Marcos
Bright Tree

Join Date: Mar 2020

Posts: 85
#3

25 May 2020, 17:35

Dear Professor, thank you so much for your great help! I would like to display the summary statistics for the variables as well as the frequency for categorical variables.

Code:

sysuse auto, clear
tab1 rep78 foreign // frequency
tabstat rep78 foreign, stat(mean) // mean
// Could I combine them?

Last edited by Bright Tree; 25 May 2020, 17:51.
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#4

26 May 2020, 03:34

In #3, there is still a request for a mean value. As previously remarked, the mean and SD are not what we should expect when tabulating categorical variables. That said, only with binary variables (but that is not your example) will the mean convey the proportion of observations in the ‘1’ category.

Best regards,

Marcos
Bright Tree

Join Date: Mar 2020

Posts: 85
#5

26 May 2020, 06:21

Dear Professor, well! Thank you!
Nick Cox

Join Date: Mar 2014

Posts: 35698
#6

26 May 2020, 07:25

The mean of an indicator (0,1) (some say dummy variable) is perfectly intelligible as the proportion that is the state coded 1. So in the auto data, the mean of foreign is the fraction of foreign cars.
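As a quick check of that claim with the auto data, the following sketch compares the mean of foreign with the fraction of observations coded 1:

```stata
sysuse auto, clear

* mean of the (0,1) indicator
summarize foreign
display "mean of foreign     = " r(mean)

* fraction of observations with foreign == 1, computed directly
count if foreign == 1
display "fraction foreign==1 = " r(N)/_N
```

The two displayed values should agree, because foreign has no missing values in this dataset.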

The mean of an ordinal (ordered, graded) variable is what it is. In any university I know about, people in some departments or schools (especially psychology or sociology) explain that you should not take means of anything ordinal -- because the measurement scale does not justify averaging -- regardless of the fact that university policy is to do precisely that in summarizing students' grades. (If I grade one submission 80%, there is nothing that makes such work precisely twice as good as one graded 40%; the percent marks are just ordered conventionally, even if the convention permits any distinct integer to be reported.)

The mean of a nominal variable with arbitrary codes is, broadly speaking, nonsense. If "frog", "newt", "toad" are coded 1, 2, 3 or 3, 2, 1 or whatever, different means are (almost) inevitable, depending capriciously on the coding used -- and such means are (usually) meaningless (pun intended, as typically). (The exceptions are special cases: if all beasts are frogs, then the same code is recorded again and again and the "mean" will echo the data.)
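The dependence on coding can be seen with a small made-up example. The six observations below are hypothetical, and code_a and code_b are illustrative names for two equally arbitrary codings of the same animals:

```stata
clear
* hypothetical data: 2 frogs, 1 newt, 3 toads, coded frog=1 newt=2 toad=3
input byte code_a
1
1
2
3
3
3
end

* the same animals under the reversed coding frog=3 newt=2 toad=1
generate byte code_b = 4 - code_a

* the two "means" differ although the underlying data are identical
summarize code_a code_b
```
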

That said, I see some point in the combined table being sought by Bright Tree: it is programmable in Stata, but I don't think anyone has written a command to do it.
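Short of a dedicated command, one rough approximation is to loop over the variables and show the frequencies and the summary statistics one after the other. This is only a sketch: it produces separate outputs for each variable, not a single combined table, and the caveats above about means of categorical variables still apply:

```stata
sysuse auto, clear

foreach v of varlist rep78 foreign {
    display _newline as text "Variable: `v'"
    * one-way frequency table
    tabulate `v'
    * count, mean, SD, min, and max for the same variable
    tabstat `v', statistics(n mean sd min max)
}
```
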
Bright Tree

Join Date: Mar 2020

Posts: 85
#7

27 May 2020, 17:56

Dear Professor Cox, thank you so much for your advice!