Tabulate multiple continuous variables in one table

Steve Stat

Join Date: Mar 2020
Posts: 4

10 Mar 2020, 06:15

Hello,

I am new to Stata and have been stuck on the following problem for a few days now: I am trying to create a table that lists summary statistics (mean, sd, median, freq) for multiple continuous variables, where the summary statistics are calculated by group. In the table below I have tried to illustrate what I am trying to do:

	Weight	Foot Size	BMI
Children	mean: 40, median: 30, sd: 5, freq: 1000	mean: 15, median: 12, sd: 3, freq: 1000	...
Young Adults	mean: 80, median: 65, sd: 15, freq: 700	...	...
Elderly	mean: 60, median: 55, sd: 5, freq: 500	...	...

I have been searching the internet for hours now, but all I found in the forums were tables where the "top variables" of the table were not individual variables but the levels of a categorical one, which is not the case for me. I have the categories on the left stored in a categorical variable, but the top ones are independent variables.

FYI: As soon as we have solved this problem, I have another, similar question: I need a second table for which the top variables are not continuous but all categorical (e.g. imagine that instead of a foot size in cm, we now have a categorical variable storing the foot size as S/M/L/XL/XXL, plus some other variables like it). Obviously, I cannot compute summary statistics here, but I would like the table to display the most common category and the corresponding percentage.

I hope I was able to illustrate my problem well, and I am looking forward to your suggestions! Thanks in advance!

Nick Cox

Join Date: Mar 2014

Posts: 35698
#2

10 Mar 2020, 06:32

First of all, Stat may be your family name for all I know, but nevertheless I flag our request to use full real names on Statalist, as explained at https://www.statalist.org/forums/help#realnames and at https://www.statalist.org/forums/help#adviceextras (which you were asked to read before posting).

It's very common to want tables that are simple enough for researchers to understand but too complicated to be the immediate product of one single command. Sometimes people with repeated particular needs are driven to write lengthy and very specific code for what they want, but that is not open to those new to Stata.

In your case the elements for one variable are given by tabstat as in this example which you can run.

Code:

. sysuse auto, clear
(1978 Automobile Data)

. tabstat mpg, s(mean median sd n) by(foreign)

Summary for variables: mpg
     by categories of: foreign (Car type)

 foreign |      mean       p50        sd         N
---------+----------------------------------------
Domestic |  19.82692        19  4.743297        52
 Foreign |  24.77273      24.5  6.611187        22
---------+----------------------------------------
   Total |   21.2973        20  5.785503        74
--------------------------------------------------

And my guess is that copying and pasting and editing in your preferred software will take you less time than trying to find code for your own desired format.

Naturally see the help for tabstat.

You've almost answered your second question for yourself: tabulate will show you the most frequent category.
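
If you would rather have the modal category stored as a result than read it off a table, one possible route is the mode() function of egen. What follows is only a sketch on the auto data, not a ready-made command: rep78 stands in for your categorical variable, the names mode_rep78 and pct_mode are made up for illustration, and minmode is an arbitrary way to break ties.

```stata
sysuse auto, clear

* Most frequent category of rep78 within each level of foreign.
* minmode picks the smallest value when two categories tie (an arbitrary choice).
egen mode_rep78 = mode(rep78), minmode by(foreign)

* Percentage of nonmissing observations in each group that fall in the modal category
egen pct_mode = mean(100 * (rep78 == mode_rep78)) if !missing(rep78), by(foreign)

* Show one line per group; both variables are constant within groups
tabdisp foreign if !missing(rep78), cell(mode_rep78 pct_mode)
```

Without minmode or maxmode, egen returns missing values for a group in which two categories tie for most frequent, so check for ties before trusting the result.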
Steve Stat

Join Date: Mar 2020

Posts: 4
#3

10 Mar 2020, 07:19

Hi Nick, thank you for such a quick answer! I didn't expect that at all! Unfortunately, I am somewhat bound to using Stata for my entire analysis. In fact, I am proud to mention that I had partially solved the problem before, using the reshape command to create a new categorical variable that combined all the variables I need listed in the first table. However, this solution is not only not very pretty, but also stops working as soon as I want to put different kinds of variables into one overview (e.g. not only continuous variables but also categorical ones, just as I explained in my main post).

Shouldn't there be an at least somewhat easy way to get a good overview of differences in summary statistics in one table? I mean, I basically need exactly the table you generated with tabstat, but for multiple variables next to each other. Is there maybe any option that I am not aware of that can help here? Having them all in different tables makes it a lot harder to compare. It just seems like such a simple and convenient need to me that I'm having a hard time accepting that there is no solution to this.

Concerning the second question: I would need some kind of command that doesn't give me that information in a table I have to read it out of, but instead as a single result.

Again, thanks for taking the time to help me with this. I really appreciate your input.
Nick Cox

Join Date: Mar 2014

Posts: 35698
#4

10 Mar 2020, 07:36

I have been using Stata almost every day for 29 years and have used several tabulation commands and programmed some extra myself. I have met your kind of question several times before and am not surprised by it, but I can only vary slightly what I said in #2. I don't know a single command that will easily do what you want.

It's easy to think up hundreds of different table styles but no command that is at all easy to understand allows more than a few of them.

Your desired format has some resemblances to what is customary in medical or medical statistics journals, so you may get a different answer from others. It's undoubtedly programmable but that doesn't guarantee a pre-existing command.

Wouter Wakker

Join Date: Nov 2018
Posts: 621

10 Mar 2020, 07:55

I mean, I basically need exactly the table you generated with tabstat, but for multiple variables next to each other. Is there maybe any option that I am not aware of that can help here?

tabstat allows multiple variables, so this comes pretty close to what you want for your first table.

Code:

. sysuse auto, clear
(1978 Automobile Data)

. tabstat mpg weight price, by(foreign) stats(mean median sd count) nototal longstub

foreign     stats |       mpg    weight     price
------------------+------------------------------
Domestic     mean |  19.82692  3317.115  6072.423
              p50 |        19      3360    4782.5
               sd |  4.743297  695.3637  3097.104
                N |        52        52        52
------------------+------------------------------
Foreign      mean |  24.77273  2315.909  6384.682
              p50 |      24.5      2180      5759
               sd |  6.611187  433.0035  2621.915
                N |        22        22        22
-------------------------------------------------

Regarding your second question, I'm on Nick's side: I don't know of any simple way to do it without programming.

Steve Stat

Join Date: Mar 2020

Posts: 4
#6

10 Mar 2020, 08:27

Amazing! That is exactly what I was looking for! That command produces exactly the format that I had built with over 40 lines of code. I regret not asking here earlier; this could have saved me so much time. Thanks a lot!

Concerning my second question, I understand that it is too complicated to get it exactly the way I imagined it, but wouldn't it be possible to modify that command ...
tabstat mpg weight price, by(foreign) stats(mean median sd count) nototal longstub

… in a way that instead of the mean, median, sd and count for those continuous variables, it would give me frequencies or percentages for 4 different categories of a categorical variable?

Edit: Let me rephrase the question, because it helped last time: I would basically need a crosstabulation now, but for multiple categorical variables next to each other (not inside each other) instead of just one.

Last edited by Steve Stat; 10 Mar 2020, 08:36.
Nick Cox

Join Date: Mar 2014

Posts: 35698
#7

10 Mar 2020, 08:40

wouldn't it be possible to modify that command ...

tabstat mpg weight price, by(foreign) stats(mean median sd count) nototal longstub

in a way that instead of the mean, median, sd and count for those continuous variables, it would give me frequencies for 4 different categories of a categorical variable?

No. Your wish is not Stata's command. You want tabstat to morph into tabulate on the fly, and it won't do that.
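
What you can do is loop tabulate over the categorical variables, which gives one cross-tabulation per variable rather than a single combined table. Here is a minimal sketch on the auto data, where heavy is a variable invented purely so that there is a second categorical variable to loop over:

```stata
sysuse auto, clear

* A second categorical variable, made up for illustration only
generate byte heavy = weight > 3000 if !missing(weight)

* One two-way table per categorical variable, with row percentages,
* so each row shows how a group is spread across the categories
foreach v of varlist rep78 heavy {
    tabulate foreign `v', row
}
```

The tables come out one after another, not side by side, so some copying and pasting is still needed to put them into one layout.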