
  • Tabulate multiple continuous variables in one table

    Hello,

    I am new to Stata and have been stuck for a few days now on the following problem: I am trying to create a table that lists summary statistics (mean, sd, median, freq) for multiple continuous variables, with the statistics calculated by group. The table below illustrates what I am trying to do:
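                 | Weight                                  | Foot Size                               | BMI
    -------------+-----------------------------------------+-----------------------------------------+-----
    Children     | mean: 40, median: 30, sd: 5, freq: 1000 | mean: 15, median: 12, sd: 3, freq: 1000 | ...
    Young Adults | mean: 80, median: 65, sd: 15, freq: 700 | ...                                     | ...
    Elderly      | mean: 60, median: 55, sd: 5, freq: 500  | ...                                     | ...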
    I have been searching the Internet for hours, but all I found in the forums were tables where the "top" (column) variable was a single categorical variable, which is not the case for me. The groups on the left are stored in one categorical variable, but the columns are separate, independent variables.


    FYI: Once we have solved this problem, I have another, similar question: I need a second table in which the column variables are not continuous but all categorical (e.g., instead of foot size in cm, imagine a categorical variable storing foot size as S/M/L/XL/XXL, plus some other variables like it). Obviously I cannot compute summary statistics here, but I would like the table to display the most common category and its percentage.

    I hope I was able to illustrate my problem well, and I look forward to your suggestions! Thanks in advance!

  • #2
    First of all, Stat may be your family name for all I know, but nevertheless I flag our request to use full real names on Statalist, as explained at https://www.statalist.org/forums/help#realnames and at https://www.statalist.org/forums/help#adviceextras (which you were asked to read before posting).

    It's very common to want tables that are simple enough for researchers to understand but too complicated to be the immediate product of one single command. Sometimes people with repeated particular needs are driven to write lengthy and very specific code for what they want, but that is not an option for those new to Stata.

    In your case, the elements for one variable are given by tabstat, as in this example, which you can run.

    Code:
    . sysuse auto, clear
    (1978 Automobile Data)
    
    . tabstat mpg, s(mean median sd n) by(foreign)
    
    Summary for variables: mpg
         by categories of: foreign (Car type)
    
     foreign |      mean       p50        sd         N
    ---------+----------------------------------------
    Domestic |  19.82692        19  4.743297        52
     Foreign |  24.77273      24.5  6.611187        22
    ---------+----------------------------------------
       Total |   21.2973        20  5.785503        74
    --------------------------------------------------
    And my guess is that copying, pasting, and editing in your preferred software will take you less time than trying to find code for your own desired format.

    Naturally, see the help for tabstat.

    You've almost answered your second question for yourself: tabulate will show you the most frequent category.



    • #3
      Hi Nick, thank you for such a quick answer! I didn't expect that at all! Unfortunately, I am more or less bound to using Stata for my entire analysis. In fact, I am proud to mention that I had partially solved the problem before, using the reshape command to create a new categorical variable that combined all the variables I need to list in the first table. However, this solution is not only not very pretty, but it also stops working as soon as I want to put different kinds of variables into one overview (e.g., not only continuous variables but also categorical ones, just as I explained in my first post).

      Shouldn't there be an at least somewhat easy way to get a good overview of differences in summary statistics in one table? I basically need exactly the table you generated with tabstat, but for multiple variables next to each other. Is there maybe an option I am not aware of that can help here? Having them all in different tables makes them a lot harder to compare. It just seems like such a simple and common need that I'm having a hard time accepting that there is no solution.

      Concerning the second question: I would need some kind of command that gives me that information as a single result, rather than in a table I have to read it from.
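A minimal sketch of one way to get the most common category and its percentage as single values per group, assuming the auto data with rep78 standing in for a categorical variable such as foot size and foreign standing in for the groups (these are stand-ins, not the poster's data). It relies on egen's mode(), count(), total() and tag() functions; the percentages are of nonmissing observations, and the maxmode option is one arbitrary way to break ties (in these data the foreign cars are tied between rep78 values 4 and 5).

```stata
* Sketch: auto data; rep78 stands in for a categorical variable
* (e.g. foot size S/M/L/XL/XXL) and foreign for the groups
sysuse auto, clear

* Modal category within each group; missing values are ignored,
* and maxmode resolves ties by taking the highest value
bysort foreign: egen rep78_mode = mode(rep78), maxmode

* Percentage of nonmissing observations in the group that fall in that category
bysort foreign: egen n_nonmiss = count(rep78)
bysort foreign: egen n_mode = total(rep78 == rep78_mode)
generate pct_mode = 100 * n_mode / n_nonmiss

* Show one line per group
egen grp_tag = tag(foreign)
list foreign rep78_mode pct_mode if grp_tag, noobs
```

The result is stored as variables, so it can be listed, exported, or combined with other results rather than read off a frequency table.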

      Again, thanks for taking the time to help me with this. I really appreciate your input.



      • #4
        I have been using Stata almost every day for 29 years, have used several tabulation commands, and have programmed some extra ones myself. I have met your kind of question several times before and am not surprised by it, but I can only vary slightly what I said in #2. I don't know a single command that will easily do what you want.

        It's easy to think up hundreds of different table styles, but no command that is at all easy to understand allows more than a few of them.

        Your desired format has some resemblances to what is customary in medical or medical statistics journals, so you may get a different answer from others. It's undoubtedly programmable, but that doesn't guarantee a pre-existing command.



        • #5
          I mean, I basically need exactly the table you generated with tabstat, but for multiple variables next to each other. Is there maybe any option that I am not aware of that can help here?
          tabstat allows multiple variables, so this comes pretty close to what you want for your first table.
          Code:
          . sysuse auto, clear
          (1978 Automobile Data)
          
          . tabstat mpg weight price, by(foreign) stats(mean median sd count) nototal longstub
          
          foreign     stats |       mpg    weight     price
          ------------------+------------------------------
          Domestic     mean |  19.82692  3317.115  6072.423
                        p50 |        19      3360    4782.5
                         sd |  4.743297  695.3637  3097.104
                          N |        52        52        52
          ------------------+------------------------------
          Foreign      mean |  24.77273  2315.909  6384.682
                        p50 |      24.5      2180      5759
                         sd |  6.611187  433.0035  2621.915
                          N |        22        22        22
          -------------------------------------------------
          Regarding your second question, I'm on Nick's side: I don't know of any simple way to do it without programming.



          • #6
            Amazing! That is exactly what I was looking for! That command produces exactly the format that I had built with over 40 lines of code. I regret not asking here earlier; it could have saved me so much time. Thanks a lot!

            Concerning my second question, I understand that it is too complicated to get it exactly the way I imagined, but wouldn't it be possible to modify that command ...
            tabstat mpg weight price, by(foreign) stats(mean median sd count) nototal longstub

            … in a way that, instead of the mean, median, sd and count for those continuous variables, it would give me frequencies or percentages for 4 different categories of a categorical variable?



            Edit: Let me rephrase the question, because it helped last time: I would basically need a crosstabulation now, but for multiple categorical variables next to each other (not inside each other) instead of just one.
            Last edited by Steve Stat; 10 Mar 2020, 08:36.
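tabstat itself will not cross-tabulate, but one common workaround, sketched here under stated assumptions, is to turn each category into a 0/1 indicator with tabulate's generate() option. The mean of such an indicator, multiplied by 100, is the percentage of the group falling in that category, and tabstat can show those means for several categorical variables side by side. The auto data, rep78, and a hypothetical lengrp (three length groups made with egen's cut()) stand in for real categorical variables; percentages exclude observations where the source variable is missing.

```stata
* Sketch: auto data; rep78 and the hypothetical lengrp stand in for the
* categorical variables to be shown side by side, foreign for the groups
sysuse auto, clear

* Hypothetical second categorical variable: three length groups coded 0, 1, 2
egen lengrp = cut(length), group(3)

* One 0/1 indicator per category (missing where the source variable is missing)
tabulate rep78, generate(rep_)
tabulate lengrp, generate(len_)

* The mean of a 0/1 indicator is a proportion, so rescale to percentages
foreach v of varlist rep_* len_* {
    replace `v' = 100 * `v'
}

* Percentage in each category by group, variables next to each other
tabstat rep_* len_*, by(foreign) stats(mean) nototal
```

Within each group, the columns belonging to one categorical variable should sum to 100, up to rounding. Adding count to stats() would also show how many nonmissing observations each percentage is based on.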



            • #7
              wouldn't it be possible to modify that command ...

              tabstat mpg weight price, by(foreign) stats(mean median sd count) nototal longstub

              in a way that instead of the mean median sd and count for those continuous variables, it would give me frequencies for 4 different categories of a categorical variable?
              No. Your wish is not Stata's command. You want tabstat to morph into tabulate on the fly, and it won't do that.

