Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • cross-tabulation for two variables, by two groups, summarize mean and sd

    I have some variable look like this.
    city gender age income
    A Female 23 309
    A Male 14 456
    A Female 67 342
    A Female 45 653
    B Male 42 155
    B Female 24 678
    B Female 35 256
    B Male 49 432
    And I want to create cross-tabulation for age and income, by group city and gender, summarizing mean and sd
    like this:
    female male
    age income age income
    A mean
    sd
    B
    I tried tabulate,
    tabulate gender city, summarize(age) nofreq, wrap
    it works well but can only summarize one variable at once. I can't do something like summarize(age income)

    I also tried tabstat
    tabstat age income, by(city) stat(mean sd)
    it can only tabulate one category as by(city). I can't do something like by(city gender)

    I finally tried table
    table city gender, contents(mean age sd age mean age sd age)
    It returns the numbers I want, but in an ugly display since it put all the statistics in one cell without notation.

    like
    female male
    A mean(age)
    sd(age)
    mean(income)
    sd(income)
    B
    I also read about tabout and tablemat. And a little bit confused now.
    It seems that tabout just put those tables in a beautiful look but actually.
    And tablemat keeps alerting me invalid name, or asking me "Please provide only one statistic at a time"
    tablemat age income, stat(mean sd) by(city gender)
    When I tried put only one statistic
    tablemat age income, stat(mean) by(city gender)
    tablemat provides me something definitely not what I want.
    All city_A city_B gender_F gender_M
    age 37.375 37.25 37.5 38.8 35
    income 410.125 440 80.25 447.6 347.6667

    Please help me with this problem.
    I'm running out of solutions.

    Any help will be highly appreciated.
    Thanks a lot!
    Last edited by Xianli Wang; 14 Mar 2016, 21:05.

  • #2
    This is not exactly what you are looking for, but you can use the by option with table, something like this:
    Code:
    table city, c(mean age sd age mean age sd age) f(%6.2fc) by(gender)

    Comment


    • #3
      One solution is to create a composite variable first:

      Code:
      egen group = group(city gender), label

      Comment


      • #4
        Originally posted by Oded Mcdossi View Post
        This is not exactly what you are looking for, but you can use the by option with table, something like this:
        Code:
        table city, c(mean age sd age mean age sd age) f(%6.2fc) by(gender)
        Thanks Oded. This is an alternative although still not what I want.
        I still want to put one group in the row, and one group in the column so that readers can look up for group combinations easier.
        Do you have any ideas about it?
        Thanks for your help!
        Last edited by Xianli Wang; 15 Mar 2016, 23:10.

        Comment


        • #5
          Originally posted by Nick Cox View Post
          One solution is to create a composite variable first:

          Code:
          egen group = group(city gender), label
          Thanks Nick!
          I did this and got something looks like
          group age income
          A Female mean
          sd
          A Male mean
          sd
          B Female mean
          sd
          B Male mean
          sd
          Thanks a lot! It's a good way but within each group, my data has much more variables than just two, like female and male.
          This table is a little bit hard to find the group combination. I'm still thinking about putting one group as the row, and the other as the column.
          Do you have any ideas about it?

          Thanks for your help!

          Comment


          • #6
            Originally posted by Nick Cox View Post
            One solution is to create a composite variable first:

            Code:
            egen group = group(city gender), label
            Hi Team
            , I a beginner with STATA and i have a dataset with 1.3million observations, spread over years from 2011 to 2020. i wanted to summarize two variables, one (client) by another (year). I tried this command egen group = group( Client year), label and i haven't seen any output, have been waiting since, could the problem be my data volume or wrong command?

            Comment

            Working...
            X