

  • Can tabstat or any other command report row means easily?

    Dear Stata users,

    I usually use tabstat to get summary statistics for a varlist, and it works very well. However, sometimes I also want the row mean across the variables in addition to the mean of each variable. So, without resorting to -egen- with its mean() function and generating a new variable, is there any command that could do this?

    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(gnp1970 gnp1980 gnp1990) byte quarter
      3922   5310 6830.4 1
    3922.3 5190.1 6853.2 2
    3961.3 5179.2 6837.5 3
      3931 5253.7 6804.6 4
    4027.3 5330.3 6747.1 1
      4042 5306.6 6766.9 2
    4064.7 5341.8 6781.2 3
    4072.9 5286.2   6821 4
    4149.8 5206.1 6883.4 1
    4232.5 5236.8 6937.5 2
    4272.8 5212.8 6980.2 3
    4343.9 5221.7 7067.7 4
    4439.6   5255 7065.9 1
    4475.9 5365.6 7096.8 2
    4471.4 5448.3 7118.6 3
    4495.1 5540.5   7213 4
    4475.4 5641.4 7278.2 1
    4492.4 5707.5 7369.2 2
    4442.4 5749.5 7403.2 3
    4430.2 5787.3 7494.8 4
    4361.7 5818.1   7519 1
    4403.2 5870.3 7531.4 2
    4471.7 5954.9 7572.7 3
    4531.8 5996.7 7645.4 4
    4620.6 6038.3 7703.3 1
    4655.6 6042.6 7819.6 2
      4677 6097.5 7853.8 3
    4720.7   6126 7948.2 4
    4791.9 6157.2 8024.3 1
    4852.2   6221 8148.8 2
    4924.3 6266.3 8233.2 3
    4922.2   6372 8289.6 4
    4958.8 6423.8 8432.1 1
    5107.7 6485.4 8476.3 2
    5155.2 6504.8   8560 3
    5227.4   6581 8731.6 4
    5236.7   6668 8843.8 1
    5244.9 6691.3 8910.8 2
    5280.1 6727.1 9031.1 3
    5296.6 6748.5 9204.7 4
    end

    . tabstat gnp1970 gnp1980 gnp1990, by(quarter)
    Summary statistics: mean
      by categories of: quarter 
     quarter |   gnp1970   gnp1980   gnp1990  rowmeanof70/80/90
           1 |   4498.38   5784.82   7532.75  5938.65
           2 |   4542.87   5811.72   7591.05  5981.88
           3 |   4572.09   5848.22   7637.15  6019.15
           4 |   4597.18   5891.36   7722.06  6070.20
       Total |   4552.63   5834.03  7620.752

  • #2
    How about something like this:

    gen id=_n
    reshape long gnp, i(id) j(year)
    gen gnpmean=.
    foreach num of numlist 1 2 3 4 {
      summ gnp if quarter==`num', meanonly
      replace gnpmean=r(mean) if quarter==`num'
    }
    reshape wide
    tabstat gnp*, by(quarter)


    • #3
      The code in #2 will work, and if the data set isn't very large, the execution time required for the -reshape-s will be acceptable. But there is no need to slow things down still more by using a loop over quarters to get the gnp mean variable.

      gen id=_n
      reshape long gnp, i(id) j(year)
      by quarter, sort: egen gnpmean = mean(gnp)
      reshape wide
      tabstat gnp*, by(quarter)

      That said, I imagine that Chen Samulsion was hoping for something quicker, more direct, and simpler. In fact, my guess is that he would prefer using -egen, rowmean()- and avoiding two -reshape-s to this approach. I think he was hoping to avoid creating any new variable and to have the table-writing command handle it internally. Offhand, I don't know any way to do that. But there are several user-written commands for making tables, and it is likely that one of them can do it. Perhaps somebody familiar with one will see this thread and respond.
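
      For comparison, the -egen, rowmean()- route mentioned above might look like the sketch below. It assumes the wide data from #1 are in memory, and gnprowmean is only a hypothetical variable name. The sketch does create one extra variable, but drops it again once the table has been shown. With no missing values, the by-quarter mean of the row means equals the mean of the three by-quarter means; with missing values the two can differ, because rowmean() averages only the nonmissing values in each row.

      ```stata
      * Sketch only: assumes gnp1970, gnp1980, gnp1990 and quarter from #1 are in memory
      * gnprowmean is a hypothetical name for the temporary row-mean variable
      egen gnprowmean = rowmean(gnp1970 gnp1980 gnp1990)

      * One table with the three means plus the mean of the row means, by quarter
      tabstat gnp1970 gnp1980 gnp1990 gnprowmean, by(quarter)

      * Remove the helper variable so the dataset is left as it was
      drop gnprowmean
      ```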


      • #4
        Alan Neustadtl and Clyde Schechter, thank you very much.
        "That said, I'm imagining that Chen Samulsion was hoping for something a bit more quick, direct, and simple. In fact, my guess is that he would prefer using -egen, rowmean()- and avoiding two -reshape-s to this approach. I think he was hoping to avoid creating any new variable, having the table-writing command handle it internally."
        Clyde has made clear what I had in mind. Perhaps I should not be so lazy when doing non-routine work (always hoping to reap without sowing in Stata?). All in all, I have learned much from both of you.


        • #5
          No one has quite said this yet but the data example in #1 shows a layout that for most Stata purposes should be reshaped to long and kept that way. That layout has very few advantages and many disadvantages.
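
          As a rough illustration of why the long layout helps here, the sketch below assumes the wide data from #1 are in memory and uses id as a hypothetical identifier, as in #2. Once the data are long, a two-way -tabulate- with the summarize() option reports the mean for each quarter and year together with a Total column, without any new mean variable. That Total column pools all observations within a quarter, which matches the row mean of the yearly means in this example only because every quarter-year cell holds the same number of observations.

          ```stata
          * Sketch only: start from the wide example data in #1
          gen id = _n
          reshape long gnp, i(id) j(year)

          * Means of gnp by quarter (rows) and year (columns);
          * the Total column gives the mean over all years within each quarter
          tabulate quarter year, summarize(gnp) means
          ```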


          • #6
            The data I showed in #1 were transformed from the shipped dataset gnp96, which is indeed organized in long form.


            • #7
              I don't quite follow #6. If you're saying that you created a simple example to show the problem, then that makes sense and indeed is helpful.


              • #8
                Yes, Nick. I created the data example from gnp96 to illustrate my problem. The original gnp96.dta is organized in long form. As you said, long form is the best choice in this case, and perhaps that is why Stata ships it that way.

