  • Showing a different mean in tabstat


    I tabulated the error cases (b_err) in my data by the number of children (kidsnum). After requesting sum and N along with mean in tabstat, I realized that this is not the mean I want.
    Code:
    tabstat b_err, by( kidsnum) stat(sum N mean)
    [Attached image: tabstat.png]
    Ideally, the mean would be calculated over the total number of error cases, i.e. the total shown in the sum column. For example, for the no-children category I want 144/445 rather than the 144/818 that tabstat calculates here. How can I adjust this?
    Last edited by Hend She; 28 Dec 2021, 05:09.
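
    To make the two quantities in the question explicit, here is a short worked comparison. The symbols S_g (sum of b_err in category g) and N_g (number of observations in category g) are notation introduced only for this note; the numbers are the ones quoted in the post.

    ```latex
    % What tabstat reports as "mean" versus the ratio asked for
    \[
      \text{mean}_g = \frac{S_g}{N_g},
      \qquad
      \text{share}_g = \frac{S_g}{\sum_h S_h}
    \]
    % No-children category, using the figures from the post
    \[
      \frac{144}{818} \approx 0.176,
      \qquad
      \frac{144}{445} \approx 0.324
    \]
    ```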

  • #2
    Hend:
    do you mean something along the following lines (ie from Table 1 to Table 2)?
    Code:
    . use "C:\Program Files\Stata17\ado\base\a\auto.dta"
    (1978 automobile data)
    
    . tabstat price, by(foreign)
    
    Summary for variables: price
    Group variable: foreign (Car origin)
    
     foreign |      Mean
    ---------+----------
    Domestic |  6072.423
     Foreign |  6384.682
    ---------+----------
       Total |  6165.257
    --------------------
    
    . tabstat price
    
        Variable |      Mean
    -------------+----------
           price |  6165.257
    ------------------------
    
    .
    Kind regards,
    Carlo
    (StataNow 18.5)


    • #3
      Hi Carlo, many thanks for your help! I need something else: an average of the error cases for each category of that variable (i.e. the number of children).

      The mean above is calculated as Sum/N within each category. What I need instead is each category's sum divided by the total of the Sum column, so that it is expressed in terms of the total number of error cases.


      • #4
        Hend:
        do you mean something along the following toy-example?
        Code:
        . use "C:\Program Files\Stata17\ado\base\a\auto.dta"
        
        . bysort rep78: egen wanted=mean(price)
        
        . tab wanted
        
             wanted |      Freq.     Percent        Cum.
        ------------+-----------------------------------
             4564.5 |          2        2.70        2.70
               5913 |         11       14.86       17.57
           5967.625 |          8       10.81       28.38
             6071.5 |         18       24.32       52.70
           6429.233 |         30       40.54       93.24
             6430.4 |          5        6.76      100.00
        ------------+-----------------------------------
              Total |         74      100.00
        
        .
        Kind regards,
        Carlo
        (StataNow 18.5)


        • #5
          Thanks, Carlo! I think I need the ratio of each category's sum to the total sum, as in the last line below for Domestic:
          Code:
          . tabstat price, by(foreign) stat(sum N mean)
          
          Summary for variables: price
          Group variable: foreign (Car origin)
          
           foreign |       Sum         N      Mean
          ---------+------------------------------
          Domestic |    315766        52  6072.423
           Foreign |    140463        22  6384.682
          ---------+------------------------------
             Total |    456229        74  6165.257
          ----------------------------------------
          
          . di 315766 /456229
          .69212172


          • #6
            Hend:
            something similar?
            Code:
            . use "C:\Program Files\Stata17\ado\base\a\auto.dta"
            (1978 automobile data)
            
            . bysort rep78: egen wanted_1=total(price)
            
            . egen wanted_2=total(price)
            
            . gen wanted_3=wanted_1/wanted_2
            
            . bysort rep78: list rep78 wanted_1 wanted_2 wanted_3 if _n==1 & rep78!=.
            
            ----------------------------------------------------------------------------------------------------------------------------------------
            -> rep78 = 1
            
                 +----------------------------------------+
                 | rep78   wanted_1   wanted_2   wanted_3 |
                 |----------------------------------------|
              1. |     1       9129     456229   .0200097 |
                 +----------------------------------------+
            
            ----------------------------------------------------------------------------------------------------------------------------------------
            -> rep78 = 2
            
                 +----------------------------------------+
                 | rep78   wanted_1   wanted_2   wanted_3 |
                 |----------------------------------------|
              1. |     2      47741     456229   .1046426 |
                 +----------------------------------------+
            
            ----------------------------------------------------------------------------------------------------------------------------------------
            -> rep78 = 3
            
                 +----------------------------------------+
                 | rep78   wanted_1   wanted_2   wanted_3 |
                 |----------------------------------------|
              1. |     3     192877     456229   .4227636 |
                 +----------------------------------------+
            
            ----------------------------------------------------------------------------------------------------------------------------------------
            -> rep78 = 4
            
                 +----------------------------------------+
                 | rep78   wanted_1   wanted_2   wanted_3 |
                 |----------------------------------------|
              1. |     4     109287     456229   .2395442 |
                 +----------------------------------------+
            
            ----------------------------------------------------------------------------------------------------------------------------------------
            -> rep78 = 5
            
                 +----------------------------------------+
                 | rep78   wanted_1   wanted_2   wanted_3 |
                 |----------------------------------------|
              1. |     5      65043     456229   .1425666 |
                 +----------------------------------------+
            
            ----------------------------------------------------------------------------------------------------------------------------------------
            Kind regards,
            Carlo
            (StataNow 18.5)


            • #7
              From that example, I think you want the share of b_err by number of children in the family, which is not any sort of mean at all. Does this work?

              Code:
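              * grand total of b_err over all observations
              egen total=total(b_err)
              sort kidsnum
              * total of b_err within each kidsnum category
              by kidsnum: egen cell=total(b_err)
              * keep one observation per category (drops all others)
              by kidsnum: keep if _n==1
              * each category's share of the grand total
              gen share=cell/total
              list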
              There is a long essay by Cox related to this at https://www.stata.com/statalist/arch.../msg00385.html, where he condemns this method for its inefficiency. The efficient alternative is to use table or tabulate with weights, but it is hard to know exactly what each type of weight will do, as shown by the fact that it takes a long essay to explain the details.
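
              As a rough illustration of the tabulate route, here is a minimal sketch on the auto data rather than the original dataset. The variable expensive is a made-up 0/1 indicator standing in for b_err, and the choice of iweight is an assumption about which weight type suits this purpose, not a recommendation taken from the essay.

              ```stata
              * Illustration on the auto data; "expensive" is a hypothetical 0/1 indicator
              sysuse auto, clear
              gen byte expensive = price > 6000

              * Binary case: restrict to flagged cases, so the Percent column is
              * each group's share of all flagged cases
              tabulate foreign if expensive == 1

              * General case: with iweights the reported frequencies are group sums
              * of price, and Percent is each group's share of the grand total
              tabulate foreign [iweight = price]
              ```

              For the second table, the Domestic percentage should correspond to the ratio 315766/456229 computed by hand in #5. With a 0/1 variable such as b_err, the first form (tabulate kidsnum if b_err == 1) is probably the simplest to read.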


              • #8
                Dear Carlo & Feenberg, many thanks for your help! That is very useful, and I will use these techniques.

                I thought there was a direct way that I was unaware of, but it seems there is no straightforward technique for this, i.e. no way for tabstat to show each category's Sum over the total Sum directly (frequency per category over total frequency).
                @Feenberg, unfortunately I applied it but it didn't work. I typed list after that code, but I got a huge output table as I have many variables. I ended up with four observations, and b_err, which is a binary variable, shows only zeros in them.
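
                A possible explanation for the four observations: keep if _n==1 retains only the first observation of each kidsnum category, so b_err in the remaining rows is just that one observation's value, while the shares sit in the cell and share variables. A minimal sketch that avoids dropping data and lists only the relevant columns is below. It uses the auto data as a stand-in, and grand, cell, share and tag are illustrative names, not variables from the original dataset.

                ```stata
                * Stand-in example: price plays the role of b_err, foreign of kidsnum
                sysuse auto, clear

                * grand total and within-category totals
                egen double grand = total(price)
                bysort foreign: egen double cell = total(price)

                * each category's share of the grand total
                gen double share = cell / grand

                * flag one observation per category instead of dropping the rest
                egen byte tag = tag(foreign)

                * list only the columns of interest, one row per category
                list foreign cell share if tag, noobs
                ```

                Because nothing is dropped, the dataset stays intact for later analysis, and naming the variables after list keeps the output short even when the data contain many variables.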
