
  • Using Cohen's d for proportion test - after "prtest"

    Hi all,

    I am testing proportion differences between two independent samples (e.g., conviction rates between servicemen group 1 and servicemen group 2) using the prtest command in Stata. I then need to report an effect size.

    I wonder if I can use Cohen's d following the prtest command. I normally use Cohen's d when testing mean differences (after a t-test); can it be used for binary outcome variables?


    Code:
    estpost prtest sm_convict, by(sm_group)
    
                 |      e(b)   e(count)      e(se)     e(se0)       e(z)     e(p_l)       e(p)     e(p_u)     e(N_1) 
    -------------+---------------------------------------------------------------------------------------------------
     sm_convict |  .0075073      4645   .0084651   .0084559   .8878214   .8126816   .3746369   .1873184       2249 
    
                 |    e(P_1)     e(N_2)     e(P_2) 
    -------------+---------------------------------
      sm_convict |  .0951534      2396   .0876461 
    
    esize twosample sm_convict, by(sm_group) cohensd
    
    Effect size based on mean comparison
    
                                   Obs per group:
                                spouse responded =      2,249
                                   spouse didn't =      2,396
    ---------------------------------------------------------
            Effect Size |   Estimate     [95% Conf. Interval]
    --------------------+------------------------------------
              Cohen's d |    .026063   -.0314853    .0836084
    ---------------------------------------------------------

    In this case, can I report that the difference between the two samples is 0.75 percentage points (9.5% vs. 8.8%), and that the effect size is 0.03 (very small)?

    Thank you!

    Maggie

  • #2
    I suppose you could do this, but I wouldn't. The purpose of calculating Cohen's d is to overcome the fact that continuous variable distributions, even when of the same shape, can differ in both location and scale, so that the same difference in means could be either large or small, depending on the variation. But probabilities don't have that problem, and in fact, the variance of a probability is a simple function of the probability itself. Everybody can see that the difference between a probability of 0.095 and 0.088 is small; you don't have to provide a context for that. So I would just report the difference between the probabilities; it's a sufficient measure of the effect size on its own.
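
    To see this concretely: for a 0/1 variable, the pooled SD that Cohen's d divides by is fully determined by the two proportions and group sizes. A quick sketch (in Python rather than Stata, plugging in the numbers from the output in #1) reproduces esize's estimate:

```python
import math

# Proportions and group sizes from the prtest output in #1.
p1, n1 = 0.0951534, 2249
p2, n2 = 0.0876461, 2396

# For a 0/1 variable the sample variance is p*(1 - p)*n/(n - 1),
# so the scale used by Cohen's d is determined by the proportions themselves.
s1_sq = p1 * (1 - p1) * n1 / (n1 - 1)
s2_sq = p2 * (1 - p2) * n2 / (n2 - 1)
pooled_sd = math.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))

d = (p1 - p2) / pooled_sd
print(round(p1 - p2, 7))  # 0.0075073 -- the raw risk difference
print(round(d, 4))        # 0.0261    -- matches esize's Cohen's d
```

    So the standardization adds nothing here that the two proportions don't already convey.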



    • #3
      Like Clyde, I was thinking that the risk difference would make more sense. If I deciphered the output in #1 correctly, this ought to give you what you need:

      Code:
      . csi 214 210 2035 2186, or
      
                       |   Exposed   Unexposed  |      Total
      -----------------+------------------------+------------
                 Cases |       214         210  |        424
              Noncases |      2035        2186  |       4221
      -----------------+------------------------+------------
                 Total |      2249        2396  |       4645
                       |                        |
                  Risk |  .0951534    .0876461  |   .0912809
                       |                        |
                       |      Point estimate    |    [95% Conf. Interval]
                       |------------------------+------------------------
       Risk difference |         .0075073       |   -.0090839    .0240986
            Risk ratio |         1.085655       |    .9054815     1.30168
       Attr. frac. ex. |         .0788971       |   -.1043848    .2317618
       Attr. frac. pop |         .0398207       |
            Odds ratio |         1.094662       |     .896707    1.336318 (Cornfield)
                       +-------------------------------------------------
                                     chi2(1) =     0.79  Pr>chi2 = 0.3746
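
      If it helps to see where csi gets these numbers, here is a small sketch (Python, using the same 2x2 cell counts) that recomputes the point estimates by hand:

```python
# 2x2 table reconstructed from the output in #1: cases and noncases by group.
a, b = 214, 210        # cases:    exposed, unexposed
c, d = 2035, 2186      # noncases: exposed, unexposed
n1, n2 = a + c, b + d  # column totals: 2249 and 2396

risk1, risk2 = a / n1, b / n2       # 0.0951534 and 0.0876461
print(round(risk1 - risk2, 7))      # 0.0075073  risk difference
print(round(risk1 / risk2, 6))      # 1.085655   risk ratio
print(round((a * d) / (b * c), 6))  # 1.094662   odds ratio
```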
      --
      Bruce Weaver
      Email: [email protected]
      Version: Stata/MP 18.5 (Windows)



      • #4
        Thanks all! I agree that for proportions, the difference in percentages itself should be good enough.

        However, our clients request an effect size for both continuous and binary outcomes, and the outcomes all appear in one table. So, to be consistent, we kind of need to report an effect size for the binary outcomes.

        I am not familiar with the risk difference. I can see a chi-square test was used here. Can I use chi-square and Cramér's V, then, for my binary (and categorical) variables?

        Thank you!



        • #5
          Chi-square and Cramer's V are not effect sizes. Chi-square is not an effect size because it is also sensitive to the sample size. An effect size must be a pure measure of the difference between the groups, and not depend on sample size.

          Cramer's V is like a correlation coefficient between the grouping variable and the dichotomous outcome variable. It is, in a sense, an effect size, but of a different effect: in particular, it is not a measure of the size of the group difference.

          The difference between the probabilities in the groups is an effect size: it directly and transparently describes the extent of the difference between the groups in a manner that is scale-free and independent of sample size. That is the very essence of an effect size measure. And it has the advantage of being far more understandable than any other.

          I don't think any reasonable client will expect that the same approach to effect sizes will be taken with continuous and categorical variables--that won't make sense. Even in just the descriptive statistics, you normally present mean and s.d. (or median and some other percentile range) for continuous variables but report n's and percents for category variables. So it is, similarly, perfectly reasonable to use Cohen's d for continuous variables and report the probability difference between groups as the effect size for category variables. You can put a footnote in your table explaining that this is what you are doing.
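
          For reference, in a 2x2 table Cramer's V reduces to the phi coefficient, sqrt(chi2/N). A quick Python sketch using the table from #3 shows it is tiny here too, though, as said above, it measures association rather than the group difference:

```python
import math

# 2x2 table from the csi output in #3.
a, b, c, d = 214, 210, 2035, 2186
n = a + b + c + d  # 4645

# Pearson chi-square for a 2x2 table; for a 2x2 table, Cramer's V
# equals the phi coefficient, sqrt(chi2 / n).
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
v = math.sqrt(chi2 / n)
print(round(chi2, 2))  # 0.79
print(round(v, 3))     # 0.013
```

          Note that dividing chi-square by N is exactly what removes the sample-size dependence mentioned above.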



          • #6
            Thanks for your explanation Clyde.

            What about an "effect size" for a marginal effect (binary outcome) after a logit regression?

            For example, we test the adjusted difference (adjusting for age and length of relationship) in the proportion of servicemen's spouses who experienced a miscarriage, by servicemen group. The adjusted difference is 4.7 percentage points higher in group 1. How do I calculate an effect size for this adjusted difference, or do we just report 4.7 as the adjusted proportion difference? Is 4.7 a big difference?

            Code:
            . svy:logit  sp_miscr  sp_vvgroup sp_age lengthtime_cu , or
            (running logit on estimation sample)
            
            Survey: Logistic regression
            
            Number of strata   =         1                  Number of obs     =      1,900
            Number of PSUs     =     1,900                  Population size   = 3,817.9277
                                                            Design df         =      1,899
                                                            F(   3,   1897)   =       1.57
                                                            Prob > F          =     0.1950
            
            -------------------------------------------------------------------------------
                          |             Linearized
                 sp_miscr | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
            --------------+----------------------------------------------------------------
               sp_vvgroup |   1.255239   .1651192     1.73   0.084     .9698056    1.624682
                   sp_age |   .9759781   .0144659    -1.64   0.101     .9480157    1.004765
            lengthtime_cu |   1.006095   .0067109     0.91   0.362     .9930188    1.019343
                    _cons |   1.366571   1.205483     0.35   0.723      .242264    7.708594
            -------------------------------------------------------------------------------
            Note: _cons estimates baseline odds.
            
            . mfx
            
            Marginal effects after svy:logit
                  y  = Pr(sp_miscr) (predict)
                     =  .29144931
            ------------------------------------------------------------------------------
            variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
            ---------+--------------------------------------------------------------------
            sp_vvg~p*|   .0470662      .02673    1.76   0.078  -.005322  .099455   .467024
              sp_age |  -.0050212      .00305   -1.65   0.099  -.010994  .000952   63.0716
            length~u |   .0012548      .00138    0.91   0.362  -.001446  .003955   37.3211
            ------------------------------------------------------------------------------
            (*) dy/dx is for discrete change of dummy variable from 0 to 1



            • #7
              How do I calculate effect size for this adjusted difference, or do we just report 4.7 as the adjusted proportion difference?
              Same as before: percentages have an implicit and constant scale from 0 to 100, so just report 4.7 percentage points and you have it.

              is 4.7 a big difference?
              I don't know. It depends on what these variables are. If it's a 4.7 percentage point difference in the probability that a plane will crash, then yes, it's huge. If it's a 4.7 percentage point difference in the probability that a plane takes off on time, it's probably only a small difference.

              You know, the rules of thumb about small, medium, and large effect sizes for Cohen's d are just guidelines. And Cohen's d was developed with variables that are unfamiliar and have no intrinsic units, often psychological measurements that are really just attempts to instrument unobservable states. So the idea that effect sizes could be judged relative to the scale of variation in the response was reasonable, because there was no other standard against which to judge them. But this mechanism is not needed when the variable has a natural and easily understood scale of its own. But that still leaves a degree of judgment involved when you want to describe an effect as small, medium, or large (or some other such adjective).
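
              On where the 4.7 comes from: for a dummy variable, mfx reports the discrete change in the predicted probability with the other covariates held at their sample means (the X column). A Python sketch plugging in the (rounded) odds ratios and means from the output in #6 reproduces it approximately, confirming that the 4.7 points is itself the adjusted probability difference:

```python
import math

def invlogit(xb):
    return 1 / (1 + math.exp(-xb))

# Coefficients recovered from the (rounded) odds ratios reported in #6.
b_cons  = math.log(1.366571)
b_group = math.log(1.255239)
b_age   = math.log(0.9759781)
b_len   = math.log(1.006095)

# mfx holds the other covariates at their sample means (the X column).
xb_base = b_cons + b_age * 63.0716 + b_len * 37.3211

# Discrete change of the dummy sp_vvgroup from 0 to 1.
dydx = invlogit(xb_base + b_group) - invlogit(xb_base)
print(round(dydx, 4))  # 0.0471 -- the 4.7-point adjusted difference
```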



              • #8
                Ok, thanks for your help Clyde.

