
  • Using Cohen's d for proportion test - after "prtest"

    Hi all,

    I am testing proportion differences between two independent samples (e.g., conviction rates between servicemen group 1 and servicemen group 2) using the prtest command in Stata. I then need to report an effect size.

    I wonder if I can use Cohen's d following the prtest command. I normally use Cohen's d when testing mean differences (after a t-test); can it be used for binary outcome variables?


    Code:
    estpost prtest sm_convict, by(sm_group)
    
                 |      e(b)   e(count)      e(se)     e(se0)       e(z)     e(p_l)       e(p)     e(p_u)     e(N_1) 
    -------------+---------------------------------------------------------------------------------------------------
     sm_convict |  .0075073      4645   .0084651   .0084559   .8878214   .8126816   .3746369   .1873184       2249 
    
                 |    e(P_1)     e(N_2)     e(P_2) 
    -------------+---------------------------------
      sm_convict |  .0951534      2396   .0876461 
    
    esize twosample sm_convict, by(sm_group) cohensd
    
    Effect size based on mean comparison
    
                                   Obs per group:
                                spouse responded =      2,249
                                   spouse didn't =      2,396
    ---------------------------------------------------------
            Effect Size |   Estimate     [95% Conf. Interval]
    --------------------+------------------------------------
              Cohen's d |    .026063   -.0314853    .0836084
    ---------------------------------------------------------

    In this case, can I report that the difference between the two samples is 0.75 percentage points (9.5% vs. 8.8%), and that the effect size is 0.03 (very small)?

    Thank you!

    Maggie

  • #2
    I suppose you could do this, but I wouldn't. The purpose of calculating Cohen's d is to overcome the fact that continuous variable distributions, even when of the same shape, can differ in both location and scale, so that the same difference in means could be either large or small, depending on the variation. But probabilities don't have that problem, and in fact, the variance of a probability is a simple function of the probability itself. Everybody can see that the difference between a probability of 0.095 and 0.088 is small; you don't have to provide a context for that. So I would just report the difference between the probabilities; it's a sufficient measure of the effect size on its own.
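
    To see this concretely: for a 0/1 variable, the pooled SD that Cohen's d divides by is fully determined by the two proportions and group sizes. A quick sketch (in Python rather than Stata, plugging in the numbers from the output in #1) reproduces esize's estimate:

```python
import math

# Proportions and group sizes from the prtest output in #1.
p1, n1 = 0.0951534, 2249
p2, n2 = 0.0876461, 2396

# For a 0/1 variable the sample variance is p*(1 - p)*n/(n - 1),
# so the scale used by Cohen's d is determined by the proportions themselves.
s1_sq = p1 * (1 - p1) * n1 / (n1 - 1)
s2_sq = p2 * (1 - p2) * n2 / (n2 - 1)
pooled_sd = math.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))

d = (p1 - p2) / pooled_sd
print(round(p1 - p2, 7))  # 0.0075073 -- the raw risk difference
print(round(d, 4))        # 0.0261    -- matches esize's Cohen's d
```

    So the standardization adds nothing here that the two proportions don't already convey.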



    • #3
      Like Clyde, I was thinking that the risk difference would make more sense. If I deciphered the output in #1 correctly, this ought to give you what you need:

      Code:
      . csi 214 210 2035 2186, or
      
                       |   Exposed   Unexposed  |      Total
      -----------------+------------------------+------------
                 Cases |       214         210  |        424
              Noncases |      2035        2186  |       4221
      -----------------+------------------------+------------
                 Total |      2249        2396  |       4645
                       |                        |
                  Risk |  .0951534    .0876461  |   .0912809
                       |                        |
                       |      Point estimate    |    [95% Conf. Interval]
                       |------------------------+------------------------
       Risk difference |         .0075073       |   -.0090839    .0240986
            Risk ratio |         1.085655       |    .9054815     1.30168
       Attr. frac. ex. |         .0788971       |   -.1043848    .2317618
       Attr. frac. pop |         .0398207       |
            Odds ratio |         1.094662       |     .896707    1.336318 (Cornfield)
                       +-------------------------------------------------
                                     chi2(1) =     0.79  Pr>chi2 = 0.3746
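
      If it helps to see where csi gets these numbers, here is a small sketch (Python, using the same 2x2 cell counts) that recomputes the point estimates by hand:

```python
# 2x2 table reconstructed from the output in #1: cases and noncases by group.
a, b = 214, 210        # cases:    exposed, unexposed
c, d = 2035, 2186      # noncases: exposed, unexposed
n1, n2 = a + c, b + d  # column totals: 2249 and 2396

risk1, risk2 = a / n1, b / n2       # 0.0951534 and 0.0876461
print(round(risk1 - risk2, 7))      # 0.0075073  risk difference
print(round(risk1 / risk2, 6))      # 1.085655   risk ratio
print(round((a * d) / (b * c), 6))  # 1.094662   odds ratio
```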
      --
      Bruce Weaver
      Email: [email protected]
      Version: Stata/MP 18.5 (Windows)



      • #4
        Thanks all! I agree that for proportions, the difference in percentages itself should be good enough.

        However, our clients request an effect size for both continuous and binary outcomes, and the outcomes all appear in one table. So, to be consistent, we kind of need to report an effect size for the binary outcomes.

        I am not familiar with the risk difference. I can see a chi-square test was used here. Can I use chi-square and Cramér's V, then, for my binary (and categorical) variables?

        Thank you!



        • #5
          Chi-square and Cramer's V are not effect sizes. Chi-square is not an effect size because it is also sensitive to the sample size. An effect size must be a pure measure of the difference between the groups, and not depend on sample size.

          Cramer's V is like a correlation coefficient between the grouping variable and the dichotomous outcome variable. It is, in a sense, an effect size, but of a different effect: in particular, it is not a measure of the size of the group difference.

          The difference between the probabilities in the groups is an effect size: it directly and transparently describes the extent of the difference between the groups in a manner that is scale-free and independent of sample size. That is the very essence of an effect size measure. And it has the advantage of being far more understandable than any other.

          I don't think any reasonable client will expect that the same approach to effect sizes will be taken with continuous and categorical variables--that won't make sense. Even in just the descriptive statistics, you normally present mean and s.d. (or median and some other percentile range) for continuous variables but report n's and percents for category variables. So it is, similarly, perfectly reasonable to use Cohen's d for continuous variables and report the probability difference between groups as the effect size for category variables. You can put a footnote in your table explaining that this is what you are doing.
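
          For reference, in a 2x2 table Cramer's V reduces to the phi coefficient, sqrt(chi2/N). A quick Python sketch using the table from #3 shows it is tiny here too, though, as said above, it measures association rather than the group difference:

```python
import math

# 2x2 table from the csi output in #3.
a, b, c, d = 214, 210, 2035, 2186
n = a + b + c + d  # 4645

# Pearson chi-square for a 2x2 table; for a 2x2 table, Cramer's V
# equals the phi coefficient, sqrt(chi2 / n).
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
v = math.sqrt(chi2 / n)
print(round(chi2, 2))  # 0.79
print(round(v, 3))     # 0.013
```

          Note that dividing chi-square by N is exactly what removes the sample-size dependence mentioned above.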



          • #6
            Thanks for your explanation Clyde.

            What about an "effect size" for a marginal effect (binary outcome) after a logit regression?

            For example, we test the adjusted difference (adjusting for age and length of relationship) in the proportion of servicemen's spouses who experienced a miscarriage, by servicemen group. The adjusted difference is 4.7 percentage points higher in group 1. How do I calculate an effect size for this adjusted difference, or do we just report 4.7 as the adjusted proportion difference? Is 4.7 a big difference?

            Code:
            . svy:logit  sp_miscr  sp_vvgroup sp_age lengthtime_cu , or
            (running logit on estimation sample)
            
            Survey: Logistic regression
            
            Number of strata   =         1                  Number of obs     =      1,900
            Number of PSUs     =     1,900                  Population size   = 3,817.9277
                                                            Design df         =      1,899
                                                            F(   3,   1897)   =       1.57
                                                            Prob > F          =     0.1950
            
            -------------------------------------------------------------------------------
                          |             Linearized
                 sp_miscr | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
            --------------+----------------------------------------------------------------
               sp_vvgroup |   1.255239   .1651192     1.73   0.084     .9698056    1.624682
                   sp_age |   .9759781   .0144659    -1.64   0.101     .9480157    1.004765
            lengthtime_cu |   1.006095   .0067109     0.91   0.362     .9930188    1.019343
                    _cons |   1.366571   1.205483     0.35   0.723      .242264    7.708594
            -------------------------------------------------------------------------------
            Note: _cons estimates baseline odds.
            
            . mfx
            
            Marginal effects after svy:logit
                  y  = Pr(sp_miscr) (predict)
                     =  .29144931
            ------------------------------------------------------------------------------
            variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
            ---------+--------------------------------------------------------------------
            sp_vvg~p*|   .0470662      .02673    1.76   0.078  -.005322  .099455   .467024
              sp_age |  -.0050212      .00305   -1.65   0.099  -.010994  .000952   63.0716
            length~u |   .0012548      .00138    0.91   0.362  -.001446  .003955   37.3211
            ------------------------------------------------------------------------------
            (*) dy/dx is for discrete change of dummy variable from 0 to 1



            • #7
              How do I calculate effect size for this adjusted difference, or do we just report 4.7 as the adjusted proportion difference?
              Same as before: percentages have an implicit and constant scale from 0 to 100, so just report 4.7 percentage points and you have it.

              is 4.7 a big difference?
              I don't know. It depends on what these variables are. If it's a 4.7 percentage point difference in the probability that a plane will crash, then yes, it's huge. If it's a 4.7 percentage point difference in the probability that a plane takes off on time, it's probably only a small difference.

              You know, the rules of thumb about small, medium, and large effect sizes for Cohen's d are just guidelines. And Cohen's d was developed with variables that are unfamiliar and have no intrinsic units, often psychological measurements that are really just attempts to instrument unobservable states. So the idea that effect sizes could be judged relative to the scale of variation in the response was reasonable, because there was no other standard against which to judge them. But this mechanism is not needed when the variable has a natural and easily understood scale of its own. But that still leaves a degree of judgment involved when you want to describe an effect as small, medium, or large (or some other such adjective).
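
              On where the 4.7 comes from: for a dummy variable, mfx reports the discrete change in the predicted probability with the other covariates held at their sample means (the X column). A Python sketch plugging in the (rounded) odds ratios and means from the output in #6 reproduces it approximately, confirming that the 4.7 points is itself the adjusted probability difference:

```python
import math

def invlogit(xb):
    return 1 / (1 + math.exp(-xb))

# Coefficients recovered from the (rounded) odds ratios reported in #6.
b_cons  = math.log(1.366571)
b_group = math.log(1.255239)
b_age   = math.log(0.9759781)
b_len   = math.log(1.006095)

# mfx holds the other covariates at their sample means (the X column).
xb_base = b_cons + b_age * 63.0716 + b_len * 37.3211

# Discrete change of the dummy sp_vvgroup from 0 to 1.
dydx = invlogit(xb_base + b_group) - invlogit(xb_base)
print(round(dydx, 4))  # 0.0471 -- the 4.7-point adjusted difference
```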



              • #8
                Ok, thanks for your help Clyde.

