Effect coding

Fabio Iding

Join Date: Mar 2022

Posts: 5
#1

11 Mar 2022, 12:07

Hello,

I want to use effect coding: I want to compare each category of my variables with the mean across categories instead of with a reference category. I found the xi3 command, but unfortunately it can no longer be used. I also found the desmat command, which offers a simple contrast, but that also requires choosing a reference category. Do you know a way to use desmat to compare the categories with the mean, or do you know another command that suits my problem?

Code:

desmat: logit aktiv Migration health=sim(1) education=sim(1) finance=sim(4)

regards,

Fabio
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#2

11 Mar 2022, 12:14

Duplicate post.

Also, it's unclear what the issue is. I suspect you've used user-written commands, which the FAQ asks you to identify. The FAQ also asks for a minimal worked example of the issue, including your data (via the dataex command) and the exact code you used.

Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2371
#3

11 Mar 2022, 14:58

I think effect coding is not much used these days (at least in my area of medical research). There is a package called -igenerate- (SSC) which claims to generate different coding schemes, but I have no experience with it.

As I understand the question, you can try to use (or implement) effect coding, or you can use the default indicator coding and let -margins- or -contrast- take care of the alternative coding scheme. It's easy to make mistakes doing things yourself.

Here is a simple example taking data and guidance from the UCLA consulting group pages, here and here.

Code:

clear *
cls

input  byte(y  grp  e1   e2   e3)
 1   1    1    0    0
 3   1    1    0    0
 2   1    1    0    0
 2   1    1    0    0
 2   2    0    1    0
 3   2    0    1    0
 4   2    0    1    0
 3   2    0    1    0
 5   3    0    0    1
 6   3    0    0    1
 4   3    0    0    1
 5   3    0    0    1
10   4   -1   -1   -1
10   4   -1   -1   -1
 9   4   -1   -1   -1
11   4   -1   -1   -1
end

Results

Code:

. * observed group means, and default regression with indicator coding
. tabstat y, by(grp) s(n mean)

Summary for variables: y
Group variable: grp

     grp |         N      Mean
---------+--------------------
       1 |         4         2
       2 |         4         3
       3 |         4         5
       4 |         4        10
---------+--------------------
   Total |        16         5
------------------------------

. reg y ib4.grp

      Source |       SS           df       MS      Number of obs   =        16
-------------+----------------------------------   F(3, 12)        =     76.00
       Model |         152         3  50.6666667   Prob > F        =    0.0000
    Residual |           8        12  .666666667   R-squared       =    0.9500
-------------+----------------------------------   Adj R-squared   =    0.9375
       Total |         160        15  10.6666667   Root MSE        =     .8165

------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         grp |
          1  |         -8   .5773503   -13.86   0.000    -9.257938   -6.742062
          2  |         -7   .5773503   -12.12   0.000    -8.257938   -5.742062
          3  |         -5   .5773503    -8.66   0.000    -6.257938   -3.742062
             |
       _cons |         10   .4082483    24.49   0.000     9.110503     10.8895
------------------------------------------------------------------------------

. * effect (deviation) coding from the grand mean (balanced)
. contrast g.grp , effects

Contrasts of marginal linear predictions

Margins: asbalanced

------------------------------------------------
             |         df           F        P>F
-------------+----------------------------------
         grp |
(1 vs mean)  |          1       72.00     0.0000
(2 vs mean)  |          1       32.00     0.0001
(3 vs mean)  |          1        0.00     1.0000
(4 vs mean)  |          1      200.00     0.0000
      Joint  |          3       76.00     0.0000
             |
 Denominator |         12
------------------------------------------------

------------------------------------------------------------------------------
             |   Contrast   Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         grp |
(1 vs mean)  |         -3   .3535534    -8.49   0.000    -3.770327   -2.229673
(2 vs mean)  |         -2   .3535534    -5.66   0.000    -2.770327   -1.229673
(3 vs mean)  |   6.66e-16   .3535534     0.00   1.000    -.7703267    .7703267
(4 vs mean)  |          5   .3535534    14.14   0.000     4.229673    5.770327
------------------------------------------------------------------------------

. reg y e? // compare with contrast above

      Source |       SS           df       MS      Number of obs   =        16
-------------+----------------------------------   F(3, 12)        =     76.00
       Model |         152         3  50.6666667   Prob > F        =    0.0000
    Residual |           8        12  .666666667   R-squared       =    0.9500
-------------+----------------------------------   Adj R-squared   =    0.9375
       Total |         160        15  10.6666667   Root MSE        =     .8165

------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
          e1 |         -3   .3535534    -8.49   0.000    -3.770327   -2.229673
          e2 |         -2   .3535534    -5.66   0.000    -2.770327   -1.229673
          e3 |   2.36e-16   .3535534     0.00   1.000    -.7703267    .7703267
       _cons |          5   .2041241    24.49   0.000     4.555252    5.444748
------------------------------------------------------------------------------

. * effect (deviation) coding from the grand mean with unbalanced data
. qui replace y = . if inlist(_n, 5, 8, 9, 11)

.
. * group means and group-weighted grand mean
. tabstat y, by(grp) s(n mean)

Summary for variables: y
Group variable: grp

     grp |         N      Mean
---------+--------------------
       1 |         4         2
       2 |         2       3.5
       3 |         2       5.5
       4 |         4        10
---------+--------------------
   Total |        12       5.5
------------------------------

.
. * unweighted grand mean
. preserve

. collapse y, by(grp)

. tabstat y, s(n mean)

    Variable |         N      Mean
-------------+--------------------
           y |         4      5.25
----------------------------------

. restore

.
. qui reg y i.grp

. contrast g.grp , effects asobserved

Contrasts of marginal linear predictions

Margins: asobserved

------------------------------------------------
             |         df           F        P>F
-------------+----------------------------------
         grp |
(1 vs mean)  |          1       77.26     0.0000
(2 vs mean)  |          1       14.25     0.0054
(3 vs mean)  |          1        0.29     0.6043
(4 vs mean)  |          1      165.03     0.0000
      Joint  |          3       73.60     0.0000
             |
 Denominator |          8
------------------------------------------------

------------------------------------------------------------------------------
             |   Contrast   Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         grp |
(1 vs mean)  |      -3.25    .369755    -8.79   0.000    -4.102657   -2.397343
(2 vs mean)  |      -1.75   .4635124    -3.78   0.005    -2.818862   -.6811385
(3 vs mean)  |        .25   .4635124     0.54   0.604    -.8188615    1.318862
(4 vs mean)  |       4.75    .369755    12.85   0.000     3.897343    5.602657
------------------------------------------------------------------------------

. reg y e? // compare

      Source |       SS           df       MS      Number of obs   =        12
-------------+----------------------------------   F(3, 8)         =     73.60
       Model |         138         3          46   Prob > F        =    0.0000
    Residual |           5         8        .625   R-squared       =    0.9650
-------------+----------------------------------   Adj R-squared   =    0.9519
       Total |         143        11          13   Root MSE        =    .79057

------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
          e1 |      -3.25    .369755    -8.79   0.000    -4.102657   -2.397343
          e2 |      -1.75   .4635124    -3.78   0.005    -2.818862   -.6811385
          e3 |        .25   .4635124     0.54   0.604    -.8188615    1.318862
       _cons |       5.25   .2420615    21.69   0.000     4.691805    5.808195
------------------------------------------------------------------------------
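
For weighted effect coding, here is a minimal sketch (an assumption about what may be wanted, not part of the output above). The g. operator contrasts each level with the unweighted mean of the group means (5.25 in the unbalanced data), whereas gw. contrasts each level with the observation-weighted grand mean (5.5). The data are the 12 non-missing observations from the unbalanced example, re-entered so the block runs on its own.

```stata
clear
input byte(y grp)
 1 1
 3 1
 2 1
 2 1
 3 2
 4 2
 6 3
 5 3
10 4
10 4
 9 4
11 4
end

quietly regress y i.grp

* deviations from the unweighted mean of the four group means
contrast g.grp, effects

* deviations from the observation-weighted grand mean
contrast gw.grp, effects
```

With these data the gw. contrasts should equal each group mean minus 5.5, so group 3 (mean 5.5) has a contrast of zero.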


Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#4

11 Mar 2022, 15:01

Okay, just a small question from a public policy student: what even is effect coding? I've never heard of it in my life until now. Is there a doctor in the house? 🤣🤣🤣🤣
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2371
#5

11 Mar 2022, 15:10

Originally posted by Jared Greathouse

Okay, just a small question from a public policy student: what even is effect coding? I've never heard of it in my life until now. Is there a doctor in the house? 🤣🤣🤣🤣

There is a whole different world of effect coding practices out there, Jared. Yours to discover at the second help link I posted in #3. Effect coding is just a way of coding categorical variables so that each coefficient estimates the contrast between a category and the grand mean. I presume these coding schemes arose before software for fitting GLMs was developed and widely used, when the only route to such inference was to code your own contrast matrix.
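
As a rough sketch of what that coding looks like in practice, effect-coded variables can be built by hand from a categorical variable. The example assumes the auto dataset shipped with Stata and a hypothetical four-level variable price_4q; the names e1 to e3 and the choice of level 4 as the omitted level are illustrative only.

```stata
sysuse auto, clear
xtile price_4q = price, nq(4)          // hypothetical 4-level factor

* effect codes: 1 for the level itself, -1 for the omitted level (4), 0 otherwise
forvalues k = 1/3 {
    generate byte e`k' = (price_4q == `k') - (price_4q == 4)
}

* hand-coded version
regress mpg e1 e2 e3

* same deviations from the unweighted mean of the group means, via factor notation
quietly regress mpg i.price_4q
contrast g.price_4q, effects
```

The coefficients on e1 to e3 should match the first three rows of the contrast table; the deviation for the omitted level is minus the sum of the other three.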

As an aside, SAS doesn't use a consistent default coding method across all of its routines (e.g., logistic regression). R defaults to treatment (dummy) coding, although base R does provide sum-to-zero contrasts through contr.sum. Stata wisely (in my humble opinion) pushes these schemes aside, but they can still be accessed easily using -contrast- and -margins- should there be a need (in this respect, it's quite handy to have such a polished syntax for these alternatives).
Fabio Iding

Join Date: Mar 2022

Posts: 5
#6

12 Mar 2022, 08:06

Thanks a lot for your help! I still have to check a few things, but I think I found a solution thanks to your help! Sorry if my question was not very easy to understand. I am not an expert in Stata and even less in explaining statistical issues in English.
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2371
#7

12 Mar 2022, 08:11

Originally posted by Fabio Iding

Thanks a lot for your help! I still have to check a few things, but I think I found a solution thanks to your help! Sorry if my question was not very easy to understand. I am not an expert in Stata and even less in explaining statistical issues in English.

You're welcome. I was able to understand your question easily enough. No need to apologize.

Fabio Iding

Join Date: Mar 2022

Posts: 5
#8

14 Mar 2022, 06:19

Okay, it seems I didn't fully understand it.

I want to use Average Marginal Effects (AMEs) to interpret my logistic regression. But I want to interpret the different categories of my variables in comparison to the group mean (aka weighted effect coding). Is that even possible? The two options I found are either:

Code:

 logit aktiv i.MigraStatus i.Gesundheit

Iteration 0:   log likelihood = -5641.1697  
Iteration 1:   log likelihood = -5471.2119  
Iteration 2:   log likelihood = -5470.0607  
Iteration 3:   log likelihood = -5470.0607  

Logistic regression                             Number of obs     =      9,020
                                                LR chi2(5)        =     342.22
                                                Prob > chi2       =     0.0000
Log likelihood = -5470.0607                     Pseudo R2         =     0.0303

----------------------------------------------------------------------------------------
                 aktiv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------------+----------------------------------------------------------------
           MigraStatus |
Migrationshintergrund  |   -.472628   .0721672    -6.55   0.000    -.6140732   -.3311828
                       |
            Gesundheit |
        Eher schlecht  |   .3264526   .1289867     2.53   0.011     .0736433    .5792619
               Mittel  |   .7160173   .1169099     6.12   0.000     .4868781    .9451566
             Eher gut  |   1.272875   .1173398    10.85   0.000     1.042893    1.502857
             Sehr gut  |   1.291174   .1232455    10.48   0.000     1.049617    1.532731
                       |
                 _cons |  -.1240206   .1104465    -1.12   0.261    -.3404917    .0924505
----------------------------------------------------------------------------------------

.         // Prob>chi2: 0,0
.         //estat gof
.         //   Prob > chi2 =         0.6458
.         
.                 contrast gw.MigraStatus gw.Gesundheit, effects

Contrasts of marginal linear predictions

Margins      : asbalanced

-------------------------------------------------------------------------
                                      |         df        chi2     P>chi2
--------------------------------------+----------------------------------
                          MigraStatus |
(Kein Migrationshintergrund vs mean)  |          1       42.89     0.0000
     (Migrationshintergrund vs mean)  |          1       42.89     0.0000
                               Joint  |          1       42.89     0.0000
                                      |
                           Gesundheit |
             (Sehr schlecht vs mean)  |          1       78.88     0.0000
             (Eher schlecht vs mean)  |          1       97.36     0.0000
                    (Mittel vs mean)  |          1       53.96     0.0000
                  (Eher gut vs mean)  |          1       95.82     0.0000
                  (Sehr gut vs mean)  |          1       44.54     0.0000
                               Joint  |          4      294.90     0.0000
-------------------------------------------------------------------------

-------------------------------------------------------------------------------------------------------
                                      |   Contrast   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------------------------------+----------------------------------------------------------------
                          MigraStatus |
(Kein Migrationshintergrund vs mean)  |   .0487299   .0074407     6.55   0.000     .0341463    .0633135
     (Migrationshintergrund vs mean)  |  -.4238981   .0647265    -6.55   0.000    -.5507596   -.2970365
                                      |
                           Gesundheit |
             (Sehr schlecht vs mean)  |  -.9630082   .1084294    -8.88   0.000    -1.175526   -.7504905
             (Eher schlecht vs mean)  |  -.6365557   .0645132    -9.87   0.000    -.7629992   -.5101121
                    (Mittel vs mean)  |  -.2469909   .0336237    -7.35   0.000    -.3128921   -.1810897
                  (Eher gut vs mean)  |   .3098666   .0316555     9.79   0.000      .247823    .3719102
                  (Sehr gut vs mean)  |   .3281657    .049174     6.67   0.000     .2317864    .4245449
-------------------------------------------------------------------------------------------------------

.         
.                 margins, dydx(*)

Average marginal effects                        Number of obs     =      9,020
Model VCE    : OIM

Expression   : Pr(aktiv), predict()
dy/dx w.r.t. : 1.MigraStatus 2.Gesundheit 3.Gesundheit 4.Gesundheit 5.Gesundheit

----------------------------------------------------------------------------------------
                       |            Delta-method
                       |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------------+----------------------------------------------------------------
           MigraStatus |
Migrationshintergrund  |  -.1044586   .0166347    -6.28   0.000    -.1370621   -.0718552
                       |
            Gesundheit |
        Eher schlecht  |   .0810247   .0318676     2.54   0.011     .0185654    .1434839
               Mittel  |   .1747791   .0286767     6.09   0.000     .1185738    .2309843
             Eher gut  |   .2920732   .0282354    10.34   0.000     .2367329    .3474136
             Sehr gut  |   .2954789   .0290638    10.17   0.000     .2385149    .3524428
----------------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

This applies effect coding in the contrast step, but margins, dydx(*) then falls back to dummy coding. So I cannot compare the AMEs of the categories with the group mean.

The other option:

Code:

.                 logit aktiv i.MigraStatus i.Gesundheit

Iteration 0:   log likelihood = -5641.1697  
Iteration 1:   log likelihood = -5471.2119  
Iteration 2:   log likelihood = -5470.0607  
Iteration 3:   log likelihood = -5470.0607  

Logistic regression                             Number of obs     =      9,020
                                                LR chi2(5)        =     342.22
                                                Prob > chi2       =     0.0000
Log likelihood = -5470.0607                     Pseudo R2         =     0.0303

----------------------------------------------------------------------------------------
                 aktiv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------------+----------------------------------------------------------------
           MigraStatus |
Migrationshintergrund  |   -.472628   .0721672    -6.55   0.000    -.6140732   -.3311828
                       |
            Gesundheit |
        Eher schlecht  |   .3264526   .1289867     2.53   0.011     .0736433    .5792619
               Mittel  |   .7160173   .1169099     6.12   0.000     .4868781    .9451566
             Eher gut  |   1.272875   .1173398    10.85   0.000     1.042893    1.502857
             Sehr gut  |   1.291174   .1232455    10.48   0.000     1.049617    1.532731
                       |
                 _cons |  -.1240206   .1104465    -1.12   0.261    -.3404917    .0924505
----------------------------------------------------------------------------------------

. 
end of do-file

. do "C:\Users\faid0\AppData\Local\Temp\STD2b30_000000.tmp"

.         margins gw.MigraStatus gw.Gesundheit

Contrasts of predictive margins                 Number of obs     =      9,020
Model VCE    : OIM

Expression   : Pr(aktiv), predict()

-------------------------------------------------------------------------
                                      |         df        chi2     P>chi2
--------------------------------------+----------------------------------
                          MigraStatus |
(Kein Migrationshintergrund vs mean)  |          1       39.43     0.0000
     (Migrationshintergrund vs mean)  |          1       39.43     0.0000
                               Joint  |          1       39.43     0.0000
                                      |
                           Gesundheit |
             (Sehr schlecht vs mean)  |          1       71.33     0.0000
             (Eher schlecht vs mean)  |          1       84.35     0.0000
                    (Mittel vs mean)  |          1       45.44     0.0000
                  (Eher gut vs mean)  |          1      115.55     0.0000
                  (Sehr gut vs mean)  |          1       57.26     0.0000
                               Joint  |          4      295.29     0.0000
-------------------------------------------------------------------------

---------------------------------------------------------------------------------------
                                      |            Delta-method
                                      |   Contrast   Std. Err.     [95% Conf. Interval]
--------------------------------------+------------------------------------------------
                          MigraStatus |
(Kein Migrationshintergrund vs mean)  |   .0107701   .0017151      .0074086    .0141317
     (Migrationshintergrund vs mean)  |  -.0936885   .0149196     -.1229304   -.0644467
                                      |
                           Gesundheit |
             (Sehr schlecht vs mean)  |  -.2246663   .0266011     -.2768035   -.1725292
             (Eher schlecht vs mean)  |  -.1436417   .0156398      -.174295   -.1129883
                    (Mittel vs mean)  |  -.0498873   .0074003     -.0643915    -.035383
                  (Eher gut vs mean)  |   .0674069   .0062707      .0551165    .0796973
                  (Sehr gut vs mean)  |   .0708126   .0093583      .0524705    .0891546
---------------------------------------------------------------------------------------

Here the categories are compared with the group mean, but these are only contrasts of predictive margins, not AMEs. Unfortunately the gw. operator is not allowed with margins, dydx(), which could have been the solution. Do you know of another option, or is it simply not possible to use effect coding with AMEs? If it is possible, how do I handle binary variables (e.g. migration status yes/no), discrete variables (e.g. age), or categorical variables that have no natural order (e.g. country)? Does it even make sense to use effect coding on them?

Regards, and thanks a lot for your help!

Fabio


Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2371
#9

14 Mar 2022, 09:24

Originally posted by Fabio Iding

I want to use Average Marginal Effects (AMEs) to interpret my logistic regression. But I want to interpret the different categories of my variables in comparison to the group mean (aka weighted effect coding). Is that even possible? The two options I found are either:

To my understanding, these are two different goals, and effect coding is making things more complicated than necessary.

Let's consider the silly example of a logistic regression model to predict the proportion of foreign cars by each quartile of MSRP price. NB: I am using the default (and recommended) reference coding factor-variable notation.

Code:

sysuse auto, clear
xtile price_4q = price , nq(4)
logit foreign i.price_4q

If you want to examine the contrast of each category with the overall mean, that can be done with -margins- (or -contrast-). Specifically, this computes the overall mean predicted probability, the group-specific mean predicted probability for each price quartile, and the contrast of each group-specific mean with the overall mean.

Code:

margins gw.price_4q , contrast(effect)
// compare to overall and group-specific means
margins
margins i.price_4q

However, for a factor variable the average marginal effects (AMEs) are really just the discrete change in predicted probability between a given level and the base level.

Code:

margins rb1.price_4q, contrast(effect) // change the '1' in 'rb1' to whatever reference level you want.
margins , dydx(price_4q) // more simple if you accept the base level indicated in the original regression

"Does it even make sense to use effect coding on them?"

No, probably not. As this thread shows, it's very easy to become confused about what exactly effect coding is coding for. All of these coding schemes apply only to variables you wish to treat as categorical (so continuous variables such as age are irrelevant to this discussion). It is much simpler to use the reference-level coding implied by Stata's factor notation (-help factor variable-) and then work with margins or contrast to examine specific quantities of interest later.
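
A minimal sketch of that workflow, assuming the auto data and the hypothetical price_4q variable from the example above: the reference level is changed in the factor-variable notation itself, and any comparison with the mean is requested afterwards.

```stata
sysuse auto, clear
xtile price_4q = price, nq(4)

logit foreign ib3.price_4q        // level 3 as the reference category
logit foreign ib(last).price_4q   // highest level as the reference category

* the fitted model is the same whichever base level is chosen,
* so contrasts with the mean do not depend on it
contrast g.price_4q, effects      // deviations from the unweighted mean, logit scale
margins gw.price_4q               // deviations from the weighted mean, probability scale
```

Only the coefficient table changes with the base level; the predictions, and therefore the contrasts, stay the same.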
Bruce Weaver

Join Date: May 2014

Posts: 1109
#10

14 Mar 2022, 09:56

The context is SPSS, but this UCLA page has a nice discussion of different coding schemes.
https://stats.oarc.ucla.edu/spss/faq...sion-analysis/

Notice the distinction between "regression" codes and "contrast" codes. See the paragraph just before the second table.

--
Bruce Weaver
Version: Stata/MP 18.5 (Windows)
Roman Mostazir

Join Date: Apr 2014

Posts: 868
#11

14 Mar 2022, 10:42

Originally posted by Leonardo Guizzetti

"Does it even make sense to use effect coding on them?"

No, probably not. As this thread shows, it's very easy to become confused about what it is that is being coded for using effect coding.

Whether an effect coding scheme is useful really depends on the study design. In many traditional study designs, effect coding probably offers nothing extra. However, experimental studies with fractional factorial designs depend on effect coding schemes, which help to test several specific hypotheses with a reduced number of arms. The study arms need to be orthogonal by design to test main and interaction effects. See this paper from Linda Collins on factorial design and effect coding and analyses with effect coding vs. dummy coding.
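
To illustrate the orthogonality point, here is a small sketch with a hypothetical balanced 2x2 factorial (the factor names a and b are made up for illustration). With -1/+1 effect codes the main-effect and interaction columns are mutually uncorrelated, whereas the dummy-coded interaction column is correlated with its main effects.

```stata
clear
input byte(a b)
0 0
0 1
1 0
1 1
end

* effect codes: -1 / +1
generate byte ea  = 2*a - 1
generate byte eb  = 2*b - 1
generate byte eab = ea*eb        // effect-coded interaction

* dummy-coded interaction
generate byte dab = a*b

correlate ea eb eab              // all off-diagonal correlations are zero
correlate a b dab                // dab is correlated with both a and b
```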

Roman
Fabio Iding

Join Date: Mar 2022

Posts: 5
#12

14 Mar 2022, 11:11

Thanks again for your help! I think I'll just go back to dummy coding and pick the reference category that makes the most sense.