
  • Effect coding

    Hello,

    I want to do effect coding, that is, compare the different categories of my variables with the mean of the categories instead of using a reference category. I found the xi3 command, but unfortunately it can no longer be used. I also found the desmat command, which offers a simple contrast, but that also requires choosing a reference category. Do you know a way of using the desmat command to compare the categories with the mean, or do you know another command that is suitable for my problem?

    Code:

    desmat: logit aktiv Migration health=sim(1) education=sim(1) finance=sim(4)

    regards,

    Fabio


  • #2
    Duplicate post.


    It's also unclear what the issue is. I suspect you've used user-written commands, which the FAQ asks you to identify. The FAQ also asks for a minimal worked example of the issue, including your data (using the dataex command) and the exact code you used.



    • #3
      I think effect coding is not much used these days (at least in my area of medical research). There is a package called -igenerate- (SSC) which claims to generate different coding schemes, but I have no experience with it.

      As I understand the question, you can try to use (or implement) effect coding, or you can use the default indicator coding and let -margins- or -contrast- take care of the alternative coding scheme. It's easy to make mistakes doing things yourself.

      Here is a simple example taking data and guidance from the UCLA consulting group pages, here and here.

      Code:
      clear *
      cls
      
      input  byte(y  grp  e1   e2   e3)
       1   1    1    0    0
       3   1    1    0    0
       2   1    1    0    0
       2   1    1    0    0
       2   2    0    1    0
       3   2    0    1    0
       4   2    0    1    0
       3   2    0    1    0
       5   3    0    0    1
       6   3    0    0    1
       4   3    0    0    1
       5   3    0    0    1
      10   4   -1   -1   -1
      10   4   -1   -1   -1
       9   4   -1   -1   -1
      11   4   -1   -1   -1
      end
      Results

      Code:
      . * observed group means, and default regression with indicator coding
      . tabstat y, by(grp) s(n mean)
      
      Summary for variables: y
      Group variable: grp
      
           grp |         N      Mean
      ---------+--------------------
             1 |         4         2
             2 |         4         3
             3 |         4         5
             4 |         4        10
      ---------+--------------------
         Total |        16         5
      ------------------------------
      
      . reg y ib4.grp
      
            Source |       SS           df       MS      Number of obs   =        16
      -------------+----------------------------------   F(3, 12)        =     76.00
             Model |         152         3  50.6666667   Prob > F        =    0.0000
          Residual |           8        12  .666666667   R-squared       =    0.9500
      -------------+----------------------------------   Adj R-squared   =    0.9375
             Total |         160        15  10.6666667   Root MSE        =     .8165
      
      ------------------------------------------------------------------------------
                 y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
               grp |
                1  |         -8   .5773503   -13.86   0.000    -9.257938   -6.742062
                2  |         -7   .5773503   -12.12   0.000    -8.257938   -5.742062
                3  |         -5   .5773503    -8.66   0.000    -6.257938   -3.742062
                   |
             _cons |         10   .4082483    24.49   0.000     9.110503     10.8895
      ------------------------------------------------------------------------------
      
      . * effect (deviation) coding from the grand mean (balanced)
      . contrast g.grp , effects
      
      Contrasts of marginal linear predictions
      
      Margins: asbalanced
      
      ------------------------------------------------
                   |         df           F        P>F
      -------------+----------------------------------
               grp |
      (1 vs mean)  |          1       72.00     0.0000
      (2 vs mean)  |          1       32.00     0.0001
      (3 vs mean)  |          1        0.00     1.0000
      (4 vs mean)  |          1      200.00     0.0000
            Joint  |          3       76.00     0.0000
                   |
       Denominator |         12
      ------------------------------------------------
      
      ------------------------------------------------------------------------------
                   |   Contrast   Std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
               grp |
      (1 vs mean)  |         -3   .3535534    -8.49   0.000    -3.770327   -2.229673
      (2 vs mean)  |         -2   .3535534    -5.66   0.000    -2.770327   -1.229673
      (3 vs mean)  |   6.66e-16   .3535534     0.00   1.000    -.7703267    .7703267
      (4 vs mean)  |          5   .3535534    14.14   0.000     4.229673    5.770327
      ------------------------------------------------------------------------------
      
      . reg y e? // compare with contrast above
      
            Source |       SS           df       MS      Number of obs   =        16
      -------------+----------------------------------   F(3, 12)        =     76.00
             Model |         152         3  50.6666667   Prob > F        =    0.0000
          Residual |           8        12  .666666667   R-squared       =    0.9500
      -------------+----------------------------------   Adj R-squared   =    0.9375
             Total |         160        15  10.6666667   Root MSE        =     .8165
      
      ------------------------------------------------------------------------------
                 y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
                e1 |         -3   .3535534    -8.49   0.000    -3.770327   -2.229673
                e2 |         -2   .3535534    -5.66   0.000    -2.770327   -1.229673
                e3 |   2.36e-16   .3535534     0.00   1.000    -.7703267    .7703267
             _cons |          5   .2041241    24.49   0.000     4.555252    5.444748
      ------------------------------------------------------------------------------
      
      . * effect (deviation) coding from the grand mean with unbalanced data
      . qui replace y = . if inlist(_n, 5, 8, 9, 11)
      
      .
      . * group means and group-weighted grand mean
      . tabstat y, by(grp) s(n mean)
      
      Summary for variables: y
      Group variable: grp
      
           grp |         N      Mean
      ---------+--------------------
             1 |         4         2
             2 |         2       3.5
             3 |         2       5.5
             4 |         4        10
      ---------+--------------------
         Total |        12       5.5
      ------------------------------
      
      .
      . * unweighted grand mean
      . preserve
      
      . collapse y, by(grp)
      
      . tabstat y, s(n mean)
      
          Variable |         N      Mean
      -------------+--------------------
                 y |         4      5.25
      ----------------------------------
      
      . restore
      
      .
      . qui reg y i.grp
      
      . contrast g.grp , effects asobserved
      
      Contrasts of marginal linear predictions
      
      Margins: asobserved
      
      ------------------------------------------------
                   |         df           F        P>F
      -------------+----------------------------------
               grp |
      (1 vs mean)  |          1       77.26     0.0000
      (2 vs mean)  |          1       14.25     0.0054
      (3 vs mean)  |          1        0.29     0.6043
      (4 vs mean)  |          1      165.03     0.0000
            Joint  |          3       73.60     0.0000
                   |
       Denominator |          8
      ------------------------------------------------
      
      ------------------------------------------------------------------------------
                   |   Contrast   Std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
               grp |
      (1 vs mean)  |      -3.25    .369755    -8.79   0.000    -4.102657   -2.397343
      (2 vs mean)  |      -1.75   .4635124    -3.78   0.005    -2.818862   -.6811385
      (3 vs mean)  |        .25   .4635124     0.54   0.604    -.8188615    1.318862
      (4 vs mean)  |       4.75    .369755    12.85   0.000     3.897343    5.602657
      ------------------------------------------------------------------------------
      
      . reg y e? // compare
      
            Source |       SS           df       MS      Number of obs   =        12
      -------------+----------------------------------   F(3, 8)         =     73.60
             Model |         138         3          46   Prob > F        =    0.0000
          Residual |           5         8        .625   R-squared       =    0.9650
      -------------+----------------------------------   Adj R-squared   =    0.9519
             Total |         143        11          13   Root MSE        =    .79057
      
      ------------------------------------------------------------------------------
                 y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
                e1 |      -3.25    .369755    -8.79   0.000    -4.102657   -2.397343
                e2 |      -1.75   .4635124    -3.78   0.005    -2.818862   -.6811385
                e3 |        .25   .4635124     0.54   0.604    -.8188615    1.318862
             _cons |       5.25   .2420615    21.69   0.000     4.691805    5.808195
      ------------------------------------------------------------------------------
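
      A follow-up sketch, not part of the run above. The g. operator compares each group with the unweighted mean of the group means (5.25 in the unbalanced data), which is also what the hand-coded e1-e3 regression returns as _cons. If the observation-weighted grand mean (5.5, the Total row of -tabstat-) is the intended reference, the gw. operator is the documented alternative. The block re-enters the unbalanced data (only y and grp) so that it runs on its own.

      ```stata
      clear
      input byte(y grp)
       1 1
       3 1
       2 1
       2 1
       3 2
       4 2
       6 3
       5 3
      10 4
      10 4
       9 4
      11 4
      end

      * fit with default indicator coding, as above
      quietly regress y i.grp

      * g.: deviations from the unweighted mean of the four group means
      contrast g.grp, effects

      * gw.: deviations from the observation-weighted grand mean
      contrast gw.grp, effects
      ```

      With balanced data the two operators should agree, since the unweighted and weighted grand means coincide.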



      • #4
        Okay, just a small question from a public policy student: what even is effect coding? I've never heard of it in my life until now. Is there a doctor in the house? 🤣🤣🤣🤣



        • #5
          Originally posted by Jared Greathouse
          Okay, just a small question from a public policy student: what even is effect coding? I've never heard of it in my life until now. Is there a doctor in the house? 🤣🤣🤣🤣
          There is a whole different world of effect coding practices out there, Jared. Yours to discover at the second help link I posted in #3. Effect coding is just a way of coding categorical variables such that their estimates yield a contrast with the grand mean. I presume these coding schemes first arose prior to development and widespread use of software to fit GLMs, when the only alternative for inference would be to code your own contrast matrix.

          As an aside, SAS doesn't have a consistent coding method as a default across all of their routines (e.g., logistic regression). R defaults to treatment (reference) coding, although deviation coding is available through contr.sum, and Stata wisely (in my humble opinion) pushes these schemes aside, though they can still be accessed easily using -contrast- and -margins- should there be a need (in this respect, it's quite handy to have such a polished syntax to access these alternatives).
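
          To make the definition above concrete, here is a minimal sketch of building the codes by hand. It re-enters the balanced y and grp data from #3; the variable names dev1-dev3 are only illustrative. Each code is 1 for its own level, -1 for the omitted level (4 here), and 0 otherwise.

          ```stata
          clear
          input byte(y grp)
           1 1
           3 1
           2 1
           2 1
           2 2
           3 2
           4 2
           3 2
           5 3
           6 3
           4 3
           5 3
          10 4
          10 4
           9 4
          11 4
          end

          * effect codes for levels 1-3; level 4 is the omitted level
          forvalues k = 1/3 {
              generate byte dev`k' = (grp == `k') - (grp == 4) if !missing(grp)
          }

          * coefficients are deviations from the unweighted mean of the group means
          regress y dev1 dev2 dev3

          * the deviation for the omitted level is minus the sum of the others
          lincom -dev1 - dev2 - dev3
          ```

          The results can be compared with the -reg y e?- output in #3, which uses the same coding entered directly with -input-.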



          • #6
            Thanks a lot for your help! I still have to check a few things, but I think I found a solution thanks to your help. Sorry if my question was not very easy to understand. I am not an expert in Stata, and even less in explaining statistical issues in English.



            • #7
              Originally posted by Fabio Iding
              Thanks a lot for your help! I still have to check a few things, but I think I found a solution thanks to your help. Sorry if my question was not very easy to understand. I am not an expert in Stata, and even less in explaining statistical issues in English.
              You're welcome. I was able to understand your question easily enough. No need to apologize.



              • #8
                Okay, it seems like I didn't fully understand it.

                I want to use Average Marginal Effects (AMEs) to interpret my logistic regression. But I want to interpret the different categories of my variables in comparison to the group mean (aka weighted effect coding). Is that even possible? The two options I found are either:


                Code:
                 logit aktiv i.MigraStatus i.Gesundheit
                
                Iteration 0:   log likelihood = -5641.1697  
                Iteration 1:   log likelihood = -5471.2119  
                Iteration 2:   log likelihood = -5470.0607  
                Iteration 3:   log likelihood = -5470.0607  
                
                Logistic regression                             Number of obs     =      9,020
                                                                LR chi2(5)        =     342.22
                                                                Prob > chi2       =     0.0000
                Log likelihood = -5470.0607                     Pseudo R2         =     0.0303
                
                ----------------------------------------------------------------------------------------
                                 aktiv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -----------------------+----------------------------------------------------------------
                           MigraStatus |
                Migrationshintergrund  |   -.472628   .0721672    -6.55   0.000    -.6140732   -.3311828
                                       |
                            Gesundheit |
                        Eher schlecht  |   .3264526   .1289867     2.53   0.011     .0736433    .5792619
                               Mittel  |   .7160173   .1169099     6.12   0.000     .4868781    .9451566
                             Eher gut  |   1.272875   .1173398    10.85   0.000     1.042893    1.502857
                             Sehr gut  |   1.291174   .1232455    10.48   0.000     1.049617    1.532731
                                       |
                                 _cons |  -.1240206   .1104465    -1.12   0.261    -.3404917    .0924505
                ----------------------------------------------------------------------------------------
                
                .         // Prob>chi2: 0,0
                .         //estat gof
                .         //   Prob > chi2 =         0.6458
                .         
                .                 contrast gw.MigraStatus gw.Gesundheit, effects
                
                Contrasts of marginal linear predictions
                
                Margins      : asbalanced
                
                -------------------------------------------------------------------------
                                                      |         df        chi2     P>chi2
                --------------------------------------+----------------------------------
                                          MigraStatus |
                (Kein Migrationshintergrund vs mean)  |          1       42.89     0.0000
                     (Migrationshintergrund vs mean)  |          1       42.89     0.0000
                                               Joint  |          1       42.89     0.0000
                                                      |
                                           Gesundheit |
                             (Sehr schlecht vs mean)  |          1       78.88     0.0000
                             (Eher schlecht vs mean)  |          1       97.36     0.0000
                                    (Mittel vs mean)  |          1       53.96     0.0000
                                  (Eher gut vs mean)  |          1       95.82     0.0000
                                  (Sehr gut vs mean)  |          1       44.54     0.0000
                                               Joint  |          4      294.90     0.0000
                -------------------------------------------------------------------------
                
                -------------------------------------------------------------------------------------------------------
                                                      |   Contrast   Std. Err.      z    P>|z|     [95% Conf. Interval]
                --------------------------------------+----------------------------------------------------------------
                                          MigraStatus |
                (Kein Migrationshintergrund vs mean)  |   .0487299   .0074407     6.55   0.000     .0341463    .0633135
                     (Migrationshintergrund vs mean)  |  -.4238981   .0647265    -6.55   0.000    -.5507596   -.2970365
                                                      |
                                           Gesundheit |
                             (Sehr schlecht vs mean)  |  -.9630082   .1084294    -8.88   0.000    -1.175526   -.7504905
                             (Eher schlecht vs mean)  |  -.6365557   .0645132    -9.87   0.000    -.7629992   -.5101121
                                    (Mittel vs mean)  |  -.2469909   .0336237    -7.35   0.000    -.3128921   -.1810897
                                  (Eher gut vs mean)  |   .3098666   .0316555     9.79   0.000      .247823    .3719102
                                  (Sehr gut vs mean)  |   .3281657    .049174     6.67   0.000     .2317864    .4245449
                -------------------------------------------------------------------------------------------------------
                
                .         
                .                 margins, dydx(*)
                
                Average marginal effects                        Number of obs     =      9,020
                Model VCE    : OIM
                
                Expression   : Pr(aktiv), predict()
                dy/dx w.r.t. : 1.MigraStatus 2.Gesundheit 3.Gesundheit 4.Gesundheit 5.Gesundheit
                
                ----------------------------------------------------------------------------------------
                                       |            Delta-method
                                       |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -----------------------+----------------------------------------------------------------
                           MigraStatus |
                Migrationshintergrund  |  -.1044586   .0166347    -6.28   0.000    -.1370621   -.0718552
                                       |
                            Gesundheit |
                        Eher schlecht  |   .0810247   .0318676     2.54   0.011     .0185654    .1434839
                               Mittel  |   .1747791   .0286767     6.09   0.000     .1185738    .2309843
                             Eher gut  |   .2920732   .0282354    10.34   0.000     .2367329    .3474136
                             Sehr gut  |   .2954789   .0290638    10.17   0.000     .2385149    .3524428
                ----------------------------------------------------------------------------------------
                Note: dy/dx for factor levels is the discrete change from the base level.
                This applies effect coding at the -contrast- step, but -margins, dydx(*)- goes back to dummy coding (the discrete change from the base level). So I cannot compare the AMEs of the categories with the group mean.


                The other option:

                Code:
                .                 logit aktiv i.MigraStatus i.Gesundheit
                
                Iteration 0:   log likelihood = -5641.1697  
                Iteration 1:   log likelihood = -5471.2119  
                Iteration 2:   log likelihood = -5470.0607  
                Iteration 3:   log likelihood = -5470.0607  
                
                Logistic regression                             Number of obs     =      9,020
                                                                LR chi2(5)        =     342.22
                                                                Prob > chi2       =     0.0000
                Log likelihood = -5470.0607                     Pseudo R2         =     0.0303
                
                ----------------------------------------------------------------------------------------
                                 aktiv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -----------------------+----------------------------------------------------------------
                           MigraStatus |
                Migrationshintergrund  |   -.472628   .0721672    -6.55   0.000    -.6140732   -.3311828
                                       |
                            Gesundheit |
                        Eher schlecht  |   .3264526   .1289867     2.53   0.011     .0736433    .5792619
                               Mittel  |   .7160173   .1169099     6.12   0.000     .4868781    .9451566
                             Eher gut  |   1.272875   .1173398    10.85   0.000     1.042893    1.502857
                             Sehr gut  |   1.291174   .1232455    10.48   0.000     1.049617    1.532731
                                       |
                                 _cons |  -.1240206   .1104465    -1.12   0.261    -.3404917    .0924505
                ----------------------------------------------------------------------------------------
                
                .         margins gw.MigraStatus gw.Gesundheit
                
                Contrasts of predictive margins                 Number of obs     =      9,020
                Model VCE    : OIM
                
                Expression   : Pr(aktiv), predict()
                
                -------------------------------------------------------------------------
                                                      |         df        chi2     P>chi2
                --------------------------------------+----------------------------------
                                          MigraStatus |
                (Kein Migrationshintergrund vs mean)  |          1       39.43     0.0000
                     (Migrationshintergrund vs mean)  |          1       39.43     0.0000
                                               Joint  |          1       39.43     0.0000
                                                      |
                                           Gesundheit |
                             (Sehr schlecht vs mean)  |          1       71.33     0.0000
                             (Eher schlecht vs mean)  |          1       84.35     0.0000
                                    (Mittel vs mean)  |          1       45.44     0.0000
                                  (Eher gut vs mean)  |          1      115.55     0.0000
                                  (Sehr gut vs mean)  |          1       57.26     0.0000
                                               Joint  |          4      295.29     0.0000
                -------------------------------------------------------------------------
                
                ---------------------------------------------------------------------------------------
                                                      |            Delta-method
                                                      |   Contrast   Std. Err.     [95% Conf. Interval]
                --------------------------------------+------------------------------------------------
                                          MigraStatus |
                (Kein Migrationshintergrund vs mean)  |   .0107701   .0017151      .0074086    .0141317
                     (Migrationshintergrund vs mean)  |  -.0936885   .0149196     -.1229304   -.0644467
                                                      |
                                           Gesundheit |
                             (Sehr schlecht vs mean)  |  -.2246663   .0266011     -.2768035   -.1725292
                             (Eher schlecht vs mean)  |  -.1436417   .0156398      -.174295   -.1129883
                                    (Mittel vs mean)  |  -.0498873   .0074003     -.0643915    -.035383
                                  (Eher gut vs mean)  |   .0674069   .0062707      .0551165    .0796973
                                  (Sehr gut vs mean)  |   .0708126   .0093583      .0524705    .0891546
                ---------------------------------------------------------------------------------------
                This compares the categories with the group mean, but these are contrasts of predictive margins, not AMEs. Unfortunately the gw. operator is not allowed with margins, dydx(), which could have been the solution. Do you by any chance know another option? Or is it simply not possible to use effect coding when looking at AMEs? If it is possible, how do I handle binary variables (e.g. migration status yes/no), discrete variables (e.g. age), or categorical variables that have no logical order (e.g. country)? Does it even make sense to use effect coding on them?

                Regards, and thanks a lot for your help!

                Fabio






                • #9
                  Originally posted by Fabio Iding
                  I want to use Average Marginal Effects (AMEs) to interpret my logistic regression. But I want to interpret the different categories of my variables in comparison to the group mean (aka weighted effect coding). Is that even possible? The two options I found are either:
                  To my understanding, these are two different goals, and effect coding is making things more complicated than necessary.

                  Let's consider the silly example of a logistic regression model to predict the proportion of foreign cars within each quartile of price. NB: I am using the default (and recommended) reference-coded factor-variable notation.

                  Code:
                  sysuse auto, clear
                  xtile price_4q = price , nq(4)
                  logit foreign i.price_4q
                  If you want to examine the contrast of categories to the overall mean, that can be done with -margins- (or -contrast-). Specifically, this computes the overall predicted mean probability and the group-specific mean probability for each price quartile, then contrasts each group-specific mean with the overall mean.

                  Code:
                  margins gw.price_4q , contrast(effect)
                  // compare to overall and group-specific means
                  margins
                  margins i.price_4q
                  However, average marginal effects (AMEs) in this case are really just the difference between two factor levels.

                  Code:
                  margins rb1.price_4q, contrast(effect)  // change the '1' in 'rb1' to whatever reference level you want.
                  margins , dydx(price_4q) // simpler if you accept the base level indicated in the original regression
                  "Does it even make sense to use effect coding on them?"
                  No, probably not. As this thread shows, it's very easy to become confused about what is being coded when using effect coding. These coding schemes apply only to variables you wish to treat as categorical (so continuous variables such as age are irrelevant to this discussion). It is much simpler to use the reference-level coding implied by Stata's factor notation (-help factor variable-) and then work with -margins- or -contrast- to examine specific quantities of interest later.
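
                  As a check on the point above that AMEs for a factor variable are differences between levels, here is a minimal sketch using the same auto example. The difference between two predictive margins should reproduce the corresponding dy/dx line. The matrix name pmarg is only illustrative.

                  ```stata
                  sysuse auto, clear
                  xtile price_4q = price, nq(4)
                  logit foreign i.price_4q

                  * predictive margins for each quartile, saved from r(b)
                  margins i.price_4q
                  matrix pmarg = r(b)

                  * differences from the base level (quartile 1), computed by hand
                  display pmarg[1,2] - pmarg[1,1]
                  display pmarg[1,3] - pmarg[1,1]
                  display pmarg[1,4] - pmarg[1,1]

                  * AMEs reported by margins, for comparison
                  margins, dydx(price_4q)
                  ```

                  The same comparison against a different base level can be made by subtracting another column of pmarg, which mirrors what the rb#. operator does.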



                  • #10
                    The context is SPSS, but this UCLA page has a nice discussion of different coding schemes. Notice the distinction between "regression" codes and "contrast" codes. See the paragraph just before the second table.
                    --
                    Bruce Weaver
                    Version: Stata/MP 18.5 (Windows)



                    • #11
                      Originally posted by Leonardo Guizzetti

                      "Does it even make sense to use effect coding on them?"

                      No, probably not. As this thread shows, it's very easy to become confused about what it is that is being coded for using effect coding.
                      Whether to use an effect coding scheme really depends on the study design. In many traditional study designs, effect coding probably does not offer anything extra. However, experimental studies with a fractional factorial design depend on effect coding schemes, which help to test multiple specific hypotheses with a reduced number of arms. The study arms need to be orthogonal in design to test main and interaction effects. See this paper from Linda Collins on factorial design and effect coding and analyses with effect coding vs. dummy coding.
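
                      A minimal sketch of the orthogonality point, assuming a hypothetical balanced full 2x2 factorial with factors a and b (names and data invented for illustration; a fractional design would use only a subset of the cells). With -1/+1 effect codes, the two main-effect columns and their product are mutually uncorrelated, which is not the case for 0/1 dummy codes and their product.

                      ```stata
                      clear
                      input byte(a b)
                      0 0
                      0 1
                      1 0
                      1 1
                      end

                      * effect codes: -1/+1 instead of 0/1
                      generate byte ea  = cond(a == 1, 1, -1)
                      generate byte eb  = cond(b == 1, 1, -1)
                      generate byte eab = ea * eb

                      * dummy-coded interaction for comparison
                      generate byte dab = a * b

                      * effect-coded columns: all pairwise correlations are zero
                      correlate ea eb eab

                      * dummy-coded columns: the interaction correlates with both main effects
                      correlate a b dab
                      ```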
                      Roman



                      • #12
                        Thanks again for your help! So I think I am just going to go back to dummy coding and pick the reference category that makes the most sense.
