  • Calculating marginal effects manually (without -margins-)

    I have some very large data files (~60 GB) on which I am running several Poisson regressions and calculating marginal effects for a dummy variable that is interacted with many other variables.

    With a small dataset, I would run the following (excuse the contrived example):
    Code:
    sysuse auto, clear
    drop if rep78<3
    
    poisson price i.foreign##(c.weight i.rep78 c.length)
    margins foreign, predict(xb) gen(marg)
    (output omitted)

    However, I have found that with my giant data set, -margins- takes much longer than manually calculating marginal effects. In this case, I can calculate them manually by:
    Code:
    gen marg_dom = _b[_cons] + _b[1.foreign]*0 + _b[weight]*weight + _b[4.rep78]*4.rep78 ///
        + _b[5.rep78]*5.rep78 + _b[length]*length + _b[1.foreign#c.weight]*0*weight ///
        + _b[1.foreign#4.rep78]*0*4.rep78 + _b[1.foreign#5.rep78]*0*5.rep78 ///
        + _b[1.foreign#c.length]*0*length
    
    gen marg_for = _b[_cons] + _b[1.foreign]*1 + _b[weight]*weight + _b[4.rep78]*4.rep78 ///
        + _b[5.rep78]*5.rep78 + _b[length]*length + _b[1.foreign#c.weight]*1*weight ///
        + _b[1.foreign#4.rep78]*1*4.rep78 + _b[1.foreign#5.rep78]*1*5.rep78 ///
        + _b[1.foreign#c.length]*1*length

    This goes much faster. Using 1/30th of my sample to test, it took about 40 seconds to calculate marginal effects with -margins- and about 1 second to calculate them manually. With the whole dataset, it took hours to estimate one regression and run -margins- afterward. And of course, both methods yield identical results:

    Code:
    . sum marg*
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
           marg1 |         59    8.506748     .446367   7.747314   9.524893
           marg2 |         59    9.237163    .6608012   8.193282   10.61953
        marg_dom |         59    8.506748     .446367   7.747314   9.524893
        marg_for |         59    9.237163    .6608012   8.193282   10.61953
    My problem is that my actual specifications have many more variables, including some factor variables with over 100 levels, and I have several specifications that differ in which variables are included. As a result, manually typing out the marginal-effects calculations is tedious, time-consuming, and error-prone. I need only the marginal effects, not the standard errors or other values reported by -margins- (with this many observations, and with a theory-driven model that doesn't include unrelated variables, p-values are always ~0.000).

    My question is, how can I calculate marginal effects such as these without manually typing out each one? Is there a way to do this using -margins- that can go faster?

  • #2
    Try adding the nose (no standard errors) option to margins and see if that speeds things up enough. I think SEs can add a lot of time to the calculations.
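
    With the contrived example from #1, that would look something like this (a sketch only; -nose- skips the delta-method standard-error calculations):
    Code:
    sysuse auto, clear
    drop if rep78<3
    
    poisson price i.foreign##(c.weight i.rep78 c.length)
    
    * nose suppresses the delta-method standard errors, usually the slow part
    margins foreign, predict(xb) nose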
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    Stata Version: 17.0 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam

    • #3
      If you are not interested in standard errors, check out the nose option of margins. Using this option suppresses the standard error calculations and is therefore a lot faster.

      • #4
        Originally posted by Richard Williams View Post
        Try adding the nose (no standard errors) option to margins and see if that speeds things up enough. I think SEs can add a lot of time to the calculations.
        Originally posted by Joerg Luedicke (StataCorp) View Post
        If you are not interested in standard errors, check out the nose option of margins. Using this option suppresses the standard error calculations and is therefore a lot faster.
        Strangely, this took even longer and then yielded an error, "margins generate variable for 0.my_var invalid name" (and worked fine immediately afterward without the nose option). When I ran it without -gen(marg)- it worked but was only somewhat faster than without the nose option, and still took many times longer than the manual calculation.
        Last edited by Mallory Montgomery; 28 Sep 2017, 13:14.

        • #5
          I devised a rather inelegant solution. Essentially, I use -capture- to add every possible coefficient*value. For testing, I included -noisily- so that I could make sure skipped terms were because they were absent from the estimation and not because of a typo. I also checked the results of -margins- and my manual calculations side-by-side for many specifications on a small subsample, to make sure I wasn't missing anything.

          Still, it is far from foolproof, it adds hundreds of extra lines to my code, and it creates ugly output, so I do hope someone more clever comes up with something better.

          Example of my code:
          Code:
          gen marg_1 = _b[_cons] + _b[1.my_var]*1
          gen marg_0 = _b[_cons] + _b[1.my_var]*0
          
          // continuous variables: main effects plus interactions with my_var
          local vlist c_var1 c_var2 c_var3 c_var4
          foreach v of local vlist {
              cap noisily replace marg_1 = marg_1 + _b[`v']*`v' + _b[1.my_var#c.`v']*1*`v'
              cap noisily replace marg_0 = marg_0 + _b[`v']*`v' + _b[1.my_var#c.`v']*0*`v'
          }
          // example factor variable: main effects plus interactions with my_var
          foreach w of numlist 1/52 {
              cap noisily replace marg_1 = marg_1 + _b[`w'.week]*`w'.week
              cap noisily replace marg_0 = marg_0 + _b[`w'.week]*`w'.week
              
              cap noisily replace marg_1 = marg_1 + _b[1.my_var#`w'.week]*1*`w'.week
              cap noisily replace marg_0 = marg_0 + _b[1.my_var#`w'.week]*0*`w'.week
          }

          • #6
            A relatively foolproof approach is simply to replace the values of interest in the dataset and then use predict. For example, say we have a model like this:

            Code:
            . webuse nlsw88
            (NLSW, 1988 extract)
            
            . poisson wage i.never_married##i.south##c.grade collgrad union hours 
            note: you are responsible for interpretation of noncount dep. variable
            
            Iteration 0:   log likelihood = -4930.2271  
            Iteration 1:   log likelihood = -4930.2204  
            Iteration 2:   log likelihood = -4930.2204  
            
            Poisson regression                              Number of obs     =      1,875
                                                            LR chi2(10)       =     971.27
                                                            Prob > chi2       =     0.0000
            Log likelihood = -4930.2204                     Pseudo R2         =     0.0897
            
            ---------------------------------------------------------------------------------------------
                                   wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            ----------------------------+----------------------------------------------------------------
                        1.never_married |  -.0200971   .1810634    -0.11   0.912    -.3749749    .3347806
                                1.south |  -.2601364   .1003275    -2.59   0.010    -.4567747   -.0634981
                                        |
                    never_married#south |
                                   1 1  |  -.2071025   .2781712    -0.74   0.457     -.752308     .338103
                                        |
                                  grade |   .0850188   .0074318    11.44   0.000     .0704527    .0995849
                                        |
                  never_married#c.grade |
                                     1  |   .0015842   .0122091     0.13   0.897    -.0223452    .0255137
                                        |
                          south#c.grade |
                                     1  |   .0077023   .0072285     1.07   0.287    -.0064654    .0218699
                                        |
            never_married#south#c.grade |
                                   1 1  |   .0201431   .0190681     1.06   0.291    -.0172297    .0575159
                                        |
                               collgrad |  -.0188035   .0343061    -0.55   0.584    -.0860422    .0484352
                                  union |   .0992278   .0191267     5.19   0.000     .0617402    .1367154
                                  hours |   .0051919   .0008633     6.01   0.000        .0035    .0068839
                                  _cons |   .7215621   .0987522     7.31   0.000     .5280113    .9151128
            ---------------------------------------------------------------------------------------------
            And say we wanted to calculate the averaged counterfactual predictions for the binary variable never_married, that is:
            Code:
            . margins never_married
            
            Predictive margins                              Number of obs     =      1,875
            Model VCE    : OIM
            
            Expression   : Predicted number of events, predict()
            
            -------------------------------------------------------------------------------
                          |            Delta-method
                          |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
            --------------+----------------------------------------------------------------
            never_married |
                       0  |   7.543075   .0676928   111.43   0.000       7.4104     7.67575
                       1  |   7.756516   .1943494    39.91   0.000     7.375599    8.137434
            -------------------------------------------------------------------------------
            We can replicate this result by fixing the values of interest in the dataset and using predict:
            Code:
            . clonevar clone_never_married = never_married
            . qui replace never_married = 0
            . qui predict mu0, n
            . qui replace never_married = 1
            . qui predict mu1, n
            . qui replace never_married = clone_never_married
            . sum mu0 mu1
            
                Variable |        Obs        Mean    Std. Dev.       Min        Max
            -------------+---------------------------------------------------------
                     mu0 |      1,875    7.543075     1.97696     1.8537   14.58779
                     mu1 |      1,875    7.756516    2.169739   1.476957   14.71113

            • #7
              Clever indeed, and so much simpler! Thank you, Joerg Luedicke (StataCorp). You say it's relatively foolproof -- is there any mistake in particular you recommend I check against?

              • #8
                I would say most importantly, make sure to reset the variable of interest to its original values (as I did in my example). Also, if you have missing data and want to restrict your marginal predictions to the actual estimation sample (as is the default for margins), you will need to do something like predict <var> if e(sample).
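
                Continuing the example from #6, that restriction would look something like this (a minimal sketch):
                Code:
                qui replace never_married = 0
                * predictions only for observations used in the last estimation
                qui predict mu0 if e(sample), n
                qui replace never_married = 1
                qui predict mu1 if e(sample), n
                qui replace never_married = clone_never_married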

                • #9
                  I am having the same issue that Mallory Montgomery described. The estimator I am using is a logit instead of a Poisson. I would like to calculate marginal effects (dy/dx) manually. From the code proposed by Joerg Luedicke (StataCorp), I suppose I should continue with:

                  Code:
                  sum mu0 mu1
                  gen d_mu=mu1-mu0
                  sum d_mu
                  This yields the marginal effect (the mean of d_mu). But how can I test whether this effect is significant, that is, estimate the standard error and z-score and get the p-value (P>|z|)? I cannot think of a way to do that... Any help would be really appreciated.

                  • #10
                    Something I would like to add is that my case is further complicated by the fact that, unlike in Mallory's example, I am dealing with a continuous variable. I am thinking of getting the marginal effect (dy/dx) of my variable by computing mu0 in Joerg's example with the variable set to its average and mu1 with it set to the average plus one. I am not sure, however, whether this approach is correct.
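
                    To make the idea concrete, here is a rough sketch of what I mean, with a placeholder logit and hypothetical variable names (y_var, x_var, z_var); I am not sure this is the right way to get dy/dx:
                    Code:
                    * hypothetical logit; y_var, x_var, z_var are placeholder names
                    logit y_var c.x_var i.z_var
                    
                    qui sum x_var if e(sample)
                    local m = r(mean)                // store the mean in a local
                    
                    clonevar clone_x_var = x_var
                    qui replace x_var = `m'          // everyone at the mean of x_var
                    qui predict p0 if e(sample), pr
                    qui replace x_var = `m' + 1      // everyone at the mean plus one
                    qui predict p1 if e(sample), pr
                    qui replace x_var = clone_x_var  // restore original values
                    drop clone_x_var
                    
                    gen d_p = p1 - p0
                    sum d_p                          // average change in predicted probability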

                    Would be very happy to share more information if needed.
                    Last edited by Riccardo Valboni; 20 Jun 2022, 06:56.
