  • Calculating marginal effects manually (without -margins-)

    I have some very large data files (~60 GB) on which I am running several Poisson regressions and calculating marginal effects for a dummy variable that is interacted with many other variables.

    With a small dataset, I would run the following (excuse the contrived example):
    Code:
    sysuse auto, clear
    drop if rep78<3
    
    poisson price i.foreign##(c.weight i.rep78 c.length)
    margins foreign, predict(xb) gen(marg)
    (output omitted)

    However, I have found that with my giant data set, -margins- takes much longer than manually calculating marginal effects. In this case, I can calculate them manually by:
    Code:
    gen marg_dom = _b[_cons] + _b[1.foreign]*0 + _b[weight]*weight + _b[4.rep78]*4.rep78 ///
        + _b[5.rep78]*5.rep78 + _b[length]*length + _b[1.foreign#c.weight]*0*weight ///
        + _b[1.foreign#4.rep78]*0*4.rep78 + _b[1.foreign#5.rep78]*0*5.rep78 ///
        + _b[1.foreign#c.length]*0*length
    
    gen marg_for = _b[_cons] + _b[1.foreign]*1 + _b[weight]*weight + _b[4.rep78]*4.rep78 ///
        + _b[5.rep78]*5.rep78 + _b[length]*length + _b[1.foreign#c.weight]*1*weight ///
        + _b[1.foreign#4.rep78]*1*4.rep78 + _b[1.foreign#5.rep78]*1*5.rep78 ///
        + _b[1.foreign#c.length]*1*length

    This goes much faster. Using 1/30th of my sample to test, it took about 40 seconds to calculate marginal effects with -margins- and about 1 second to calculate them manually. With the whole dataset, it took hours to estimate one regression and run -margins- afterward. And of course, both methods yield identical results:

    Code:
    . sum marg*
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
           marg1 |         59    8.506748     .446367   7.747314   9.524893
           marg2 |         59    9.237163    .6608012   8.193282   10.61953
        marg_dom |         59    8.506748     .446367   7.747314   9.524893
        marg_for |         59    9.237163    .6608012   8.193282   10.61953
    My problem is that my actual specifications have many more variables, including some factor variables with over 100 levels, and I have several specifications that differ in which variables are included. As a result, manually typing out the marginal-effects calculations is tedious, time-consuming, and error-prone. I need only the marginal effects, not the standard errors or other values reported by -margins- (with this many observations, and with a theory-driven model that doesn't include unrelated variables, p-values are always ~0.000).

    My question is, how can I calculate marginal effects such as these without manually typing out each one? Is there a way to do this using -margins- that can go faster?

  • #2
    Try adding the nose (no standard errors) option to margins and see if that speeds things up enough. I think SEs can add a lot of time to the calculations.
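
    With the contrived example from #1, that would look something like this (a sketch only; -nose- skips the delta-method standard-error calculations):
    Code:
    sysuse auto, clear
    drop if rep78<3
    
    poisson price i.foreign##(c.weight i.rep78 c.length)
    
    * nose suppresses the delta-method standard errors, usually the slow part
    margins foreign, predict(xb) nose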
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    Stata Version: 17.0 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam

    • #3
      If you are not interested in standard errors, check out the nose option of margins. Using this option suppresses the standard error calculations and is therefore a lot faster.

      • #4
        Originally posted by Richard Williams View Post
        Try adding the nose (no standard errors) option to margins and see if that speeds things up enough. I think SEs can add a lot of time to the calculations.
        Originally posted by Joerg Luedicke (StataCorp) View Post
        If you are not interested in standard errors, check out the nose option of margins. Using this option suppresses the standard error calculations and is therefore a lot faster.
        Strangely, this took even longer and then yielded an error, "margins generate variable for 0.my_var invalid name" (and worked fine immediately afterward without the nose option). When I ran it without -gen(marg)- it worked but was only somewhat faster than without the nose option, and still took many times longer than the manual calculation.
        Last edited by Mallory Montgomery; 28 Sep 2017, 13:14.

        • #5
          I devised a rather inelegant solution. Essentially, I use -capture- to add every possible coefficient*value. For testing, I included -noisily- so that I could make sure skipped terms were because they were absent from the estimation and not because of a typo. I also checked the results of -margins- and my manual calculations side-by-side for many specifications on a small subsample, to make sure I wasn't missing anything.

          Still, it is far from foolproof, it adds hundreds of extra lines to my code, and it creates ugly output, so I do hope someone more clever comes up with something better.

          Example of my code:
          Code:
          gen marg_1 = _b[_cons] + _b[1.my_var]*1
          gen marg_0 = _b[_cons] + _b[1.my_var]*0
          
          // continuous variables: main effects plus interactions with my_var
          local vlist c_var1 c_var2 c_var3 c_var4
          foreach v of local vlist {
              cap noisily replace marg_1 = marg_1 + _b[`v']*`v' + _b[1.my_var#c.`v']*1*`v'
              cap noisily replace marg_0 = marg_0 + _b[`v']*`v' + _b[1.my_var#c.`v']*0*`v'
          }
          // example factor variable: main effects plus interactions with my_var
          foreach w of numlist 1/52 {
              cap noisily replace marg_1 = marg_1 + _b[`w'.week]*`w'.week
              cap noisily replace marg_0 = marg_0 + _b[`w'.week]*`w'.week
              
              cap noisily replace marg_1 = marg_1 + _b[1.my_var#`w'.week]*1*`w'.week
              cap noisily replace marg_0 = marg_0 + _b[1.my_var#`w'.week]*0*`w'.week
          }

          • #6
            A relatively foolproof approach is simply to replace the values of interest in the dataset and then use predict. For example, say we have a model like this:

            Code:
            . webuse nlsw88
            (NLSW, 1988 extract)
            
            . poisson wage i.never_married##i.south##c.grade collgrad union hours 
            note: you are responsible for interpretation of noncount dep. variable
            
            Iteration 0:   log likelihood = -4930.2271  
            Iteration 1:   log likelihood = -4930.2204  
            Iteration 2:   log likelihood = -4930.2204  
            
            Poisson regression                              Number of obs     =      1,875
                                                            LR chi2(10)       =     971.27
                                                            Prob > chi2       =     0.0000
            Log likelihood = -4930.2204                     Pseudo R2         =     0.0897
            
            ---------------------------------------------------------------------------------------------
                                   wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            ----------------------------+----------------------------------------------------------------
                        1.never_married |  -.0200971   .1810634    -0.11   0.912    -.3749749    .3347806
                                1.south |  -.2601364   .1003275    -2.59   0.010    -.4567747   -.0634981
                                        |
                    never_married#south |
                                   1 1  |  -.2071025   .2781712    -0.74   0.457     -.752308     .338103
                                        |
                                  grade |   .0850188   .0074318    11.44   0.000     .0704527    .0995849
                                        |
                  never_married#c.grade |
                                     1  |   .0015842   .0122091     0.13   0.897    -.0223452    .0255137
                                        |
                          south#c.grade |
                                     1  |   .0077023   .0072285     1.07   0.287    -.0064654    .0218699
                                        |
            never_married#south#c.grade |
                                   1 1  |   .0201431   .0190681     1.06   0.291    -.0172297    .0575159
                                        |
                               collgrad |  -.0188035   .0343061    -0.55   0.584    -.0860422    .0484352
                                  union |   .0992278   .0191267     5.19   0.000     .0617402    .1367154
                                  hours |   .0051919   .0008633     6.01   0.000        .0035    .0068839
                                  _cons |   .7215621   .0987522     7.31   0.000     .5280113    .9151128
            ---------------------------------------------------------------------------------------------
            And say we wanted to calculate the averaged counterfactual predictions for the binary variable never_married, that is:
            Code:
            . margins never_married
            
            Predictive margins                              Number of obs     =      1,875
            Model VCE    : OIM
            
            Expression   : Predicted number of events, predict()
            
            -------------------------------------------------------------------------------
                          |            Delta-method
                          |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
            --------------+----------------------------------------------------------------
            never_married |
                       0  |   7.543075   .0676928   111.43   0.000       7.4104     7.67575
                       1  |   7.756516   .1943494    39.91   0.000     7.375599    8.137434
            -------------------------------------------------------------------------------
            We can replicate this result by fixing the values of interest in the dataset and using predict:
            Code:
            . clonevar clone_never_married = never_married
            . qui replace never_married = 0
            . qui predict mu0, n
            . qui replace never_married = 1
            . qui predict mu1, n
            . qui replace never_married = clone_never_married
            . sum mu0 mu1
            
                Variable |        Obs        Mean    Std. Dev.       Min        Max
            -------------+---------------------------------------------------------
                     mu0 |      1,875    7.543075     1.97696     1.8537   14.58779
                     mu1 |      1,875    7.756516    2.169739   1.476957   14.71113

            • #7
              Clever indeed, and so much simpler! Thank you, Joerg Luedicke (StataCorp). You say it's relatively foolproof -- is there any mistake in particular you recommend I check against?

              • #8
                I would say most importantly, make sure to reset the variable of interest to its original values (as I did in my example). Also, if you have missing data and want to restrict your marginal predictions to the actual estimation sample (as is the default for margins), you will need to do something like predict <var> if e(sample).
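
                Continuing the example from #6, that restriction would look something like this (a minimal sketch):
                Code:
                qui replace never_married = 0
                * predictions only for observations used in the last estimation
                qui predict mu0 if e(sample), n
                qui replace never_married = 1
                qui predict mu1 if e(sample), n
                qui replace never_married = clone_never_married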

                • #9
                  I am having the same issue that Mallory Montgomery described. The estimator I am using is a logit instead of a Poisson. I would like to calculate marginal effects (dy/dx) manually. From the code proposed by Joerg Luedicke (StataCorp), I suppose I should continue with:

                  Code:
                  sum mu0 mu1
                  gen d_mu=mu1-mu0
                  sum d_mu
                  This yields the marginal effect (the mean of d_mu). But how can I test whether this effect is significant, that is, estimate the standard error and z-score and get the p-value (P>|z|)? I cannot think of a way to do that... Any help would be really appreciated.

                  • #10
                    Something I would like to add is that my case is further complicated by the fact that, unlike in Mallory's example, I am dealing with a continuous variable. I am thinking of getting the marginal effect (dy/dx) of my variable by computing mu0 in Joerg's example with the variable set to its average and mu1 with it set to the average plus one. I am not sure, however, whether this approach is correct.
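
                    To make the idea concrete, here is a rough sketch of what I mean, with a placeholder logit and hypothetical variable names (y_var, x_var, z_var); I am not sure this is the right way to get dy/dx:
                    Code:
                    * hypothetical logit; y_var, x_var, z_var are placeholder names
                    logit y_var c.x_var i.z_var
                    
                    qui sum x_var if e(sample)
                    local m = r(mean)                // store the mean in a local
                    
                    clonevar clone_x_var = x_var
                    qui replace x_var = `m'          // everyone at the mean of x_var
                    qui predict p0 if e(sample), pr
                    qui replace x_var = `m' + 1      // everyone at the mean plus one
                    qui predict p1 if e(sample), pr
                    qui replace x_var = clone_x_var  // restore original values
                    drop clone_x_var
                    
                    gen d_p = p1 - p0
                    sum d_p                          // average change in predicted probability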

                    Would be very happy to share more information if needed.
                    Last edited by Riccardo Valboni; 20 Jun 2022, 06:56.
