I have some very large data files (~60 GB) on which I am running several Poisson regressions and calculating marginal effects for a dummy variable that is interacted with many other variables.
My problem is that my actual specifications have many more variables than the contrived example in this post, including some factor variables with over 100 levels. I also have several specifications that differ in which variables are included. As a result, manually typing out the marginal effects calculations is tedious, time-consuming, and error-prone. I need only the marginal effects, not the standard errors or other values reported by -margins- (with this many observations, and with a theory-driven model that doesn't include unrelated variables, p-values are always ~0.000).
With a small dataset, I would run the following (excuse the contrived example):
Code:
sysuse auto, clear
drop if rep78<3
poisson price i.foreign##(c.weight i.rep78 c.length)
margins foreign, predict(xb) gen(marg)
However, I have found that with my giant dataset, -margins- takes much longer than calculating the marginal effects manually. In this case, I can calculate them manually by:
Code:
* Linear prediction with foreign set to 0 (domestic)
gen marg_dom = _b[_cons] + _b[1.foreign]*0 + _b[weight]*weight + _b[4.rep78]*4.rep78 ///
    + _b[5.rep78]*5.rep78 + _b[length]*length + _b[1.foreign#c.weight]*0*weight ///
    + _b[1.foreign#4.rep78]*0*4.rep78 + _b[1.foreign#5.rep78]*0*5.rep78 ///
    + _b[1.foreign#c.length]*0*length
* Linear prediction with foreign set to 1 (foreign)
gen marg_for = _b[_cons] + _b[1.foreign]*1 + _b[weight]*weight + _b[4.rep78]*4.rep78 ///
    + _b[5.rep78]*5.rep78 + _b[length]*length + _b[1.foreign#c.weight]*1*weight ///
    + _b[1.foreign#4.rep78]*1*4.rep78 + _b[1.foreign#5.rep78]*1*5.rep78 ///
    + _b[1.foreign#c.length]*1*length
The manual calculation is much faster. Using 1/30th of my sample as a test, -margins- took about 40 seconds to calculate the marginal effects, while the manual calculation took about 1 second. With the whole dataset, it took hours to estimate one regression and run -margins- afterward. And of course, both methods yield identical results:
Code:
. sum marg*

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       marg1 |         59    8.506748     .446367   7.747314   9.524893
       marg2 |         59    9.237163    .6608012   8.193282   10.61953
    marg_dom |         59    8.506748     .446367   7.747314   9.524893
    marg_for |         59    9.237163    .6608012   8.193282   10.61953
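To spell out what both approaches compute in the example, here is the linear predictor written as a formula. The notation is mine and purely illustrative: d is the value assigned to foreign, w_i and \ell_i are weight and length, r_{4i} and r_{5i} are indicators for rep78 equal to 4 and 5, the \beta terms are the main-effect coefficients, and the \gamma terms are the coefficients on the interactions with foreign.

```latex
\begin{align*}
\eta_i(d) &= \beta_0 + \beta_f d + \beta_w w_i + \beta_4 r_{4i} + \beta_5 r_{5i} + \beta_\ell \ell_i \\
          &\quad + d\left(\gamma_w w_i + \gamma_4 r_{4i} + \gamma_5 r_{5i} + \gamma_\ell \ell_i\right),
          \qquad d \in \{0, 1\} \\
\texttt{marg\_dom}_i &= \eta_i(0), \qquad \texttt{marg\_for}_i = \eta_i(1) \\
\eta_i(1) - \eta_i(0) &= \beta_f + \gamma_w w_i + \gamma_4 r_{4i} + \gamma_5 r_{5i} + \gamma_\ell \ell_i
\end{align*}
```

Because the example uses predict(xb), these quantities are on the log scale of the Poisson model; exponentiating \eta_i(d) would give the expected price for observation i with foreign set to d.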
My question is, how can I calculate marginal effects such as these without manually typing out each one? Is there a way to do this using -margins- that can go faster?