Differing result for seemingly same - margins - command!

Carsten Preuss

Join Date: Feb 2017
Posts: 42

14 Jun 2018, 04:41

Dear community,

I think I came across an issue that has not been subject to any discussion in the forum but is really important.

As I am conducting an analysis using an ordered logistic model, I am interested in calculating the margins/marginal effects to quantify the results.

However, using the - margins - command, I get different results for what seems to be the same calculation of the margins.
Let me show you a quick example of what I mean. I regress a firm's expectation of its growth prospects on its current realisation and a set of dummies. Both expectation and realisation are categorical variables that take on three different values: -1 (deteriorate), 0 (remain stable), or 1 (improve). Because factor variables do not allow negative values, I recode realisation to 1, 2, 3.

Code:

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
 expectation |    105,693    .0208339     .622453         -1          1
 realisation |     97,551    1.987442    .6193167          1          3

I first run the regression treating realisation as continuous:

Code:

qui ologit expectation realisation i.size i.sector, vce(cluster permid)

When I run the margins, I have two options: treating the independent variable as continuous or as categorical (factor variable).

First, I go with the first option and calculate the margins for expectation = -1, using several specifications:

1) Calculating the margin of realisation at the mean of realisation:

Code:

margins, at(realisation) predict(outcome(-1))

Expression   : Pr(expectation==-1), predict(outcome(-1))
at           : realisation     =    1.992652 (mean)

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |    .155702   .0012985   119.91   0.000     .1531571    .1582469
------------------------------------------------------------------------------

2) Calculating the margin of realisation at the mean of all variables

Code:

margins, at(realisation) atmeans predict(outcome(-1))

Expression   : Pr(expectation==-1), predict(outcome(-1))
at           : realisation     =    1.992652 (mean)
               1.size          =    .3245139 (mean)
               2.size          =    .3193684 (mean)
               3.size          =    .2653454 (mean)
               4.size          =    .0907724 (mean)
               1.sector        =    .2788699 (mean)
               2.sector        =    .1064883 (mean)
               3.sector        =    .2574927 (mean)
               4.sector        =     .357149 (mean)

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   .1553722    .001295   119.97   0.000     .1528339    .1579104
------------------------------------------------------------------------------

3) Now, I use the "over" option to predict the margins for all values of realisation:

Code:

margins, over(realisation) predict(outcome(-1))

Expression   : Pr(expectation==-1), predict(outcome(-1))
over         : realisation

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 realisation |
          1  |   .4234504   .0036308   116.63   0.000     .4163341    .4305668
          2  |   .1544277   .0012941   119.33   0.000     .1518912    .1569642
          3  |   .0432777   .0008042    53.82   0.000     .0417016    .0448539
------------------------------------------------------------------------------

4) Lastly, I manually predict the margins for all values of realization:

Code:

margins, at(realisation=(1 2 3)) predict(outcome(-1))

Expression   : Pr(expectation==-1), predict(outcome(-1))

1._at        : realisation     =           1
2._at        : realisation     =           2
3._at        : realisation     =           3

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         _at |
          1  |    .419882   .0036298   115.68   0.000     .4127678    .4269963
          2  |   .1543755   .0012932   119.37   0.000     .1518409    .1569102
          3  |   .0439714   .0008218    53.51   0.000     .0423608     .045582
------------------------------------------------------------------------------

Next, I specify the independent variable as categorical and rerun the regression:

Code:

qui ologit expectation i.realisation i.size i.sector, vce(cluster permid)

This gives the following results:

5) Calculating the margin of realisation at the mean of realisation:

Code:

margins, at(realisation) predict(outcome(-1))

Expression   : Pr(expectation==-1), predict(outcome(-1))
at           : 1.realisat~n    =    .1966162 (mean)
               2.realisat~n    =    .6141154 (mean)
               3.realisat~n    =    .1892684 (mean)

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |    .155136   .0013074   118.66   0.000     .1525736    .1576984
------------------------------------------------------------------------------

6) Calculating the margin of realisation at the mean of all variables

Code:

margins, at(realisation) atmeans predict(outcome(-1))

Expression   : Pr(expectation==-1), predict(outcome(-1))
at           : 1.realisat~n    =    .1966162 (mean)
               2.realisat~n    =    .6141154 (mean)
               3.realisat~n    =    .1892684 (mean)
               1.size          =    .3245139 (mean)
               2.size          =    .3193684 (mean)
               3.size          =    .2653454 (mean)
               4.size          =    .0907724 (mean)
               1.sector        =    .2788699 (mean)
               2.sector        =    .1064883 (mean)
               3.sector        =    .2574927 (mean)
               4.sector        =     .357149 (mean)

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   .1548055   .0013037   118.75   0.000     .1522503    .1573606
------------------------------------------------------------------------------

7) Now, I use the "over" option to predict the margins for all values of realisation:

Code:

margins, over(i.realisation) predict(outcome(-1))


Expression   : Pr(expectation==-1), predict(outcome(-1))
over         : realisation

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 realisation |
          1  |   .4454591   .0045552    97.79   0.000      .436531    .4543871
          2  |   .1466192   .0014192   103.31   0.000     .1438377    .1494007
          3  |   .0468167   .0009317    50.25   0.000     .0449907    .0486427
------------------------------------------------------------------------------

8) Lastly, I manually predict the margins for all values of realization:

Code:

margins, at(realisation=(1 2 3)) predict(outcome(-1))

Expression   : Pr(expectation==-1), predict(outcome(-1))

1._at        : realisation     =           1
2._at        : realisation     =           2
3._at        : realisation     =           3

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         _at |
          1  |   .4418217   .0045592    96.91   0.000     .4328858    .4507576
          2  |   .1465671   .0014182   103.35   0.000     .1437875    .1493468
          3  |   .0475713   .0009506    50.04   0.000     .0457081    .0494345
------------------------------------------------------------------------------

These results really confuse me, because many of these specifications should yield the same margins for realisation. For example,

1) Why do I get different results when calculating margins with the "over" option and by manually setting all the values ((3) and (4))?
2) Why do I get different results when comparing margins at a certain value of realisation with those for i.realisation ((3) and (4) versus (7) and (8))?
3) Lastly, in specification (2), I predict the margins with all other variables in the regression set at their means, but what happens to these variables in specification (1)?

In the end, for each specification, I get a different prediction of the margin. I do not understand how Stata calculates these values in the background.

Can anybody shed some light on this?

Thanks a lot

Carsten

(This post builds upon an earlier post of mine about marginal effects: https://www.statalist.org/forums/for...ossible-values)

Carsten Preuss

Join Date: Feb 2017

Posts: 42
#2

15 Jun 2018, 06:51

Maybe my question has been overlooked by some people who would be able to answer. Here's a kind reminder to push it up the queue.
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#3

15 Jun 2018, 10:39

You didn't get a quick answer. You'll increase your chances of a useful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex. It is also helpful if you cut back your posting to the minimum needed to demonstrate your problem. Your posting is extremely long. Note that almost everyone on this list is doing this for free - pushing us is more likely to annoy than get an answer.

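1) Why do I get different results when calculating margins with the "over" function and by manually setting all the values ((3) and (4))? -over- most likely averages the predictions within each subgroup, so each margin reflects that subgroup's own mix of the other covariates, whereas setting values with -at- averages over the whole sample.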
2) Why do I get different results comparing margins at a certain value of realisation and i.realisation ((3) and (4) and (7) and (8))?

Every run you made has different assumptions that are documented in the margins documentation.
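
To make the difference between (3) and (4) concrete, here is a minimal sketch of what the two commands appear to compute, written against the variable names from post #1. It cannot be run without that dataset, and p_own and p_at1 are hypothetical variable names introduced only for illustration.

```stata
* Sketch only: assumes the data and the first (continuous) model from post #1
quietly ologit expectation realisation i.size i.sector, vce(cluster permid)

* Like over(realisation): average each firm's own prediction within the
* group of firms that actually have that value of realisation
predict double p_own if e(sample), pr outcome(-1)
tabstat p_own, by(realisation) statistics(mean)

* Like at(realisation=1): set realisation to 1 for every firm in the
* estimation sample, keep size and sector as observed, then average
preserve
replace realisation = 1 if e(sample)
predict double p_at1 if e(sample), pr outcome(-1)
summarize p_at1
restore
```

If the mean of p_own for realisation == 1 differs from the mean of p_at1, the reason is that firms with realisation == 1 have a different mix of size and sector than the estimation sample as a whole.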
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#4

15 Jun 2018, 11:47

Originally posted by Carsten Preuss

...

These results really confuse me, because many of these specifications should yield the same margins for realisation. For example,

1) Why do I get different results when calculating margins with the "over" option and by manually setting all the values ((3) and (4))?
2) Why do I get different results when comparing margins at a certain value of realisation with those for i.realisation ((3) and (4) versus (7) and (8))?
3) Lastly, in specification (2), I predict the margins with all other variables in the regression set at their means, but what happens to these variables in specification (1)?

In the end, for each specification, I get a different prediction of the margin. I do not understand how Stata calculates these values in the background.

...

(This post builds upon an earlier post of mine about marginal effects: https://www.statalist.org/forums/for...ossible-values)

What you ask is documented in the margins manual, as Phil said. The documentation is rather formidable for first-time users. I can summarize 1 and 2 briefly, though.

When you run -margins-, Stata calculates the predicted probability of the specified outcome for each observation. Let's switch to another thread you commented on where I gave an example. This one involves logistic regression, so it's simpler to interpret.

Code:

webuse margex
quietly logistic outcome i.sex i.group
margins
margins, over(sex group)
margins sex#group
margins, at(sex == 1)

When you run margins without any options (first margins call), Stata calculates the predicted probabilities for everyone, then it gives the grand mean.

When you use the -over- option (as with the second call to -margins-), Stata goes over each combination of sex and group, calculates the average predicted probability within each combination, and presents those averages. The third call produces results equivalent to the second.
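
The first two calls can be mimicked by hand, which may make the description above easier to check. This is a minimal sketch using the same margex example; phat is a hypothetical variable name introduced for illustration, and the comparison concerns point estimates only, not standard errors.

```stata
webuse margex, clear
quietly logistic outcome i.sex i.group

* Each observation's predicted probability at its own covariate values
predict double phat if e(sample), pr

* Plain -margins-: the grand mean of those predictions
summarize phat
margins

* -margins, over(sex group)-: the mean prediction within each observed cell
tabulate sex group, summarize(phat) means
margins, over(sex group)
```

The means reported by -summarize- and -tabulate- should line up with the margins shown by the corresponding -margins- calls.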

In the fourth call, the -at- option tells Stata to set everybody to sex == 1, as if you were waving a magic wand, while leaving all other covariates unchanged. Stata then presents the mean predicted probability of the outcome under that assumption. If you had specified -atmeans- instead, you would be telling Stata to set everybody's covariates to the sample means before predicting. (For factor variables, I believe Stata sets each level's indicator to its share of the sample, as the "at" legend in your own output suggests.)
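
The "magic wand" can also be written out by hand. This is a minimal sketch using the same margex example; phat_sex1 is a hypothetical variable name introduced for illustration, and the sketch uses the at(sex=1) spelling of the option.

```stata
webuse margex, clear
quietly logistic outcome i.sex i.group

* Counterfactually set everyone to sex == 1, leave group as observed,
* predict, and average over the whole sample
preserve
replace sex = 1
predict double phat_sex1, pr
summarize phat_sex1
restore

* The stored estimates survive -restore-, so -margins- can be run for comparison
margins, at(sex=1)
```

The mean of phat_sex1 should match the margin reported by the last command. The key contrast with -over- is that every observation contributes here, not only those that actually have sex == 1.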

What should you do? It depends on your goal. If you have a (well-conducted) randomized trial, I'm pretty sure it doesn't really matter if you choose -over- or -at- (albeit RCTs can and do incur bias apart from treatment assignment in innumerable sorts of ways, but that's another story). If you've got an observational study, I think it's OK to start with -over-.

The graph below is from one of my projects on predicted quality of life scores in nursing facilities with low and high proportions of minority residents. The facilities that have a high proportion of racial minority residents tend to have worse average QOL, but their other characteristics also differ meaningfully. The red set of bars is the predicted QOL scores using just the -over- option (i.e. I didn't ask Stata to change anybody's other characteristics). For the blue set of bars, I asked Stata to set every facility's characteristics (apart from low vs high minority) at the sample mean. You can see that this erases part of the disparity, i.e. facility characteristics account for some of the disparity in average QOL scores, but not all of it.

PS - another thing to do is to read our own Richard Williams' writing on margins.
https://www3.nd.edu/~rwilliam/stats/Margins01.pdf
Attached Files

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.