Margins

Bill Smith

Join Date: Sep 2014
Posts: 158

07 May 2015, 12:29

I'm using Stata 13.1 under Windows 7.1. I'm trying to understand margins better, so I ran one of the examples from the logit documentation:

Code:

. webuse lbw
(Hosmer & Lemeshow data)

. logit low age lwt i.race smoke ptl ht ui

Iteration 0:   log likelihood =   -117.336
Iteration 1:   log likelihood = -101.28644
Iteration 2:   log likelihood = -100.72617
Iteration 3:   log likelihood =   -100.724
Iteration 4:   log likelihood =   -100.724

Logistic regression                               Number of obs   =        189
                                                  LR chi2(8)      =      33.22
                                                  Prob > chi2     =     0.0001
Log likelihood =   -100.724                       Pseudo R2       =     0.1416

------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0271003   .0364504    -0.74   0.457    -.0985418    .0443412
         lwt |  -.0151508   .0069259    -2.19   0.029    -.0287253   -.0015763
             |
        race |
      black  |   1.262647   .5264101     2.40   0.016     .2309024    2.294392
      other  |   .8620792   .4391532     1.96   0.050     .0013548    1.722804
             |
       smoke |   .9233448   .4008266     2.30   0.021      .137739    1.708951
         ptl |   .5418366    .346249     1.56   0.118     -.136799    1.220472
          ht |   1.832518   .6916292     2.65   0.008     .4769494    3.188086
          ui |   .7585135   .4593768     1.65   0.099    -.1418484    1.658875
       _cons |   .4612239    1.20459     0.38   0.702    -1.899729    2.822176
------------------------------------------------------------------------------

As a test, this works ok:

Code:

. margins race ,atmeans

Adjusted predictions                              Number of obs   =        189
Model VCE    : OIM

Expression   : Pr(low), predict()
at           : age             =     23.2381 (mean)
               lwt             =    129.8201 (mean)
               1.race          =    .5079365 (mean)
               2.race          =    .1375661 (mean)
               3.race          =    .3544974 (mean)
               smoke           =    .3915344 (mean)
               ptl             =    .1957672 (mean)
               ht              =    .0634921 (mean)
               ui              =    .1481481 (mean)

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        race |
      white  |    .191685   .0454474     4.22   0.000     .1026096    .2807603
      black  |   .4560013    .107471     4.24   0.000      .245362    .6666405
      other  |   .3596187   .0695116     5.17   0.000     .2233784     .495859
------------------------------------------------------------------------------

However, this throws an error:

Code:

. margins smoke ,atmeans
factor 'smoke' not found in list of covariates
r(322);

Not sure why. The coefficient table shows smoke present:

Code:

r(table)[9,10]
               low:        low:        low:        low:        low:        low:        low:        low:        low:        low:
                                        1b.          2.          3.                                                          
               age         lwt        race        race        race       smoke         ptl          ht          ui       _cons
     b  -.02710031  -.01515082           0   1.2626473   .86207916   .92334482   .54183656   1.8325178   .75851348   .46122388
    se   .03645043   .00692588           .   .52641014   .43915315   .40082664     .346249   .69162923   .45937677   1.2045897
     z  -.74348404  -2.1875663           .   2.3985998    1.963049   2.3036014   1.5648755   2.6495667   1.6511794   .38288876
pvalue   .45718868   .02870121           .   .01645789   .04964048   .02124503   .11761211   .00805951   .09870194   .70180224
    ll  -.09854183  -.02872529           .   .23090236   .00135479   .13773904    -.136799   .47694941  -.14184845  -1.8997286
    ul   .04434121  -.00157635           .   2.2943922   1.7228035   1.7089506   1.2204721   3.1880862   1.6588754   2.8221764
    df           .           .           .           .           .           .           .           .           .           .
  crit    1.959964    1.959964    1.959964    1.959964    1.959964    1.959964    1.959964    1.959964    1.959964    1.959964
 eform           0           0           0           0           0           0           0           0           0           0

What am I missing?


Richard Williams

Join Date: Apr 2014

Posts: 5043
#2

07 May 2015, 12:33

Only factor variables (as denoted by factor variable notation) go to the left of the comma. So race is ok, smoke is not. You could do something like

margins, dydx(smoke)

if you wanted. Here is an overview of margins:

http://www3.nd.edu/~rwilliam/stats/Margins01.pdf

-------------------------------------------
Richard Williams
Professor Emeritus of Sociology
University of Notre Dame
StataNow Version: 19.5 MP (2 processor)
WWW: https://academicweb.nd.edu/~rwilliam/
Richard Williams

Join Date: Apr 2014

Posts: 5043
#3

07 May 2015, 12:36

Also, if you had instead specified i.smoke in the logit command, margins smoke would work. Even if a variable is already a dichotomy, you have to use factor variable notation in the estimation command so that Stata knows it is a categorical variable.
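
A minimal sketch of that change, reusing the lbw example from the first post. The only difference from the original logit command is the i. prefix on smoke:

```stata
* Same data and model as in the first post, but smoke now enters as a factor variable
webuse lbw, clear
logit low age lwt i.race i.smoke ptl ht ui

* smoke is now a factor in the covariate list, so it can go left of the comma
margins smoke, atmeans

* dydx() on a factor variable reports the discrete change from the base level
margins, dydx(smoke)
```

With smoke declared as a factor variable, the margins smoke call should no longer stop with r(322).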

Dimitriy V. Masterov

Join Date: Mar 2014

Posts: 609
#4

07 May 2015, 12:38

smoke is entered as a continuous variable in your logit, so Stata throws an error. Typically, with continuous variables there are too many values to calculate the average predicted value at each distinct value, though that is not the case here since smoke is binary. Stata is being conservative, but I think the guard rail is a useful one since it forces you to be explicit about how your variables enter the equation. The solution here is to add the i. prefix.
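
If you would rather leave smoke in the model as it is, one alternative is to name the values explicitly with at(). This is a sketch on the same lbw example, not something taken from the posts above:

```stata
* Original specification: smoke enters without the i. prefix
webuse lbw, clear
logit low age lwt i.race smoke ptl ht ui

* at() fixes smoke at 0 and then at 1;
* atmeans holds the remaining covariates at their estimation-sample means
margins, at(smoke=(0 1)) atmeans
```

This makes the choice of evaluation points explicit, which is the same discipline the i. prefix enforces.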

Last edited by Dimitriy V. Masterov; 07 May 2015, 12:44.
Bill Smith

Join Date: Sep 2014

Posts: 158
#5

13 May 2015, 13:48

Ok. I understand now. Thank you.

Here's a follow-up question. I'm working on an age-period-cohort analysis. Using glm followed by margins works well (assuming svy set):

Code:

svy, subpop(domain): glm y a i.b i.age, family(binomial) link(logit) iterate(20)
margins age, atmeans

I realize that some may take issue with this methodology, preferring marginal means, but this seems to be the way the APC literature handles things.

For the full APC analysis, I'm using a modification of the apc_ie (http://econpapers.repec.org/software...de/s456754.htm) module that allows full survey design information to be incorporated:

Code:

apc_ie4 y a b1 b2 b3, age(age) period(period) cohort(cohort) family(binomial) link(logit) iterate(20) svyopts("svy, subpop(domain)")

Although it is based on glm, it does not allow the use of factor variables. I don't know if this is because of the principal components on which it's based or some other reason, but I'm wondering if there is some way to properly group the age, period and cohort variables, and any covariates, that would allow a computation of the marginal probabilities similar to that for age alone, or age and period, etc. The other problem is that postestimation commands are not available after running this module, so margins cannot be run anyway. Does anyone have a solution?
Richard Williams

Join Date: Apr 2014

Posts: 5043
#6

13 May 2015, 16:23

apc_ie was written long before factor variables were part of Stata. I suppose you could try to add support yourself. See

http://www.stata.com/support/faqs/pr...iable-support/

It does say it is a wrapper for Stata's glm command, so maybe it wouldn't be that hard. Or, just figure out what the wrapper is doing and maybe you can use glm directly.
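
As a rough sketch of how one might look inside the wrapper, the commands below use only built-in Stata tools. They assume apc_ie is already installed on the ado-path; nothing here is specific to the module itself:

```stata
* Report where the installed ado-file lives and its version line
which apc_ie

* Open the source in the Viewer to see how the glm call is assembled
viewsource apc_ie.ado

* Alternatively, trace execution to see the commands the wrapper issues
set trace on
set tracedepth 2
* ... run your apc_ie command here ...
set trace off
```

The trace output can be long, so limiting the depth as above may make the glm call easier to spot.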

I would consider starting a new thread that includes apc_ie in the title. If there is an apc_ie expert out there, they may not be paying any attention to this thread.

Bill Smith

Join Date: Sep 2014

Posts: 158
#7

15 May 2015, 13:14

Thanks for the suggestions. Modifying the ado file is probably beyond my level of skill. The author already helped me add survey capability to the program, and it was quite a challenge, even though it was not that difficult. I think that adding factor variable capability is difficult from a conceptual and practical standpoint. And I don't think there are enough users of this module to garner much help elsewhere.

I'm wondering if it would be easier to compute the probabilities directly using nlcom. The only question is how to obtain the means for the covariates. I'm not sure summarize will give me the correct numbers. Predict does not work after this module.
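
To make the nlcom idea concrete, here is a sketch on the lbw logit from the first post rather than on apc_ie4 output. It assumes the coefficients are available as _b[] after estimation, which may or may not hold for the modified module. The scalar names (m_age and so on) are made up for the illustration:

```stata
webuse lbw, clear
logit low age lwt i.race smoke ptl ht ui

* Store the estimation-sample mean of each non-factor covariate in a scalar
foreach v of varlist age lwt smoke ptl ht ui {
    quietly summarize `v' if e(sample)
    scalar m_`v' = r(mean)
}

* Predicted probability for race = white (the base category) at the means;
* the race coefficients drop out because both indicators are zero for white
nlcom (p_white: invlogit(_b[_cons] + _b[age]*scalar(m_age)     ///
    + _b[lwt]*scalar(m_lwt) + _b[smoke]*scalar(m_smoke)        ///
    + _b[ptl]*scalar(m_ptl) + _b[ht]*scalar(m_ht)              ///
    + _b[ui]*scalar(m_ui)))

* For comparison: the "white" row here is the same quantity
margins race, atmeans
```

With survey data, unweighted means from summarize would probably not be the right inputs. Something like svy, subpop(domain): mean on the covariates seems a more natural source, though I have not checked that against what margins does after svy estimation.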
Carlos Becerra

Join Date: Jan 2015

Posts: 4
#8

22 Jul 2015, 23:40

Bill, you mentioned that the author helped you add survey capability to the program. Would you mind sharing this with me? I am also using this package with my survey data. Thank you in advance.