how to calculate marginal means for svy: meglm when combined with mi estimate

Lorena Estrada

Join Date: Feb 2016

Posts: 15
#1

how to calculate marginal means for svy: meglm when combined with mi estimate

05 Mar 2016, 13:47

Hi Stata forum -

We are looking for the correct syntax to predict probabilities and margins at different ages with multiple imputations on multilevel growth curve models adjusted/weighted for complex survey design. Below is the syntax without the mi estimate as a reference, but this is not working at all for MI data. Your guidance in working through this is greatly appreciated.

MI MODEL

mi svyset psu2, weight(schwt1) || aid2, weight(gcmwgt)

mi estimate, cmdok: svy: meglm viol wvage agesq agecubed female agefemale agesqfem , exposure(wvage) family(binomial) link(logit)

PROBABILITIES/MARGINS SYNTAX WITHOUT MI

meglm viol wvage agesq agecubed i.female agefemale agesqfem agecubedfem [pweight=gcmwgt], exposure(wvage) family(binomial) link(logit) vce(cluster psuscid) or

predict probab
margins, at (wvage=11 female=0 agesq=121 agecubed=1331 agefemale=0 agesqfem=0 agecubedfem=0)
margins, at (wvage=12 female=0 agesq=144 agecubed=1728 agefemale=0 agesqfem=0 agecubedfem=0)
margins, at (wvage=13 female=0 agesq=169 agecubed=2197 agefemale=0 agesqfem=0 agecubedfem=0)
margins, at (wvage=14 female=0 agesq=196 agecubed=2744 agefemale=0 agesqfem=0 agecubedfem=0)
margins, at (wvage=15 female=0 agesq=225 agecubed=3375 agefemale=0 agesqfem=0 agecubedfem=0)
margins, at (wvage=16 female=0 agesq=256 agecubed=4096 agefemale=0 agesqfem=0 agecubedfem=0)
margins, at (wvage=17 female=0 agesq=289 agecubed=4913 agefemale=0 agesqfem=0 agecubedfem=0)
margins, at (wvage=18 female=0 agesq=324 agecubed=5832 agefemale=0 agesqfem=0 agecubedfem=0)
margins, at (wvage=19 female=0 agesq=361 agecubed=6859 agefemale=0 agesqfem=0 agecubedfem=0)
margins, at (wvage=20 female=0 agesq=400 agecubed=8000 agefemale=0 agesqfem=0 agecubedfem=0)
margins, at (wvage=21 female=0 agesq=441 agecubed=9261 agefemale=0 agesqfem=0 agecubedfem=0)
margins, at (wvage=22 female=0 agesq=484 agecubed=10648 agefemale=0 agesqfem=0 agecubedfem=0)
margins, at (wvage=23 female=0 agesq=529 agecubed=12167 agefemale=0 agesqfem=0 agecubedfem=0)
margins, at (wvage=24 female=0 agesq=576 agecubed=13824 agefemale=0 agesqfem=0 agecubedfem=0)
margins, at (wvage=25 female=0 agesq=625 agecubed=15625 agefemale=0 agesqfem=0 agecubedfem=0)
margins, at (wvage=26 female=0 agesq=676 agecubed=17576 agefemale=0 agesqfem=0 agecubedfem=0)
margins, at (wvage=27 female=0 agesq=729 agecubed=19683 agefemale=0 agesqfem=0 agecubedfem=0)
margins, at (wvage=28 female=0 agesq=784 agecubed=21952 agefemale=0 agesqfem=0 agecubedfem=0)
margins, at (wvage=29 female=0 agesq=841 agecubed=24389 agefemale=0 agesqfem=0 agecubedfem=0)
margins, at (wvage=30 female=0 agesq=900 agecubed=27000 agefemale=0 agesqfem=0 agecubedfem=0)
margins, at (wvage=31 female=0 agesq=961 agecubed=29791 agefemale=0 agesqfem=0 agecubedfem=0)
margins, at (wvage=32 female=0 agesq=1024 agecubed=32768 agefemale=0 agesqfem=0 agecubedfem=0)
margins, at (wvage=33 female=0 agesq=1089 agecubed=35937 agefemale=0 agesqfem=0 agecubedfem=0)
margins, at (wvage=34 female=0 agesq=1156 agecubed=39304 agefemale=0 agesqfem=0 agecubedfem=0)
margins, at (wvage=35 female=0 agesq=1225 agecubed=42875 agefemale=0 agesqfem=0 agecubedfem=0)

margins, at (wvage=11 female=1 agesq=121 agecubed=1331 agefemale=11 agesqfem=121 agecubedfem=1331)
margins, at (wvage=12 female=1 agesq=144 agecubed=1728 agefemale=12 agesqfem=144 agecubedfem=1728)
margins, at (wvage=13 female=1 agesq=169 agecubed=2197 agefemale=13 agesqfem=169 agecubedfem=2197)
margins, at (wvage=14 female=1 agesq=196 agecubed=2744 agefemale=14 agesqfem=196 agecubedfem=2744)
margins, at (wvage=15 female=1 agesq=225 agecubed=3375 agefemale=15 agesqfem=225 agecubedfem=3375)
margins, at (wvage=16 female=1 agesq=256 agecubed=4096 agefemale=16 agesqfem=256 agecubedfem=4096)
margins, at (wvage=17 female=1 agesq=289 agecubed=4913 agefemale=17 agesqfem=289 agecubedfem=4913)
margins, at (wvage=18 female=1 agesq=324 agecubed=5832 agefemale=18 agesqfem=324 agecubedfem=5832)
margins, at (wvage=19 female=1 agesq=361 agecubed=6859 agefemale=19 agesqfem=361 agecubedfem=6859)
margins, at (wvage=20 female=1 agesq=400 agecubed=8000 agefemale=20 agesqfem=400 agecubedfem=8000)
margins, at (wvage=21 female=1 agesq=441 agecubed=9261 agefemale=21 agesqfem=441 agecubedfem=9261)
margins, at (wvage=22 female=1 agesq=484 agecubed=10648 agefemale=22 agesqfem=484 agecubedfem=10648)
margins, at (wvage=23 female=1 agesq=529 agecubed=12167 agefemale=23 agesqfem=529 agecubedfem=12167)
margins, at (wvage=24 female=1 agesq=576 agecubed=13824 agefemale=24 agesqfem=576 agecubedfem=13824)
margins, at (wvage=25 female=1 agesq=625 agecubed=15625 agefemale=25 agesqfem=625 agecubedfem=15625)
margins, at (wvage=26 female=1 agesq=676 agecubed=17576 agefemale=26 agesqfem=676 agecubedfem=17576)
margins, at (wvage=27 female=1 agesq=729 agecubed=19683 agefemale=27 agesqfem=729 agecubedfem=19683)
margins, at (wvage=28 female=1 agesq=784 agecubed=21952 agefemale=28 agesqfem=789 agecubedfem=21952)
margins, at (wvage=29 female=1 agesq=841 agecubed=24389 agefemale=29 agesqfem=841 agecubedfem=24389)
margins, at (wvage=30 female=1 agesq=900 agecubed=27000 agefemale=30 agesqfem=900 agecubedfem=27000)
margins, at (wvage=31 female=1 agesq=961 agecubed=29791 agefemale=31 agesqfem=961 agecubedfem=29791)
margins, at (wvage=32 female=1 agesq=1024 agecubed=32768 agefemale=32 agesqfem=1024 agecubedfem=32768)
margins, at (wvage=33 female=1 agesq=1089 agecubed=35937 agefemale=33 agesqfem=1089 agecubedfem=35937)
margins, at (wvage=34 female=1 agesq=1156 agecubed=39304 agefemale=34 agesqfem=1156 agecubedfem=39304)
margins, at (wvage=35 female=1 agesq=1225 agecubed=42875 agefemale=35 agesqfem=1225 agecubedfem=42875)
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 29907
#2

05 Mar 2016, 14:09

I don't know if -margins- can even be used after -mi estimate-. Perhaps somebody else can chime in here on that.

That said, you have not set up your basic estimation command in a way that would facilitate the use of margins. You should read about factor variable notation (-help fvvarlist-). Your estimation command will work much more easily with -margins- if you revise it in the following way:

Code:

meglm viol c.wvage##c.wvage##c.wvage i.female c.agefemale##c.agefemale##c.agefemale [pweight=gcmwgt], exposure(wvage) family(binomial) link(logit) vce(cluster psuscid) or margins, at(wvage=(11(1)35) agefemale = (11(1)35))

Note how your lengthy list of complicated margins commands reduces to a one-liner, and a short one at that! The time needed to learn to use factor variable notation repays itself very quickly.

But again, I'm not sure this is even possible after -mi estimate-.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29907
#3

05 Mar 2016, 14:12

By the way, I don't understand the use of -exposure(wvage)- in your model. That introduces ln(wvage) as a predictor in the model and constrains its coefficient to be 1. That makes sense to do in Poisson or negative binomial models, but I don't grasp the meaning of it in a logistic model. Perhaps I'm missing something?
Comment
Lorena Estrada

Join Date: Feb 2016

Posts: 15
#4

05 Mar 2016, 14:54

We need to be able to graph the growth curves in order to interpret the results. Is there something you recommend for being able to do so? I'm relatively new at Stata and can use all the help in the world. Also, thank you for pointing out the -exposure(wvage)- . I sent the code to the main developers of the data and they ok'd this one. Our outcome is 1/0, should I be taking that out of the code entirely?
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29907
#5

05 Mar 2016, 17:02

If, as I fear, margins is not available after -mi estimate-, you may need to "roll your own." I have not tested this code-- in fact, I prefer you think of this more as pseudo-code, but what I have in mind looks something like this (and yes, it involves going back to your non-factor-variable-notation version:

Code:

// NOTE MEGLM COMMAND REVISED TO PUT i.female AT END OF LIST OF PREDICTORS mi estimate: meglm viol wvage agesq agecubed agefemale agesqfem agecubedfem female [pweight=gcmwgt], exposure(wvage) family(binomial) link(logit) vce(cluster psuscid) or // GET COEFFICIENT ESTIMATES matrix b = e(b_mi) tempfile growth_curve capture postutil clear postfile handle int age byte predicted_m predicted_f using `growth_curve' forvalues a = 11/35 { local predicted_m = b[1, 8] // CONSTANT TERM local predicted_f = b[1, 7] + b[1, 8] // CONSTANT TERM + FEMALE forvalues j = 1/3 { // LOOP OVER POWERS local predicted_m = `predicted_m' + b[1, `j']*`a'^`j' local predicted_f = `predicted_f' + (b[1, `j'] + b[1, 3+`j'])*`a'^`j' } post handle (`a') (0) (`predicted_m') (`predicted_f') } post close handle // USE PREDICTED VALUES TO PLOT GROWTH CURVES use `growth_curve', clear // TRANSFORM LINEAR PREDICTOR TO PREDICTED PROBABILITY foreach v of varlist predicted_* { replace `v' = invlogit(`v') } label var predicted_m "Predicted Probability (Male)" label var predicted_f "Predicted Probability (Female)" label var age "Age" graph twoway line predicted_m predicted_f age, sort

I want to emphasize again that the above is untested and may contain numerous errors. In particular, I'm assuming that in the e(b_mi) matrix, the first column will be the coefficient of wvage, the second that of wvage^2, third that of wvage^3, then fourth is femage's, followed immediately by femage^2's and femage^3's, and then, finally, female's and the constant term. But you will need to actually look at that matrix to be sure that's true. There may be some other things in there (e.g. for the exposure variable), or a different ordering that will require modifications to the code.

By the way, in the course of looking more closely at your original post and writing the above, I realized that, were you able to make use of -margins-, the factor-variable-notation version of your -meglm- command is even simpler than what I gave you. It is

Code:

meglm viol i.female##c.wwvage##c.wvage##c.wvage [pweight = gcmwgt], // ... etc.

Concerning the exposure option, I can't really advise you whether to leave it out. If the main developers of the data OK'd it, they probably know what they're doing. Still, it seems very odd, even bizarre, to me. I have never seen it used in a logistic model, and I cannot for the life of me understand what it means in that context. I think you should ask the developers not just if it's OK but ask them to explain why it's there and what it does and what it means. Perhaps if challenged to explain it, they may see that they've made a mistake. Or they may have a good explanation for it (and if that's the case, I hope you'll follow up and explain it to me.)
Comment
Lorena Estrada

Join Date: Feb 2016

Posts: 15
#6

05 Mar 2016, 19:08

About the code above: We have a national sample (multistage, complex sampling design) and our primary aim is to find the right change trajectories for behaviors/mental health outcomes for specific demographic subgroups. All of these outcomes have been found to have big gender differences, so we're also exploring gender interactions in the change trajectories. My point is that in the end, we're graphing different models according to each subgroup.

exposure(wvage) syntax background:
We are running a growth curve model where level-1 represents a binary violence outcome and was measured at multiple waves and the age/polynomials, level-2 the vector of fixed independent variables at the individual level; and level-3 the clustering of schools where students were sampled from without any variables (Add Health). From my understanding, growth curve models require these -exposure(wvage)- (or whatever the time/development indicator). We're are running both linear and binary models, but have included this option for both. I'm all for parsimony in an already complicated analysis, so I'm all ears.

On a side note to the main analysis: I'm getting inconsistent messages on the need for - cmdok - to run the combination of mi estimate: svy: meglm in Stata 14. We have the same MP version of the program in multiple computers in our lab and in some are getting a weird message requiring the cmdok option, and not in others. Has this happened to anyone else?
Comment
Lorena Estrada

Join Date: Feb 2016

Posts: 15
#7

05 Mar 2016, 19:11

looking back, i see that we would create a separate growth_curve file for each model we need to plot.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29907
#8

05 Mar 2016, 19:44

From my understanding, growth curve models require these -exposure(wvage)- (or whatever the time/development indicator).

That's news to me. When you are modeling a count variable, one would want to include time as an exposure variable in the model because, all else equal, one might expect the count of events to grow proportionally to the time period under observation. But in a logit model, you do not have a count. The outcome either occurs or does not occur. There is no reason to think, except perhaps for very short time periods, that the probability of its occurrence would be proportional to the duration of observation. Would it increase with increasing time of observation--yes! But not as a direct proportion. (In fact, it can't increase as a direct proportion because that means if you waited long enough, the probability would exceed 1. This objection also applies to linear probability models.) So you can make a strong case for duration of observation in the model in some way, but not in the -exposure()- option. You already have it (wvage) in the model as a cubic polynomial, ultimately subjected to the invlogit() transformation. If that isn't good enough adjustment for this factor, it's hard to imagine what is. In any case, adding it as an exposure() implies a proportional to time model that is inconsistent with a probability outcome, as opposed to a count outcome

I have not used multiple imputation enough to have an impression that Stata is inconsistent in the need for the -cmdok- option. Suffice it to say that -meglm- is not among the commands that -mi estimate- is designed to work with, so I would expect that none of those analyses would run without the -cmdok- override. That said, why not just replace -meglm, family(binomial) link(logistic)- with -meqrlogit-, which will estimate exactly the same model (though the estimation algorithm is different), and should not require -cmdok-. I do not know what erroneous conclusions, if any, might ensue from using -cmdok- to override Stata's preference not to -mi estimate- with -meglm-, but usually when Stata throws obstacles in your path there is a good reason for it, and one should be cautious, if not entirely deterred.

By the way, I also assume that your posted -meglm- command is an abbreviation of what you are actually running, since it makes no mention of any individual level variables other than age and sex (maybe there aren't any others?) and has no specification for any random effects at the individual or school level. As set out here, it is a flat 1-level model. If your intention is to let the -svy:- calculations fully account for the sampling design and you are not really interested in random effects, then rather than using any of the -me- commands, why not go straight back to plain old -logit-, which, by the way, will also run a lot faster if your data set is large?
Comment
Lorena Estrada

Join Date: Feb 2016

Posts: 15
#9

06 Mar 2016, 10:02

we also tried to combine mi estimate: svy: xtmixed for our continuous outcomes, but it wouldn't allow the three together.

Last edited by Lorena Estrada; 06 Mar 2016, 10:37.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29907
#10

06 Mar 2016, 11:21

Well, something is definitely wrong here. I think you need to post the exact commands you are using and output you are getting. It is quite clear from the -help mi estimate- file that -mi estimate- supports -mixed- (which is the current name for the command -xtmixed-) both with and without the svy: prefix. So you are doing something wrong--but we can only imagine what that might be without more information.
Comment
Lorena Estrada

Join Date: Feb 2016

Posts: 15
#11

07 Mar 2016, 19:31

Here is the code and messages.

HERE IT IS WHEN DONE WHEN IT'S BEEN SVYSET PRIOR:

. mi estimate: svy: mixed depress wvage agesq agecubed female || psu2: wvage, covariance(
> unst) || aid2: wvage, covariance(unst) mle
mixed is not supported by svy with vce(linearized); see help svy estimation for a list of
Stata estimation commands that are supported by svy
an error occurred when mi estimate executed svy:mixed on m=1
r(322);

HERE IT IS WHEN I USE THE STRUCTURE/WEIGHTS DIRECTLY:

. mi estimate: mixed depress wvage agesq agecubed female [pw=nhwwgt] || aid2: wvage, pweight(schwt1) pweight(size) cov(un) variance mle
option pweight() not allowed
an error occurred when mi estimate executed mixed on m=1
r(198);

HERE IT IS WHEN I TRT MELOGIT (Basically the same as MEGLM message):

. mi estimate: svy: melogit viol wvage agesq agecubed female || psu2: wvage, covariance(unst) || aid2: wvage, covariance(unst) mle
mi estimate: command not supported
svy:melogit is not officially supported by mi estimate; see mi estimation for a list of Stata estimation commands that are
supported by mi estimate. You can use option cmdok to allow estimation anyway.
r(198);

Last edited by Lorena Estrada; 07 Mar 2016, 19:35.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29907
#12

07 Mar 2016, 20:09

. mi estimate: svy: mixed depress wvage agesq agecubed female || psu2: wvage, covariance(
> unst) || aid2: wvage, covariance(unst) mle
mixed is not supported by svy with vce(linearized); see help svy estimation for a list of
Stata estimation commands that are supported by svy
an error occurred when mi estimate executed svy:mixed on m=1
r(322);

OK. I was sort of wrong when I said that -mi estimate- supports -svy: mixed-; it does, but svy: does not support -mixed- when you use the linearized vce.

. mi estimate: mixed depress wvage agesq agecubed female [pw=nhwwgt] || aid2: wvage, pweight(schwt1) pweight(size) cov(un) variance mle
option pweight() not allowed
an error occurred when mi estimate executed mixed on m=1
r(198);

I believe the problem here is that you have tried to assign the pweight() option twice at the same level (aid2).

. mi estimate: svy: melogit viol wvage agesq agecubed female || psu2: wvage, covariance(unst) || aid2: wvage, covariance(unst) mle
mi estimate: command not supported
svy:melogit is not officially supported by mi estimate; see mi estimation for a list of Stata estimation commands that are
supported by mi estimate. You can use option cmdok to allow estimation anyway.
r(198);

You are right: there does not appear to be a mixed-effects logistic regression command that is supported by both -svy:- and -mi estimate:-

Last edited by Clyde Schechter; 07 Mar 2016, 20:10. Reason: Remove emoticon that, frankly, I have no idea how it got in there.
Comment
Lorena Estrada

Join Date: Feb 2016

Posts: 15
#13

08 Mar 2016, 08:19

At least we know that we've tried all the different options and that MI ESTIMATE, CMDOK: SVY: MEGLM seems to be the one that can handle what we're looking for (the models will get more complicated with random effects in the next study). So back to graphing our results and the code you developed in post #5, can you point out the parts I would have to edit when we're trying to plot a model with fewer polynomials/interactions?
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29907
#14

08 Mar 2016, 10:30

It is the coordinates in the references to _b[*,*] that would change with the model. You would need to do a "dry run" of your model on a small enough sample of your data that you can get some (useless) results quickly, and then you need to examine e(b) to see which columns contain the coordinates for the variables you are interested in, and adjust the code accordingly. If you keep the relevant terms in a contiguous block in the varlist in the regression model command you should be able to maintain the relatively simple structure. Similarly, if it were only a quadratic instead of a cubic model, the inner -forvalues- loop would go 1/2 instead of 1/3. Ultimately, it boils down to making the values of `j' cover all and only the column numbers in e(b) of the terms you are interested in.

There are ways to partially automate this, using the colnumb() function, and replacing the -forvalues j = 1/3- loop by a loop over the corresponding variables. If the terms you are accumulating are the same across all the models you run, but their positions in the regression command vary, then this could be the way to go. But if you will be accumulating different terms with different models, then I think it is probably just easier to change the numbers inside the loop after manually inspecting e(b), as suggested in the first paragraph.
Comment

Announcement

how to calculate marginal means for svy: meglm when combined with mi estimate

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment