Saving parameters post estimation, by strata

Tom Yates

Join Date: Mar 2015

Posts: 34
#1

16 Nov 2015, 06:25

Dear STATAlist,

I have a, hopefully simple, query about storing parameters post estimation.

I am doing the following to obtain odds of a binary outcome (tenmm) and mean age, both with standard errors, from models adjusted for the clustered way in which subjects were sampled...

xtset TBSchool
xtlogit tenmm, re or
predict A, xb
predict B, stdp

quadchk, nooutput

xtset, clear
xtreg v1age if tenmm!=., i(TBSchool) re
predict C, xb
predict D, stdp

The values are then fed through into subsequent calculations. That works fine.

I now wish to repeat the same thing for separate strata - by gender, age group, etc. The simple thing to do would be to repeat the regression models, restricted to subjects in that stratum. However, I think that would result in the data used to evaluate clustering being different in each subgroup.

Does adding an if statement into the predict command after running the regression model with all subjects, work?

e.g.
xtlogit tenmm, re or
predict A if gender==1, xb

Thanks,
Tom
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#2

16 Nov 2015, 16:02

Does adding an if statement into the predict command after running the regression model with all subjects, work?

It depends on what you mean by "work."

If you run the code shown near the bottom of your post, you will use all available cases in the data set to do the estimation. When you apply an -if- condition to the -predict- command, the variable A you are calculating will be set to missing for any observations that don't meet the -if- condition. If that's what you want to accomplish, then it "works."

But I'm a little bit puzzled at the logic here. The command -xtlogit tenmm, re or- performs a logistic regression with no predictor variables in the fixed portion of the model. Consequently -xb- will be the same for all observations in the data set. So whatever calculations you do with A, except perhaps counting the number of non-missing observations, are going to give you the same result regardless of gender or any other variable in your data set. It doesn't seem like a sensible thing to do.
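
To see both points concretely, here is a minimal sketch. It assumes the data set from #1 is in memory, with TBSchool, tenmm and gender as used there, and that no variable named A exists yet; it has not been run on the actual data.

```stata
* Fit the intercept-only random-effects model on all observations
xtset TBSchool
xtlogit tenmm, re

* -if- restricts where predictions are stored, not the estimation sample
predict A if gender == 1, xb

* With no covariates, xb is just the constant: min and max of A should coincide
summarize A

* Observations failing the -if- condition are left missing in A
count if missing(A) & gender != 1
```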

As a complete aside, I don't understand the logic of doing -xtset, clear- and then specifying -i(TBSchool)- in the subsequent regression when the data were already -xtset TBSchool-. Why not just leave the -xtset- alone: it would have the same effect. Were there perhaps other commands in between that you don't show us where it was necessary to not have TBSchool as the panel variable?
Tom Yates

Join Date: Mar 2015

Posts: 34
#3

16 Nov 2015, 23:06

Thanks for the message.

I am using the output from the regressions to calculate annual risk of infection...

Annual risk = 1 - (1 - prevalence)^(1/mean age)

I convert the odds obtained from the logistic regression to risk and use the standard errors to estimate conservative CIs on these estimates (reflecting uncertainty in both the age and the prevalence parameters).
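
As a quick illustration of the formula, here is a sketch using scalars. The two input numbers are made up for illustration and are not estimates from the study; in practice the log odds and mean age would come from the models in #1.

```stata
* Hypothetical inputs, for illustration only
scalar log_odds = -1.5      // log odds of infection (xb from the logistic model)
scalar mean_age = 7.7       // mean age in years (xb from the linear model)

* Convert log odds to prevalence; invlogit(x) = exp(x)/(1 + exp(x))
scalar prev = invlogit(log_odds)

* Annual risk = 1 - (1 - prevalence)^(1/mean age)
scalar annual_risk = 1 - (1 - prev)^(1/mean_age)

display "Prevalence (%):  " %5.2f 100*prev
display "Annual risk (%): " %5.2f 100*annual_risk
```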

I now want to calculate annual risk for subgroups of the population - boys, those living in urban areas, etc. But I would still like to account for clustering by school. And, because I have a lot of subgroups, I want to automate things, so I don't have to do lots of manual calculations, e.g.

gen E = exp(A)
gen F = exp(A-1.96*B)
gen G = exp(A+1.96*B)

gen H = (1-(1-(E/(1+E)))^(1/C))*100
gen I = (1-(1-(F/(1+F)))^(1/(C+1.96*D)))*100
gen J = (1-(1-(G/(1+G)))^(1/(C-1.96*D)))*100

disp H, "(" I, "-" , J ")"

Hope that makes sense. Point taken regards the xtset.

Best wishes,
Tom
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#4

17 Nov 2015, 06:21

So if I understand what you want, I think it can be done much more simply than the approach you're taking. Something like this:

Code:

xtset TBSchool
xtlogit tenmm i.sex i.urban_vs_rural, re // ETC.
margins, over(sex urban_vs_rural /*etc.*/)

That will give you the average predicted probability in each combination of sex, urban/rural, etc. all adjusted for the other variables and accounting for the clustering within schools. The confidence intervals come along for the ride.

If you want to do this with predicted probabilities that are specific to each subgroup but not adjusted for anything except the school clustering, then it's just:

Code:

xtset TBSchool
xtlogit tenmm, re
margins, over(sex urban_vs_rural /*etc.*/)

If you want to do this for just one variable at a time, you can easily modify the code to do that.

Code:

xtset TBSchool
xtlogit tenmm, re
foreach v of varlist sex urban_vs_rural /*etc.*/ {
    margins, over(`v')
}
Tom Yates

Join Date: Mar 2015

Posts: 34
#5

17 Nov 2015, 08:04

Thanks, that looks good but I am unclear how I can then extract the odds of infection, mean age, and their respective standard errors, for e.g. boys, or rural residents, and feed that into the equations to calculate an annual risk of infection for that subgroup?

Best wishes,
Tom
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#6

17 Nov 2015, 14:03

I don't understand what you are asking for. The code in #4 gives you the annual risk of infection for subgroups. What do you mean when you say you want to extract the other information and feed that into equations to calculate an annual risk of infection for that subgroup? It's already done for you by -margins-.
Tom Yates

Join Date: Mar 2015

Posts: 34
#7

18 Nov 2015, 01:19

-margins- calculates annual risk of infection? I thought it simply provided stratum-specific estimates of the odds of infection and of mean age, each with a confidence interval. To calculate annual risk of infection, I think I have to feed these values into formulae, e.g.

gen E = exp(A)
gen F = exp(A-1.96*B)
gen G = exp(A+1.96*B)

gen H = (1-(1-(E/(1+E)))^(1/C))*100
gen I = (1-(1-(F/(1+F)))^(1/(C+1.96*D)))*100
gen J = (1-(1-(G/(1+G)))^(1/(C-1.96*D)))*100

disp H, "(" I, "-" , J ")"

The tenmm variable simply codes whether an individual has been infected. These equations then estimate an annualised incidence rate.

After -margins- is it possible to store the point estimates and standard errors for use in subsequent calculations?

Thanks.
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#8

18 Nov 2015, 05:43

After -xtlogit-, the default statistic calculated by -margins- is the predicted probability, not the odds. After -xtreg- it is the predicted mean. So it would seem these are what you want. As for storing the results of -margins-, the margin statistics calculated can be found in the returned matrix r(b). The standard errors are not directly stored, but they are the square roots of the diagonal elements of the returned matrix r(V).
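
A hedged sketch of pulling those results out, assuming the data from #1 are in memory and gender is a numeric variable. -predict(pr)- is written out explicitly so that the statistic reported does not depend on the command's default, and gender is included in the model so that the margins can differ between its levels. The names b, V, est and se are arbitrary.

```stata
xtset TBSchool
xtlogit tenmm i.gender, re
margins, over(gender) predict(pr)

* Copy the returned results before another command overwrites r()
matrix b = r(b)     // 1 x k row vector of margins
matrix V = r(V)     // k x k variance-covariance matrix

* Point estimate and standard error for each level of gender
forvalues j = 1/`=colsof(b)' {
    scalar est = b[1, `j']
    scalar se  = sqrt(V[`j', `j'])
    display "margin `j': " est "   std. err.: " se
}
```

The scalars est and se can then be used in later -generate- or -display- expressions in place of hand-copied numbers.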
Tom Yates

Join Date: Mar 2015

Posts: 34
#9

18 Nov 2015, 06:23

Thanks for all the advice. Will go and read up on how matrices in STATA work!

Best wishes,
Tom
Tom Yates

Join Date: Mar 2015

Posts: 34
#10

18 Nov 2015, 09:51

I am not sure I am getting the results I expect. I ran...

xtset TBSchool
xtlogit tenmm i.gender, re or
margins, over(v1ageint)
margins, over(gender)

xtreg v1age, re
margins, over(v1ageint)
margins, over(gender)
I get the same predicted mean age for each age group (which must be wrong) and identical predicted ages (to 6dp) for each gender (which also must be wrong). E.g.
. margins, over(gender)
Warning: prediction constant over observations.

Predictive margins                                Number of obs   =      1,258
Model VCE    : Conventional

Expression   : Linear prediction, predict()
over         : gender

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gender |
          F  |   7.699861   .0201956   381.26   0.000     7.660278    7.739443
          M  |   7.699861   .0201956   381.26   0.000     7.660278    7.739443
------------------------------------------------------------------------------
Note the warning. Am I missing a trick?

Thanks,
Tom
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#11

18 Nov 2015, 16:25

OK, some of the information I gave you I misremembered, and it was wrong. Following -xtlogit-, the default statistic for -margins- is not the predicted probability; it's the fixed effects linear predictor, which is not what you want. So after the -xtlogit- command, change the -margins- command to include the option -predict(pr)-.

The situation after -xtreg- is more complicated. The default prediction here is again the fixed effects linear predictor. What you want is -xbu- instead. But for some reason, -margins- will not allow that within its -predict()- option following -xtreg-. Neither will -margins- after -mixed-. In general, when Stata won't allow you to do something, there is usually a good reason for it. I'm not sure what that reason is. The help file says it's because those statistics depend on stochastic quantities other than e(b), but I don't quite understand why that is a limitation.

So following -xtreg-, if this is to be done, it will have to be a bit more complicated, and it won't be done with -margins-. You are looking for the mean value of age specific to each level of v1ageint and gender. So I would do it this way:

Code:

xtreg v1age, re
predict xbu, xbu
by gender, sort: summarize xbu
by v1ageint, sort: summarize xbu

This should get you what you want, but the fact that -margins- won't do this for you leaves me feeling that there is something wrong with this approach. Perhaps somebody with a deeper understanding of this issue will jump in here.
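
For what it is worth, the warning in #10 ("prediction constant over observations") is what one would expect from a model with no covariates: the linear prediction is then just the constant. One possible alternative, sketched here under the assumption that gender is a numeric variable with value labels and that the data from #1 are in memory, is to put the stratifying variable in the model, as was done for -xtlogit- in #10. It has not been checked against the actual data, and the matrix names are arbitrary.

```stata
xtset TBSchool

* Include the stratifier so the linear prediction can differ by gender
xtreg v1age i.gender, re

* Predicted mean age at each level of gender, with delta-method standard errors
margins gender

* Keep the estimates for later calculations
matrix b_age = r(b)
matrix V_age = r(V)
```

The same pattern would apply to v1ageint or any other stratifier, one model per variable.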
Tom Yates

Join Date: Mar 2015

Posts: 34
#12

19 Nov 2015, 00:02

Thanks for the message. Let's see if anyone else jumps in.

In this example, the age problem is perhaps less of an issue. I have no reason to believe the age structure of schools differs and the age range of children in the study was narrow. I can probably use much simpler methods. However, for my own learning, I'd be interested in how to do this.

Best wishes,
Tom
Tom Yates

Join Date: Mar 2015

Posts: 34
#13

19 Nov 2015, 00:23

p.s. I think there is a problem with the code in #11. Some of the predicted mean ages given for each age bracket lie outside the bracket. Also the mean ages for each gender look too similar (i.e. close agreement to lots of decimal places).
