Proc Glimmix in SAS to Stata

SeungYong Han

Join Date: Jul 2015

Posts: 53
#1

09 Oct 2018, 10:44

Hello,

I am trying to convert a PROC GLIMMIX command in SAS to Stata.
Briefly, I am estimating an Age-Period-Cohort (APC) model for a binary outcome, OBESE, with age as a fixed effect and period and cohort as random effects. The data are repeated cross-sectional, not longitudinal, which is appropriate for estimating the APC model.

The SAS syntax below is an example from the APC website by Yang C. Yang. The SAS code works for my own analysis, but I now need to write the same model in Stata.
http://yangclaireyang.web.unc.edu/ag...-applications/

Here is the SAS code.

Code:

proc glimmix data=NHANES_Obesity maxopt=25000;
    class PERIOD COHORT;
    model OBESE(event='1') = AGE_C AGE_C2 / solution CL dist=binary;
    random PERIOD COHORT / solution;
    covtest GLM / WALD;
    NLOPTIONS TECHNIQUE=NRRIDG;
    title "Table 7.4: HAPC-CCREM of Obesity Trends, NHANES 1971-2008";
run;

Since the outcome is binary, I think I need to use melogit in Stata, but I might be wrong. Please help and let me know if anything is not clear. Thank you.
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#2

09 Oct 2018, 12:43

SAS's PROC GLIMMIX appears to be the direct equivalent of Stata's -meglm- command. However, you're telling it to fit a model for a binomial outcome, and you're not specifying any link (thus causing the procedure to default to the logit link). You could just use Stata's -melogit- command. Note that Stata has an alternative estimator for a mixed effect logit model, -meqrlogit-, which can sometimes converge if -melogit- doesn't.
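
To make that equivalence concrete, here is a minimal sketch, assuming lowercase variable names (obese, age_c, age_c2) and, purely for illustration, a single random intercept for cohort. These names and the simplified random part are assumptions, not the final model. Both commands should fit the same random-intercept logit:

```stata
* Hypothetical variable names; a single random intercept is used only to
* show that -meglm- with a Bernoulli family and logit link matches -melogit-.
meglm   obese age_c age_c2 || cohort:, family(bernoulli) link(logit)
melogit obese age_c age_c2 || cohort:
```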

The SAS syntax requests that period and cohort be treated as categorical (via the class statement). In Stata, the equivalent would be to prefix them with i. The SAS syntax also requests that these be treated as random effects. Oddly (to me), it doesn't appear to specify fixed effects for period and cohort. I'm not sure that's standard practice.

I think that the NLOPTIONS statement refers to the maximization technique. Your syntax specified the same technique as the SAS default for binomial outcomes, so perhaps that could have been omitted, but I do not believe the same technique is implemented in Stata. Stata's default is Stata's modified Newton-Raphson algorithm. This probably won't cause your results to diverge, but I suppose it could happen. For what it's worth, the equivalent in SAS seems to be specifying TECHNIQUE = NEWRAP.

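I'm not sure how to replicate the covariance test. As I read the SAS documentation, covtest GLM compares the fitted model against one with no random effects, and the WALD option requests Wald tests of the covariance parameters. The closest Stata analogue I know of is the likelihood-ratio test against an ordinary logistic model that -melogit- reports below the estimation table. I'm going to assume all your variable names got converted to lowercase, and that obese is coded so that 1 indicates the event (obese), matching event='1' in the SAS code?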
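
A quick way to check both assumptions before fitting anything might look like the sketch below. It assumes the outcome ends up named obese after lowercasing; adjust to the actual dataset.

```stata
* Convert every variable name to lowercase (OBESE becomes obese, and so on).
rename *, lower

* Inspect the coding of the outcome, including missing values.
tabulate obese, missing

* Stop with an error if any nonmissing value is something other than 0 or 1.
assert inlist(obese, 0, 1) if !missing(obese)
```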

And finally, I'm also not clear on how SAS specifies its random effects structure. By my very naive read of the syntax, there appears not to be a specification for a random intercept. I'm not sure how that can be. Furthermore, are the effects of period and cohort crossed? I would assume so, but I don't know this.

I would check the examples for the -melogit- command; example 5 is parallel to your case, with no individual random intercept but crossed primary and secondary schools. In the example syntax, they treated primary schools, which are more numerous than secondary schools, as the grouping variable nested within all observations, while secondary schools enter through R. under _all. Following that, the syntax below assumes that there are more cohorts than periods; if the reverse is true, then you may want to swap the two in the random-effects specification. Anyway, this is what I believe the equivalent syntax looks like:

Code:

melogit obese c.age##c.age || _all: R.period || cohort:

This assumes that the random effects are independent. I am unclear how the SAS syntax treats the covariance structure of the random effects. In -melogit-, the relevant option is covariance(), and it applies within a single random-effects equation; effects in different equations are treated as independent. With one random intercept per equation, as here, I would expect the syntax below to fit the same model as the one above rather than add a period-cohort covariance:

Code:

melogit obese c.age##c.age || _all: R.period || cohort:, covariance(unstructured)

I should mention: I do not regularly program in SAS, I have never fit any sort of mixed effect model in SAS, and furthermore I'm not familiar with period-cohort type models with crossed random effects. So, caveat lector.

Last edited by Weiwen Ng; 09 Oct 2018, 12:56.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Joseph Coveney

Join Date: Apr 2014

Posts: 4374
#3

09 Oct 2018, 16:41

I'm guessing that you get Stata's equivalent of SAS's TECHNIQUE=NRRIDG (ridged Newton-Raphson) with the difficult option.

If you center the age variable (it seems that that's what was done in the SAS example: AGE_C) in order to stabilize the model when fitting the quadratic polynomial, then it might not be necessary.
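
A minimal sketch of both suggestions together, assuming a raw age variable named age and centering at the sample mean (the SAS example may have centered AGE_C at a different value, so treat that as an assumption):

```stata
* Center age at its sample mean; r(mean) is left behind by -summarize-.
summarize age, meanonly
generate double age_c = age - r(mean)

* Crossed random effects for period and cohort, with the -difficult-
* maximization option in case the default stepping fails to converge.
melogit obese c.age_c##c.age_c || _all: R.period || cohort:, difficult
```

If the model converges without difficult once age is centered, the option can simply be dropped.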
SeungYong Han

Join Date: Jul 2015

Posts: 53
#4

10 Oct 2018, 10:35

Thank you so much for all the replies.
- Joseph, yes, age is centered for ease of interpretation. I actually just ran the model Weiwen suggested with the difficult option, since the model fails to converge on my data without it.
- Weiwen, I really appreciate your insight and detailed answers. The command you suggested is a very good starting point. I think I first need to experiment with the SAS and Stata code a bit more to see where they are similar and where they differ.

One question at this stage of analysis.
The part I don't really understand now is the command for period and cohort random effects.
The data are repeated cross-sectional. I am using a Korean version of NHANES, and the data for each year are cross-sectional. We have data for 2001, 2005, and 2007-2015, and merged them all into one dataset for analysis. The unit of observation is the individual.

So, individuals are definitely nested within periods (years) and birth cohorts, but as far as I understand, cohorts and periods are independent of each other. What the APC model does is estimate the net effects of age, period, and cohort separately.

Below is the GLM equation from Yang C. Yang's book on APC modeling. It contains additional individual-level covariates, such as sex, race, education, and income. The SAS command in the original post omits these covariates, so you can ignore them in the equation when writing the Stata command.

Based on the equation and the data structure, I wonder if the Stata command Weiwen suggested needs to be revised or not.
I will keep working on this, but I would appreciate any suggestions and thoughts!

Thank you.
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#5

10 Oct 2018, 11:02

Originally posted by SeungYong Han

...

So, individuals are definitely nested within periods (years) and birth cohorts, but as far as I understand, cohorts and periods are independent of each other. What the APC model does is estimate the net effects of age, period, and cohort separately.

Below is the GLM equation from Yang C. Yang's book on APC modeling. It contains additional individual-level covariates, such as sex, race, education, and income. The SAS command in the original post omits these covariates, so you can ignore them in the equation when writing the Stata command.

...

[Attached image: the equation from Yang's book]

Seung Yong,

The equation on the page you provided does appear to treat cohorts and periods as crossed random effects. I based my suggested syntax on example 5, page 19, of the -melogit- manual, and I believe it should have a data structure similar to your case. In that example, there are 3,400+ unique respondents. Each one attended a primary school and a secondary school. The example doesn't have repeated measures on each individual. In that model, the effect of primary school is the same regardless of the secondary school (and vice versa). Furthermore, equation 4 in the example seems to parallel your scenario.

Thus, as far as I can tell, I think the random effects specification in that example is parallel to your intended random effects specification. Naturally, I could be wrong, and may I remind everyone that I don't typically fit this type of model. Hopefully someone can improve on my syntax.

Some additional clarification about the code:

Code:

c.age##c.age

The ## operator specifies an interaction together with its main effects. The c. prefix means treat this as a continuous variable. It's usually unnecessary, but here, if you don't enter it, I think the default with the interaction operator is to treat age as categorical. Anyway, the entire term above is equivalent to including age and age squared.
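
To illustrate that equivalence, the two commands in this sketch should give the same coefficients for the linear and quadratic terms. The variable age2 is a hypothetical name created only for the comparison, and the random part simply reuses the specification suggested earlier in the thread:

```stata
* Manual version: build the squared term by hand (age2 is a made-up name).
generate double age2 = age^2
melogit obese age age2 || _all: R.period || cohort:

* Factor-variable version: Stata constructs age and age-squared itself.
melogit obese c.age##c.age || _all: R.period || cohort:
```

One reason to prefer the factor-variable form is that postestimation commands such as margins then know that the squared term is a function of age.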

Code:

melogit obese c.age##c.age i.sex i.race i.education income || _all: R.period || cohort:

The || indicates the random effects part of the model. The R. prefix is the random effects equivalent of treating something as a factor or class variable. Otherwise, you should use the i. prefix to denote factor variables. Stata will choose the lowest value as the base category by default, but you can choose your own base category; type the following for help:

Code:

help fvvarlist
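
For example, a sketch with hypothetical codings, assuming race has a level coded 2 and that the highest education level should be the reference:

```stata
* ib2. makes the level coded 2 the base category;
* ib(last). makes the highest-valued level the base category.
* The codings of race and education are assumptions for illustration.
melogit obese c.age##c.age i.sex ib2.race ib(last).education income || _all: R.period || cohort:
```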
