melogit (stata) versus genlinmixed (spss) for multilevel (mixed) logistic regression

Hansie18

Join Date: Sep 2014

Posts: 11
#1

25 Sep 2014, 05:48

Hi, I posted this question before, but was asked to present the information in a more readable format. It is now included in the attachments.

My question is why Stata and SPSS give such different estimates of the odds ratios and of the random-effect variance.

Is this because SPSS uses PQL while Stata uses a more sophisticated method? And might it also be related to the small number of level-1 units within level-2 units?

For those who want to know: this is about age and education predicting health (there are 1701 individuals from 1479 households).

In Stata I tried several alternative options (as I did in SPSS), but they make little difference: robust standard errors or not, mean-variance or mode-curvature adaptive Gauss-Hermite quadrature, 7 or 50 integration points. At least, that is what I tested.
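
The sensitivity checks described above can be lined up in one table. This is a minimal sketch, assuming the variable names used in this thread; the stored-estimate names (aq7, aq50, mc7, lap) are made up for illustration. The Laplace fit is included only as a cruder approximation for contrast, and it is not claimed to match whatever approximation GENLINMIXED uses.

```stata
* Same random-intercept model under different integration settings.
* mvaghermite with 7 points is the melogit default.
melogit health2013 ib(first).leeftijdscat_4 oplmet || nohouse_encr:, intpoints(7)
estimates store aq7

melogit health2013 ib(first).leeftijdscat_4 oplmet || nohouse_encr:, intpoints(50)
estimates store aq50

* Mode-curvature adaptive quadrature instead of mean-variance
melogit health2013 ib(first).leeftijdscat_4 oplmet || nohouse_encr:, intmethod(mcaghermite)
estimates store mc7

* Laplacian approximation
melogit health2013 ib(first).leeftijdscat_4 oplmet || nohouse_encr:, intmethod(laplace)
estimates store lap

* Coefficients (log odds), standard errors, and log likelihoods side by side
estimates table aq7 aq50 mc7 lap, b(%9.4f) se(%9.4f) stats(ll)
```

If the columns agree to two or three decimals, the integration settings are unlikely to be what separates the Stata results from the SPSS results.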

Thanks a lot for any advice.

Hans
Hansie18

Join Date: Sep 2014

Posts: 11
#2

25 Sep 2014, 05:52

attachment 1
Attached Files

genlinmixed.docx (81.3 KB, 1 view)
Comment
Hansie18

Join Date: Sep 2014

Posts: 11
#3

25 Sep 2014, 05:55

attachment 2
Attached Files

melogit.docx (14.3 KB, 1 view)
Comment
skolenik

Join Date: Mar 2014

Posts: 100
#4

25 Sep 2014, 08:03

Hansie, can you please repost your output so that it is visible as is, rather than as doc files?

-- Stas Kolenikov || http://stas.kolenikov.name
-- Principal Survey Scientist, Abt SRBI
-- Opinions stated in this post are mine only
1 like
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29962
#5

25 Sep 2014, 08:57

In your original post, you were asked to specifically show your posts using code blocks, and given exact instructions how to do that:

Finally, to make your commands and output align in an easily readable way, please post them as code blocks. To do that, click on the underlined A button to launch the advanced editor features. Then click on the # button. Two code block delimiters will appear. Paste your commands and output between those delimiters.

Please follow those instructions for posting your output. Some readers on this forum do not use Microsoft Office. Even those who do may be reluctant to open attachments from a stranger that might contain active content. Your question can only be answered by the small subset of forum participants who know both SPSS and mixed-effects logistic regression. That set is small to begin with, so don't needlessly restrict the set of possible responders further!
1 like
Comment

Hansie18

Join Date: Sep 2014
Posts: 11

25 Sep 2014, 09:10

Code:

. melogit health2013 ib(first).leeftijdscat_4 oplmet || nohouse_encr:, vce(robust) or
 
Fitting fixed-effects model:
 
Iteration 0:   log likelihood = -783.74534 
Iteration 1:   log likelihood = -781.24513 
Iteration 2:   log likelihood = -781.23318 
Iteration 3:   log likelihood = -781.23318 
 
Refining starting values:
 
Grid node 0:   log likelihood = -788.02009
 
Fitting full model:
 
Iteration 0:   log pseudolikelihood = -788.02009  (not concave)
Iteration 1:   log pseudolikelihood = -781.49779 
Iteration 2:   log pseudolikelihood = -780.71491 
Iteration 3:   log pseudolikelihood = -780.67241 
Iteration 4:   log pseudolikelihood = -780.67238 
 
Mixed-effects logistic regression               Number of obs      =      1701
Group variable:    nohouse_encr                 Number of groups   =      1479
 
                                                Obs per group: min =         1
                                                               avg =       1.2
                                                               max =         4
 
Integration method: mvaghermite                 Integration points =         7
 
                                                Wald chi2(4)       =     31.77
Log pseudolikelihood = -780.67238               Prob > chi2        =    0.0000
                               (Std. Err. adjusted for clustering on nohouse_encr)
----------------------------------------------------------------------------------
                 |               Robust
      health2013 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
                 |
  leeftijdscat_4 |
              2  |   1.315458   .4699117     0.77   0.443     .6531427    2.649388
              3  |   2.846147   .9623088     3.09   0.002     1.467095    5.521493
              4  |   3.065375   1.048153     3.28   0.001     1.568302    5.991528
                 |
          oplmet |   .8663129   .0445488    -2.79   0.005     .7832548    .9581786
           _cons |   .1337556   .0501185    -5.37   0.000     .0641749    .2787781
-----------------+----------------------------------------------------------------
nohouse_encr     |
       var(_cons)|   .5396133   .5581699                      .0710574    4.097849
----------------------------------------------------------------------------------

Comment

Hansie18

Join Date: Sep 2014
Posts: 11

25 Sep 2014, 09:15

Code:

SPSS output

variable level health2013 leeftijdscat_4 (nominal).
variable level oplmet (scale).
 
GENLINMIXED
  /DATA_STRUCTURE SUBJECTS=nohouse_encr
  /FIELDS TARGET=health2013 TRIALS=NONE OFFSET=NONE
  /TARGET_OPTIONS DISTRIBUTION=BINOMIAL LINK=LOGIT
  /FIXED  EFFECTS=oplmet leeftijdscat_4 USE_INTERCEPT=TRUE
  /RANDOM USE_INTERCEPT=TRUE SUBJECTS=nohouse_encr COVARIANCE_TYPE=UNSTRUCTURED
  /BUILD_OPTIONS TARGET_CATEGORY_ORDER=DESCENDING          INPUTS_CATEGORY_ORDER=DESCENDING MAX_ITERATIONS=100 CONFIDENCE_LEVEL=95     DF_METHOD=SATTERTHWAITE COVB=ROBUST
  /EMMEANS_OPTIONS SCALE=ORIGINAL PADJUST=LSD.
 
Warnings glmm: One or more records are not used in the analysis because they have one or more fields with invalid or missing values.

glmm: Valid values for events (target) and trials variables are non-negative and positive integers respectively, and the number of trials cannot be less than the number of events.


    Case Processing Summary

                    N    Percent
    Included     1701      87,1%
    Excluded      253      12,9%
    Total        1954     100,0%

Comment


Hansie18

Join Date: Sep 2014

Posts: 11
#9

25 Sep 2014, 09:19

Sorry about the mess, but I hope it is clear now.
Hans
Comment
Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014

Posts: 686
#10

26 Sep 2014, 10:31

Your melogit output is showing the Odds Ratios, so we have to compute logs to compare
the fixed coefficients; nonetheless, they are clearly different.

melogit is reporting the RE variance estimate of .5396133, while the SPSS run is showing 0.113.

Try supplying the SPSS point estimates as starting values to melogit to see if
it stays at those estimates or iterates back to the original melogit results above.
Compare the log-likelihood value at these new starting values to the one reported by
melogit above.

The call to melogit would be

Code:

melogit health2013 ib(first).leeftijdscat_4 oplmet || nohouse_encr:, ///
        from( _cons=-1.845                  ///
              oplmet=-0.133                 ///
              4.leeftijdscat_4=1.053        ///
              3.leeftijdscat_4=0.987        ///
              2.leeftijdscat_4=0.265        ///
              /var(_cons[nohouse_encr])=0.113)
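
The two steps in this post can be sketched as follows. The odds ratios are copied from the melogit table earlier in the thread, and the SPSS values are the ones used as starting values above. On the log scale the melogit estimates come out at roughly 0.27, 1.05, 1.12, -0.14, and -2.01. The iterate(0) option is a standard maximization option; using it here so that the model is only evaluated at the supplied values, without iterating away from them, is an assumption about how melogit handles from(), so treat the result with care.

```stata
* Odds ratios from the melogit output, converted to log odds
display ln(1.315458)    // 2.leeftijdscat_4   (SPSS:  0.265)
display ln(2.846147)    // 3.leeftijdscat_4   (SPSS:  0.987)
display ln(3.065375)    // 4.leeftijdscat_4   (SPSS:  1.053)
display ln(.8663129)    // oplmet             (SPSS: -0.133)
display ln(.1337556)    // _cons              (SPSS: -1.845)

* Evaluate the log likelihood at the SPSS estimates without iterating
* (run from a do-file so that the /// continuations are honored)
melogit health2013 ib(first).leeftijdscat_4 oplmet || nohouse_encr:, ///
        from( _cons=-1.845                  ///
              oplmet=-0.133                 ///
              4.leeftijdscat_4=1.053        ///
              3.leeftijdscat_4=0.987        ///
              2.leeftijdscat_4=0.265        ///
              /var(_cons[nohouse_encr])=0.113) ///
        iterate(0)
display e(ll)
```

Because vce(robust) is omitted, e(ll) is an ordinary log likelihood. It has the same numerical value as the log pseudolikelihood of -780.67238 reported earlier, since the robust option changes only the standard errors.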
Comment
Hansie18

Join Date: Sep 2014

Posts: 11
#11

26 Sep 2014, 14:21

Originally posted by Jeff Pitblado (StataCorp)

Your melogit output is showing the Odds Ratios, so we have to compute logs to compare
the fixed coefficients; nonetheless, they are clearly different.

melogit is reporting the RE variance estimate of .5396133, while the SPSS run is showing 0.113.

Try supplying the SPSS point estimates as starting values to melogit to see if
it stays at those estimates or iterates back to the original melogit results above.
Compare the log-likelihood value at these new starting values to the one reported by
melogit above.

The call to melogit would be

Code:

melogit health2013 ib(first).leeftijdscat_4 oplmet || nohouse_encr:, ///
        from( _cons=-1.845                  ///
              oplmet=-0.133                 ///
              4.leeftijdscat_4=1.053        ///
              3.leeftijdscat_4=0.987        ///
              2.leeftijdscat_4=0.265        ///
              /var(_cons[nohouse_encr])=0.113)

Thanks for the tip, Jeff.

Using the SPSS estimates as starting values, melogit still iterates back to the original estimates. Could the small number of people within clusters (most households have only one respondent) be interfering somehow, so that I should perhaps just use ordinary single-level logistic regression? The ICC and the design effect are both small. I am not sure, though.
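
For reference, the ICC and design effect mentioned here can be reproduced from the numbers already posted. This is a sketch under two assumptions: the latent-variable definition of the ICC for a logit model (level-1 variance fixed at pi^2/3), and the simple design-effect formula 1 + (m - 1)*ICC with m taken as the average cluster size, which is only an approximation when cluster sizes are unequal.

```stata
* Latent-scale ICC: var(_cons) / (var(_cons) + pi^2/3)
display .5396133 / (.5396133 + _pi^2/3)    // melogit variance: about 0.14
display .113 / (.113 + _pi^2/3)            // SPSS variance:    about 0.03

* Approximate design effect with m = 1701/1479 persons per household
display 1 + (1701/1479 - 1) * .5396133 / (.5396133 + _pi^2/3)   // about 1.02

* After fitting melogit, the same ICC with a confidence interval
estat icc
```

The variance estimate has a very wide confidence interval (.07 to 4.1 in the output above), so the ICC is not pinned down by these data.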
Comment
ben earnhart

Join Date: May 2014

Posts: 1027
#12

26 Sep 2014, 16:27

Would it be possible that the small number of people within clusters (most households have only 1 respondent) interferes somehow

Bingo. Also, the warnings SPSS threw should give you serious concerns about whether your data are appropriate for your model. 1701 cases in 1479 groups is pretty darn iffy. I'm surprised Stata didn't complain. It is possible to combine between and within models when most groups have no variance within, and I am assuming that Stata has a better way of combining them, but that's just me trusting Stata. Given the nature of your data, it is iffy to use mixed effects. What I find especially interesting is that Stata used the same 1701 cases that SPSS did, throwing out 253 of them with no warning.
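
One way to see how thin the within-household information is: tabulate the household sizes. A minimal sketch, assuming the data in memory hold only the 1701 analysed cases; hh_size and hh_first are made-up variable names.

```stata
* Number of respondents in each household
bysort nohouse_encr: generate hh_size = _N

* Flag one row per household so each household is counted once
egen byte hh_first = tag(nohouse_encr)

* Distribution of household sizes (1 = singleton household)
tabulate hh_size if hh_first
```

Only households with two or more respondents contribute information about the random-intercept variance.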
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29962
#13

26 Sep 2014, 16:38

Well, you don't have much repeated observation within households to base an estimate of ICC on. Although it would cost you the loss of some of your data, you might consider, rather than just going to ordinary regression with the full data set, selecting one observation per household and going with that. The selection might be systematic (the designated "head" if there is one, or the oldest, or something like that), or if no particular systematic approach seems sensible, selecting one household member at random.

Actually, I might do both the full data set and a one-observation-per-household subset, as confirmation that deliberately ignoring the small amount of nesting isn't distorting things too much.
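
The random-selection variant could be sketched like this. The seed value and the names u, pick, full, and onehh are arbitrary choices for illustration, not anything from the original analysis.

```stata
* Pick one respondent per household at random, reproducibly
set seed 20140926
generate double u = runiform()
bysort nohouse_encr (u): generate byte pick = (_n == 1)

* Ordinary logit on the full data, ignoring the nesting
logit health2013 ib(first).leeftijdscat_4 oplmet
estimates store full

* Ordinary logit on one respondent per household
logit health2013 ib(first).leeftijdscat_4 oplmet if pick
estimates store onehh

* Log-odds coefficients, standard errors, and sample sizes side by side
estimates table full onehh, b(%9.4f) se(%9.4f) stats(N)
```

A middle route for the full-data model is to add vce(cluster nohouse_encr), which keeps the single-level point estimates but adjusts the standard errors for the household clustering.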
Comment
ben earnhart

Join Date: May 2014

Posts: 1027
#14

26 Sep 2014, 17:31

Actually, upon re-reading the SPSS warnings, you have an additional problem: missing or invalid data. So figure out what cases have missing or invalid data (and why, and what to do about it), and try your models again. A mixed-model approach is iffy given the number of singletons, but "missing or invalid data" deserves scrutiny even if you do fall back on ignoring clusters or sampling like Clyde suggested.
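
A quick check along these lines in Stata, as a sketch using the three model variables from this thread. Note that user-defined missing codes from SPSS may arrive in Stata as ordinary numbers, depending on how the file was converted, so tabulating the raw values is worth doing as well.

```stata
* Missing-value counts per model variable
misstable summarize health2013 leeftijdscat_4 oplmet

* Rows with a missing value on any model variable
count if missing(health2013, leeftijdscat_4, oplmet)

* Look for out-of-range codes that may stand for "missing"
tabulate health2013, missing
tabulate leeftijdscat_4, missing
tabulate oplmet, missing
```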

Huh. Usually, I find Stata warnings/errors more informative than SPSS ones. But in this circumstance, SPSS was more informative.
Comment
Hansie18

Join Date: Sep 2014

Posts: 11
#15

26 Sep 2014, 21:34

Thank you for the tips. Ben Earnhart: I gave Stata the data with the missing values already removed, whereas SPSS recognised the user-defined missing values itself, so the missing data do not explain the difference between the two programs' outputs. Clyde Schechter: I have also thought about (randomly) selecting one person per household, and I will try that again now. If the single-level findings do not differ much between the two sample sizes, I could report one in the paper and tell readers that the findings were similar with the other. This may indeed be the way to go. Thanks again for helping out.
Comment
