
  • melogit (Stata) versus GENLINMIXED (SPSS) for multilevel (mixed) logistic regression

    Hi, I previously posted this, but was asked to provide nicer format information. This is now included in the attachments.

    My question is why Stata and SPSS give such different estimates of the odds ratios and the random-effect variance.

    Is this due to SPSS using PQL and Stata using a more sophisticated method? And might this also be related to the small number of level 1 units within level 2 units?

    For those who want to know, this is about age and education predicting health (there are 1701 individuals from 1479 households).

    In Stata I tried several alternative options (as I did in SPSS), but these do not matter much, e.g. robust or not, mean/variance or mode/curvature adaptive Gauss-Hermite quadrature, 7 or 50 integration points. At least, that is what I tested.

    Thanks a lot for any advice.

    Hans

  • #2
    attachment 1


    • #3
      attachment 2


      • #4
        Hansie, can you please repost your output so that it is visible as is, rather than as doc files?
        -- Stas Kolenikov || http://stas.kolenikov.name
        -- Principal Survey Scientist, Abt SRBI
        -- Opinions stated in this post are mine only



        • #5
          In your original post, you were asked to specifically show your posts using code blocks, and given exact instructions how to do that:

          Finally, to make your commands and output align in an easily readable way, please post them as code blocks. To do that, click on the underlined A button to launch the advanced editor features. Then click on the # button. Two code block delimiters will appear. Paste your commands and output between those delimiters.
          Please follow those instructions for posting your output. Some readers on this forum do not use Microsoft Office. Even those who do may be reluctant to open attachments that might contain active content posted by a stranger. Your question can only be answered by the small subset of forum participants who know about both SPSS and mixed-effects logistic regression, which is a small enough set to start with, so don't needlessly restrict the set of possible responders further!



          • #6
            Code:
            . melogit health2013 ib(first).leeftijdscat_4 oplmet || nohouse_encr:, vce(robust) or
             
            Fitting fixed-effects model:
             
            Iteration 0:   log likelihood = -783.74534 
            Iteration 1:   log likelihood = -781.24513 
            Iteration 2:   log likelihood = -781.23318 
            Iteration 3:   log likelihood = -781.23318 
             
            Refining starting values:
             
            Grid node 0:   log likelihood = -788.02009
             
            Fitting full model:
             
            Iteration 0:   log pseudolikelihood = -788.02009  (not concave)
            Iteration 1:   log pseudolikelihood = -781.49779 
            Iteration 2:   log pseudolikelihood = -780.71491 
            Iteration 3:   log pseudolikelihood = -780.67241 
            Iteration 4:   log pseudolikelihood = -780.67238 
             
            Mixed-effects logistic regression               Number of obs      =      1701
            Group variable:    nohouse_encr                 Number of groups   =      1479
             
                                                            Obs per group: min =         1
                                                                           avg =       1.2
                                                                           max =         4
             
            Integration method: mvaghermite                 Integration points =         7
             
                                                            Wald chi2(4)       =     31.77
            Log pseudolikelihood = -780.67238               Prob > chi2        =    0.0000
                                           (Std. Err. adjusted for clustering on nohouse_encr)
            ----------------------------------------------------------------------------------
                             |               Robust
                  health2013 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -----------------+----------------------------------------------------------------
                             |
              leeftijdscat_4 |
                          2  |   1.315458   .4699117     0.77   0.443     .6531427    2.649388
                          3  |   2.846147   .9623088     3.09   0.002     1.467095    5.521493
                          4  |   3.065375   1.048153     3.28   0.001     1.568302    5.991528
                             |
                      oplmet |   .8663129   .0445488    -2.79   0.005     .7832548    .9581786
                       _cons |   .1337556   .0501185    -5.37   0.000     .0641749    .2787781
            -----------------+----------------------------------------------------------------
            nohouse_encr     |
                   var(_cons)|   .5396133   .5581699                      .0710574    4.097849
            ----------------------------------------------------------------------------------



            • #7
              Code:
              SPSS output
              
              variable level health2013 leeftijdscat_4 (nominal).
              variable level oplmet (scale).
               
              GENLINMIXED
                /DATA_STRUCTURE SUBJECTS=nohouse_encr
                /FIELDS TARGET=health2013 TRIALS=NONE OFFSET=NONE
                /TARGET_OPTIONS DISTRIBUTION=BINOMIAL LINK=LOGIT
                /FIXED  EFFECTS=oplmet leeftijdscat_4 USE_INTERCEPT=TRUE
                /RANDOM USE_INTERCEPT=TRUE SUBJECTS=nohouse_encr COVARIANCE_TYPE=UNSTRUCTURED
                /BUILD_OPTIONS TARGET_CATEGORY_ORDER=DESCENDING INPUTS_CATEGORY_ORDER=DESCENDING
                  MAX_ITERATIONS=100 CONFIDENCE_LEVEL=95 DF_METHOD=SATTERTHWAITE COVB=ROBUST
                /EMMEANS_OPTIONS SCALE=ORIGINAL PADJUST=LSD.
               
              Warnings 
              glmm: One or more records are not used in the analysis because they have one or more fields with invalid or missing values.
              glmm: Valid values for events (target) and trials variables are non-negative and positive integers respectively, and the number of trials cannot be less than the number of events.
              Case Processing Summary
                            N   Percent
              Included   1701     87,1%
              Excluded    253     12,9%
              Total      1954    100,0%








                • #9
                  Sorry about the mess, but I hope it is clear now.
                  Hans



                  • #10
                    Your melogit output is showing the Odds Ratios, so we have to compute logs to compare
                    the fixed coefficients; nonetheless, they are clearly different.

                    melogit is reporting the RE variance estimate of .5396133, while the SPSS run is showing 0.113.

                    Try supplying the SPSS point estimates as starting values to melogit to see if
                    it stays at those estimates or iterates back to the original melogit results above.
                    Compare the log-likelihood value at these new starting values to the one reported by
                    melogit above.

                    The call to melogit would be
                    Code:
                    melogit health2013 ib(first).leeftijdscat_4 oplmet || nohouse_encr:, ///
                            from(   _cons=-1.845                    ///
                                    oplmet=-0.133                   ///
                                    4.leeftijdscat_4=1.053          ///
                                    3.leeftijdscat_4=0.987          ///
                                    2.leeftijdscat_4=0.265          ///
                                    /var(_cons[nohouse_encr])=0.113)
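
The log-scale comparison mentioned at the top of this post can also be done by hand. A minimal sketch, assuming only the odds ratios printed in the melogit table in #6 and the SPSS coefficients used in from() above; nothing is re-estimated here:

```stata
* Log odds ratios from the melogit table, next to the SPSS coefficients
display "2.leeftijdscat_4  melogit: " %7.3f ln(1.315458) "   SPSS:  0.265"
display "3.leeftijdscat_4  melogit: " %7.3f ln(2.846147) "   SPSS:  0.987"
display "4.leeftijdscat_4  melogit: " %7.3f ln(3.065375) "   SPSS:  1.053"
display "oplmet            melogit: " %7.3f ln(.8663129) "   SPSS: -0.133"
display "_cons             melogit: " %7.3f ln(.1337556) "   SPSS: -1.845"
```

This only puts the two sets of fixed effects on the same scale; it says nothing about which set is closer to the maximum of the likelihood.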



                    • #11
                      Originally posted by Jeff Pitblado (StataCorp)
                      Your melogit output is showing the Odds Ratios, so we have to compute logs to compare
                      the fixed coefficients; nonetheless, they are clearly different.

                      melogit is reporting the RE variance estimate of .5396133, while the SPSS run is showing 0.113.

                      Try supplying the SPSS point estimates as starting values to melogit to see if
                      it stays at those estimates or iterates back to the original melogit results above.
                      Compare the log-likelihood value at these new starting values to the one reported by
                      melogit above.

                      The call to melogit would be
                      Code:
                      melogit health2013 ib(first).leeftijdscat_4 oplmet || nohouse_encr:, ///
                      from( _cons=-1.845 ///
                      oplmet=-0.133 ///
                      4.leeftijdscat_4=1.053 ///
                      3.leeftijdscat_4=0.987 ///
                      2.leeftijdscat_4=0.265 ///
                      /var(_cons[nohouse_encr])=0.113)
                      Thanks for the tip, Jeff.

                      Using the SPSS starting values, it still iterates back to the original estimates. Would it be possible that the small number of people within clusters (most households have only 1 respondent) interferes somehow, and that perhaps I should just use ordinary/simple logistic regression? The ICC and design effect are both small. Not sure, though.
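
For reference, the ICC and design effect mentioned here can be worked out from the reported variances. A rough sketch, assuming the usual latent-variable definition of the ICC for a random-intercept logit model (level-1 variance fixed at pi^2/3) and the common approximation deff = 1 + (mbar - 1)*ICC, where mbar is the average cluster size. Neither formula is stated in the thread, and the scalar names are made up for illustration:

```stata
* Random-intercept variances reported by the two programs
scalar v_stata = .5396133
scalar v_spss  = .113

* Latent-variable ICC: var(_cons) / (var(_cons) + pi^2/3)
scalar icc_stata = v_stata / (v_stata + _pi^2/3)
scalar icc_spss  = v_spss  / (v_spss  + _pi^2/3)

* Average household size and approximate design effect
scalar mbar       = 1701/1479
scalar deff_stata = 1 + (mbar - 1)*icc_stata
scalar deff_spss  = 1 + (mbar - 1)*icc_spss

display "ICC  (melogit variance): " %6.3f icc_stata
display "ICC  (SPSS variance):    " %6.3f icc_spss
display "deff (melogit variance): " %6.3f deff_stata
display "deff (SPSS variance):    " %6.3f deff_spss
```

With these numbers the ICC comes out at roughly 0.14 for the melogit variance and roughly 0.03 for the SPSS one, and the design effect stays close to 1 in both cases because the average household contributes only about 1.15 respondents.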



                      • #12
                        Would it be possible that the small number of people within clusters (most households have only 1 respondent) interferes somehow
                        Bingo. Also, the warnings SPSS threw should give you serious concerns about whether your data is appropriate for your model. 1701 cases with 1479 groups is pretty darn iffy. I'm surprised Stata didn't complain. It is possible to combine between and within models when most cases have no variance within, and I am assuming that Stata has a better way of combining them, but that's just me trusting Stata. Given the nature of your data, it is iffy to use mixed effects. What I find especially interesting is that Stata used the same 1701 cases that SPSS did, throwing out 253 of them with no warning.
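
To see how many households actually contribute more than one respondent, the cluster sizes can be tabulated. A minimal sketch, assuming the analysis data set is in memory with the variable names from the melogit call; n_in_hh and hh_tag are hypothetical helper names:

```stata
* Number of respondents in each household
bysort nohouse_encr: generate n_in_hh = _N

* Flag one row per household so each household is counted once
egen byte hh_tag = tag(nohouse_encr)

* Distribution of household sizes (melogit reported min 1, avg 1.2, max 4)
tabulate n_in_hh if hh_tag

drop n_in_hh hh_tag
```

The row for n_in_hh == 1 gives the number of singleton households, which carry no within-household information about the random-intercept variance.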



                        • #13
                          Well, you don't have much repeated observation within households to base an estimate of ICC on. Although it would cost you the loss of some of your data, you might consider, rather than just going to ordinary regression with the full data set, selecting one observation per household and going with that. The selection might be systematic (the designated "head" if there is one, or the oldest, or something like that), or if no particular systematic approach seems sensible, selecting one household member at random.

                          Actually, I might do both the full data set and a 1-observation-per-household subset, as confirmation that deliberately ignoring the small amount of nesting going on isn't distorting things too much.
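
The two analyses described here could look roughly as follows. This is a sketch, not a prescription: it assumes the variable names from the melogit call earlier in the thread, picks the household member at random, and uses an arbitrary seed and a hypothetical helper variable u:

```stata
* (a) Full data set, single-level logit, SEs adjusted for household clustering
logit health2013 ib(first).leeftijdscat_4 oplmet, or vce(cluster nohouse_encr)

* (b) One randomly selected respondent per household
preserve
set seed 12345                      // arbitrary; fixed only for reproducibility
generate double u = runiform()      // random sort key within household
bysort nohouse_encr (u): keep if _n == 1
logit health2013 ib(first).leeftijdscat_4 oplmet, or
restore
```

If the odds ratios from (a) and (b) are similar, that supports the view that ignoring the small amount of nesting does little harm. A systematic rule (for example the oldest member) would replace the random sort key with the relevant variable.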



                          • #14
                            Actually, upon re-reading the SPSS warnings, you have an additional problem: missing or invalid data. So figure out what cases have missing or invalid data (and why, and what to do about it), and try your models again. A mixed-model approach is iffy given the number of singletons, but "missing or invalid data" deserves scrutiny even if you do fall back on ignoring clusters or sampling like Clyde suggested.
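
A quick way to look for incomplete cases on the Stata side might be the following. It is only a sketch under assumptions: it uses the three model variables from the melogit call, n_missing is a hypothetical helper name, and it will only detect values stored as Stata missing values (numeric missing-value codes would first need recoding, e.g. with mvdecode):

```stata
* Missing-value pattern for the model variables
misstable summarize health2013 leeftijdscat_4 oplmet

* Number of missing model variables per respondent
egen n_missing = rowmiss(health2013 leeftijdscat_4 oplmet)
tabulate n_missing

* Respondents that would be dropped from the model
count if n_missing > 0

drop n_missing
```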

                            Huh. Usually, I find Stata warnings/errors more informative than SPSS ones. But in this circumstance, SPSS was more informative.



                            • #15
                              Thank you for the tips. Ben Earnhart: I gave Stata the data without missings to begin with, while SPSS recognised the user-defined missings itself, so the missing data do not explain the difference between the programs and their outputs. Clyde Schechter: I have also thought about the option of (randomly) selecting one person per household, and I will do that again now. If the (single-level) findings do not differ much between the two N's, I could choose one for the paper and tell readers that the findings were similar with the other N. This may indeed be the way to go. Thanks again for helping out.
