
  • fitting the effect of a continuous var (vs. categorising it) in a gee model

    Hi All
    I am using a GEE model for panel data analysis. I have data for 6 waves and I am looking at a dependent variable by wave. I first included wave as a continuous variable, and the output is as follows:


    . xi:xtgee depvar wave , eform i(idauniq) fam(bin) link(logit) corr(exchangeable)


    Iteration 1: tolerance = .64951125
    Iteration 2: tolerance = .013703
    Iteration 3: tolerance = .00064495
    Iteration 4: tolerance = .00003776
    Iteration 5: tolerance = 2.180e-06
    Iteration 6: tolerance = 1.338e-07

    GEE population-averaged model                   Number of obs      =     57126
    Group variable:                    idauniq      Number of groups   =     15783
    Link:                                logit      Obs per group: min =         1
    Family:                           binomial                     avg =       3.6
    Correlation:                  exchangeable                     max =         6
                                                    Wald chi2(5)       =   3965.99
    Scale parameter:                         1      Prob > chi2        =    0.0000

    -------------------------------------------------------------------------------
           depvar | Odds Ratio   Std. Err.       z    P>|z|    [95% Conf. Interval]
    --------------+----------------------------------------------------------------
             wave |   1.212814   .0044039    53.14    0.000    1.204214    1.221477
            _cons |   .6416439   .0204127   -13.95    0.000    .6028574    .6829258
    -------------------------------------------------------------------------------



    I then fitted it as a categorical variable, with the following output:

    . xi:xtgee depvar i.wave , eform i(idauniq) fam(bin) link(logit) corr(exchangeable)
    i.wave _Iwave_1-6 (naturally coded; _Iwave_1 omitted)

    Iteration 1: tolerance = .55485767
    Iteration 2: tolerance = .01454043
    Iteration 3: tolerance = .00080836
    Iteration 4: tolerance = .0000476
    Iteration 5: tolerance = 2.803e-06
    Iteration 6: tolerance = 1.711e-07

    GEE population-averaged model                   Number of obs      =     57126
    Group variable:                    idauniq      Number of groups   =     15783
    Link:                                logit      Obs per group: min =         1
    Family:                           binomial                     avg =       3.6
    Correlation:                  exchangeable                     max =         6
                                                    Wald chi2(9)       =   4238.64
    Scale parameter:                         1      Prob > chi2        =    0.0000

    -------------------------------------------------------------------------------
           depvar | Odds Ratio   Std. Err.       z    P>|z|    [95% Conf. Interval]
    --------------+----------------------------------------------------------------
         _Iwave_2 |   1.002251   .0199846     0.11    0.910    .9638378    1.042196
         _Iwave_3 |   1.698867   .0335438    26.84    0.000    1.634378      1.7659
         _Iwave_4 |   1.799413   .0354028    29.86    0.000    1.731346    1.870156
         _Iwave_5 |   2.217938   .0444731    39.73    0.000    2.132463    2.306839
         _Iwave_6 |    2.42179   .0493268    43.43    0.000    2.327016    2.520424
            _cons |   .7771827   .0243593    -8.04    0.000    .7308763    .8264229
    -------------------------------------------------------------------------------


    I want to compare the two models to see whether it is better to fit 'wave' as a continuous or a categorical variable. I know that a likelihood-ratio test is normally used for this, but you can't use an LRT with a GEE model. I am trying to use the testparm command to check which model is better, but I am not sure about the syntax and interpretation. Any help will be greatly appreciated.

    Thanks,
    Nafeesa
    Last edited by Nafeesa Dhalwani; 03 Mar 2015, 05:59.

  • #2
    With 57,126 observations on 15,783 groups, I would be hesitant to rely on p-value based criteria to choose a model, even if a likelihood-ratio test were possible. In so large a sample, even tiny, meaningless differences in model fit will show up as "statistically significant." I would probably go a different route: I would do something like a Hosmer-Lemeshow calibration analysis: divide the data into deciles (or, in a data set this size, perhaps vingtiles) of predicted probability and then graphically compare the predicted and observed number of successes in each decile. (I would not do a chi square test from this.) I would do this for each model and then make a visual judgment whether the discrete wave model is a substantially better fit.
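
    A rough sketch of how that calibration comparison might be set up, not a definitive recipe. It assumes the variable names from post #1 (depvar, wave, idauniq); p_lin, p_cat, grp, observed, and predicted are names made up for illustration, and 20 groups simply follows the vingtile suggestion. It plots observed against predicted proportions rather than counts, and is meant to be run from a do-file because of the /// line continuation. If wave is the only predictor, the predictions take at most six distinct values, so xtile will return fewer than 20 groups.

    ```stata
    * Sketch only: names other than depvar, wave, idauniq are hypothetical
    xtset idauniq

    * Model 1: wave as a continuous predictor
    xtgee depvar c.wave, family(binomial) link(logit) corr(exchangeable) eform
    * mu is the predicted probability on the response scale
    predict double p_lin, mu

    * Model 2: wave as a categorical predictor
    xtgee depvar i.wave, family(binomial) link(logit) corr(exchangeable) eform
    predict double p_cat, mu

    * Observed vs predicted proportion within quantile groups of each model's predictions
    foreach m in lin cat {
        preserve
        xtile grp = p_`m', nquantiles(20)
        collapse (mean) observed = depvar predicted = p_`m', by(grp)
        * Points near the 45-degree line indicate good calibration
        twoway (scatter observed predicted) (function y = x, range(0 1)), ///
            title("Calibration: `m' model") name(cal_`m', replace)
        restore
    }
    ```

    The two graphs can then be compared by eye, as described above, to judge whether the categorical model tracks the observed proportions materially better.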

    As an aside, if you are using current Stata (you're supposed to tell us if you're not), then you should no longer be using -xi-. It has been superseded by factor variable notation (-help fvvarlist-) in almost all estimation commands, and it works brilliantly with -margins-. (Factor variables are also available in Stata 12.)
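
    For instance, the categorical model in post #1 could be written without -xi- roughly as follows. This is a sketch that assumes wave is coded 1 to 6 and idauniq is the panel identifier, as that output suggests.

    ```stata
    * Declare the panel identifier once instead of passing i(idauniq) each time
    xtset idauniq
    xtgee depvar i.wave, family(binomial) link(logit) corr(exchangeable) eform

    * Population-averaged predicted probability at each wave, then a plot of them
    margins wave
    marginsplot
    ```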




    • #3
      I agree with the points Clyde made. But, in general, if you want/need to do a Wald test of categorical vs continuous, I think you can do something like

      Code:
      webuse nhanes2f, clear
      logit diabetes c.health o(1 2).health
      testparm i.health
      Basically, you include categorical and continuous versions of the same variable and then see whether the less restrictive categorical specification gains you anything. Note that you have to drop 2 categories of the categorical variable to avoid perfect multicollinearity.
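
      Carried over to the GEE model in post #1, the same idea might look like the sketch below. It assumes wave is coded 1 to 6 and reuses the variable names from that post; it has not been run on those data.

      ```stata
      xtset idauniq

      * Continuous wave plus dummies for waves 3-6 (levels 1 and 2 omitted)
      xtgee depvar c.wave o(1 2).wave, family(binomial) link(logit) corr(exchangeable)

      * Joint Wald test that the remaining wave dummies are all zero
      testparm i.wave
      ```

      A small p-value here would suggest departure from a linear trend on the logit scale, though with a sample this large the caution in #2 about p-values applies.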
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      StataNow Version: 19.5 MP (2 processor)

      EMAIL: [email protected]
      WWW: https://www3.nd.edu/~rwilliam



      • #4
        Thank you, Clyde and Richard, for your responses. I will try the Hosmer-Lemeshow calibration analysis. But for clarification, Richard: when you run the code above in your example you get a p-value of 0.66. Am I right in assuming that, in this example, the categorical health variable offers no better fit than the continuous one, and so health could be used as a continuous variable?



        • #5
          I would also look at the p values for the dummies and see if they are insignificant. In my example it seems reasonable to treat health as continuous.


          • #6
            Also, if you run

            Code:
            logit diabetes i.health
            the pattern of coefficients for the dummies looks very close to a linear relationship.

            I think you can also do something like

            Code:
            webuse nhanes2f, clear
            logit diabetes i.health
            test 3.health = 2 * 2.health
            test 4.health = 3 * 2.health, accum
            test 5.health = 4 * 2.health, accum
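
            The same constraints could presumably be tested after -xtgee-. The sketch below assumes wave is coded 1 to 6 with wave 1 as the base level, and uses the variable names from post #1. Under a linear trend on the logit scale, the coefficient for wave k should equal (k - 1) times the coefficient for wave 2.

            ```stata
            xtset idauniq
            xtgee depvar i.wave, family(binomial) link(logit) corr(exchangeable)

            * Accumulate the four linearity constraints into one joint Wald test
            test 3.wave = 2 * 2.wave
            test 4.wave = 3 * 2.wave, accum
            test 5.wave = 4 * 2.wave, accum
            test 6.wave = 5 * 2.wave, accum
            ```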


            • #7
              Thanks for this, Richard. Is there a way to do a test for trend within a GEE model?

