  • Standard Errors and Model Statistics Unreported with XTGEE and MI

    All,

    I have a model estimation issue the cause of which I can't determine and therefore can't remedy. I am running GEE logistic models with and without multiple imputation at the behest of a reviewer on a revise and resubmit. He insisted even though there is an N of 2 for the cluster variable. The following command runs fine when multiple imputation is not used:

    Code:
    xtgee linkagein14days i.(arm block gender_id1 latinx employment homeless insurance_any ///
                coip appointmentwithin48hours) c.(age withdrawal) appointmentwithin48hours#arm, family(binomial) ///
                link(logit) ef vce(robust) nmp
    As shown, I am requesting robust standard errors and the nmp correction for the small number of clusters. All seems well until I get to the multiple imputation phase. The N is 258 prior to imputation and 274 after, so there is not that much missing data, but to keep the reviewer and editor happy, we went with it. Using MICE and 100 imputations (overkill, I suppose, but the how_many_imputations user-written add-on I have relied on in the past seems not to work with Stata 18), all missing data are successfully imputed.
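
    For context, the imputation step was set up along these lines (a sketch only: the actual imputation model isn't shown in this post, so the registered variables, methods, and predictor assignments below are illustrative assumptions):

    Code:
    mi set flong
    mi register imputed insurance_any employment homeless withdrawal
    mi impute chained (logit) insurance_any employment homeless ///
                (regress) withdrawal ///
                = linkagein14days i.arm i.block i.gender_id1 i.latinx ///
                i.coip i.appointmentwithin48hours age, add(100) rseed(12345)
    The xtgee command run after the imputation step is almost identical to the prior command but with the MI additions: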

    Code:
    mi estimate, eform: xtgee linkagein14days i.(arm block gender_id1 latinx employment insurance_any homeless ///
                coip appointmentwithin48hours) c.(age withdrawal) appointmentwithin48hours#arm, family(binomial) ///
                link(logit) vce(robust) nmp
    A portion of the output returned looks like this:
    [Attached screenshot (ID 1746058): partial mi estimate output in which the standard errors, test statistics, and overall model statistics are reported as missing]

    If I remove the vce(robust) option, the model runs and produces all of the estimates that are missing above. However, I can also get all estimates if I remove one of the predictors (insurance status) and leave the vce(robust) option in. It gets weirder: if I add a predictor that was not in the model, homelessness, and keep insurance status and the vce(robust) option, it also works and produces estimates. The variable insurance status has only ten missing cases to be imputed and a distribution of 10% yes, 90% no. I did not think that would cause a problem, and maybe it isn't the problem given this confusing pattern of results.

    I do note that with robust standard errors, more of the predictors are significant, and in a way that makes substantive sense. Without the robust standard errors, fewer predictors are significant. So my preference is to use the model reporting the robust standard errors and associated significance tests, but I don't want to misrepresent the findings if this kind of error points to model misspecification or something I am doing wrong with MICE.
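
    In case it is useful for troubleshooting, the model can be run on individual imputations with mi xeq to see whether particular imputations are failing (a sketch, using the same model as above); mi estimate's errorok option, which discards imputations in which estimation fails, may also help isolate the problem:

    Code:
    * run the same model separately on imputations 1-10 (sketch)
    mi xeq 1/10: xtgee linkagein14days i.(arm block gender_id1 latinx ///
                employment insurance_any homeless coip appointmentwithin48hours) ///
                c.(age withdrawal) appointmentwithin48hours#arm, family(binomial) ///
                link(logit) vce(robust) nmp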

    Any thoughts on debugging/troubleshooting or what might be causing this issue would be appreciated. Thanks in advance.

  • #2
    Originally posted by James Swartz
    I am running GEE logistic models . . . even though there is an N of 2 for the cluster variable.
    And neither referee said anything about that?

    You're already skating on thin ice trying to fit a model with at least 13 predictors against a total of 273 observations (unknown how many in the lesser of successes or failures), but given the trouble you're having, how about including your cluster variable as a predictor in a conventional GLM, logit linkagein14days . . . i.sitelocationalhp2?
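
    Concretely, something like this (a sketch reusing your predictor list from #1, with the site variable added as a fixed effect and eform kept for odds ratios):

    Code:
    mi estimate, eform: logit linkagein14days i.(arm block gender_id1 latinx ///
                employment insurance_any homeless coip appointmentwithin48hours ///
                sitelocationalhp2) c.(age withdrawal) appointmentwithin48hours#arm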



    • #3
      Originally posted by Joseph Coveney
      And neither referee said anything about that?

      You're already skating on thin ice trying to fit a model with at least 13 predictors against a total of 273 observations (unknown how many in the lesser of successes or failures), but given the trouble you're having, how about including your cluster variable as a predictor in a conventional GLM, logit linkagein14days . . . i.sitelocationalhp2?

      You're preaching to the choir. We originally ran these models with site as a fixed effect (as it still should be, in my opinion, and it looks like in yours as well). Here is the critique we received:

      "Study site appears to have been treated as a fixed effect in the analysis. This restricts any conclusions from the trial to the two specific sites used, and the results are consequently of no interest or value to those outside these two sites. Instead, we need to recognize that these two sites are just 2 of many that might have been involved in this study across many different locations. Hence, study site should be regarded as a random effect to allow the results to apply more widely. Only this approach would be of interest to the wider readership of the journal."

      I don't quite see that myself. This editor also wanted us not only to include site as a random intercept but also to include another variable as a random slope. We did our best at first using melogit, which ran and converged with the random intercept term; there was hardly any variation attributable to site. But the model including the random slope did not converge. So we went with GEE as a way to produce population-averaged estimates that would address this reviewer's concerns, but we pushed back on including a random slope as being non-estimable. The two approaches (fixed effect versus clustering variable in GEE) produce fairly similar results. I'm not happy having to run the models this way, but as I indicated, it's a very good journal and worth this degree of pain, I suppose.
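
      For the record, the two mixed-model specifications looked roughly like this (a sketch; the random-slope variable was not named above, so slopevar below is a placeholder):

      Code:
      * random intercept for site: converged, near-zero site-level variance
      melogit linkagein14days i.(arm block gender_id1 latinx employment ///
                  insurance_any homeless coip appointmentwithin48hours) ///
                  c.(age withdrawal) appointmentwithin48hours#arm ///
                  || sitelocationalhp2:, or

      * with a random slope as well (slopevar is a placeholder): did not converge
      melogit linkagein14days i.(arm block gender_id1 latinx employment ///
                  insurance_any homeless coip appointmentwithin48hours) ///
                  c.(age withdrawal) appointmentwithin48hours#arm ///
                  || sitelocationalhp2: slopevar, covariance(unstructured) or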



      • #4
        Originally posted by James Swartz
        . . . there was hardly any variation attributable to site.
        You might want to consider pushing back on this basis if nothing else. State your case for doing what you think is proper, and if it's such a good journal, then the editor will step in as arbitrator between you and the referee.

        And because site doesn't really explain much, if any, variation in the outcome anyway, consider pooling across sites and omitting that variable from the model altogether, especially if you're already pushing up against the limit of predictors your model can accommodate given the number of observations in the lesser of successes or failures.
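
        A quick way to gauge how tight that limit is, on the original (pre-imputation) data, is to count the outcome categories directly (a sketch, using the common one-predictor-per-ten-events rule of thumb):

        Code:
        * count successes and failures on the outcome (sketch)
        quietly count if linkagein14days == 1
        local nsucc = r(N)
        quietly count if linkagein14days == 0
        local nfail = r(N)
        display "lesser of successes/failures: " min(`nsucc', `nfail')
        display "rough predictor limit (1 per 10 events): " floor(min(`nsucc', `nfail')/10)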
