
  • How to conduct clogit in Stata when cell values may be zero

    Dear All,

    I am working on data from a case control study of infants, where 1 diseased infant (case) is matched to 2 non-diseased infants (controls) on certain criteria. I am comparing risk factors for being diseased by comparing the presence of certain characteristics between cases and controls. I would like to first do a univariate analysis, and based on the results of these, select significant characteristics into a multivariable model.

    I am using the clogit command in Stata to do so. A table of my univariate analysis is below. There were a few more variables, which I have not included; none of them was associated with being a case.
    characteristic           case (46)   control (87)   OR (95% CI)       p-value
    HIV infected             5           0              -                 0.03
    Breast milk infected     31          43             3.1 (1.1-8.6)     0.03
    mean IgG                 204         205            1 (0.998-1.0)     0.97
    mean log10 viral load    4.6         4.8            1.0 (0.5-2.3)     0.95
    As no controls were HIV infected, clogit could not estimate an odds ratio for HIV infection, so I derived the p-value from McNemar's test instead (mcci 0 5 0 20).
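
    For reference, the arithmetic behind `mcci 0 5 0 20` can be sketched as follows. This is a minimal Python sketch, not Stata's implementation, and the function names are hypothetical. It assumes the usual McNemar layout, in which the second and third cells (5 and 0) are the discordant pairs.

```python
import math

def mcnemar_chi2(b, c):
    """Asymptotic McNemar test from the discordant counts b and c.
    Returns (chi-squared statistic on 1 df, two-sided p-value)."""
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of a chi-squared variable with 1 df: erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

def mcnemar_exact(b, c):
    """Exact two-sided McNemar p-value: binomial test of b out of b + c at p = 0.5."""
    n = b + c
    tail = sum(math.comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

chi2, p_asymptotic = mcnemar_chi2(5, 0)
p_exact = mcnemar_exact(5, 0)
print(f"chi2 = {chi2:.2f}, asymptotic p = {p_asymptotic:.4f}, exact p = {p_exact:.4f}")
```

    With only 5 discordant pairs the asymptotic and exact p-values can differ noticeably, so it may be worth stating which of the two is being quoted.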

    With the above data, HIV infection and breast milk infection were the only 2 characteristics that were associated with being diseased in the univariate analysis. I tried to include HIV infection and breast milk infection in a multivariable clogit for which I get the following output:
    cascon      Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    v1_bm_res     2.527388    2.077396   1.13    0.259     .5046887    12.65669
    hiv_pcr       3.52e+07    8.00e+10   0.01    0.994            0           .
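
    The columns of this output are linked by simple arithmetic, which helps in seeing why the hiv_pcr row is uninformative. The following is a minimal Python sketch with a hypothetical function name. It assumes that the standard error is reported on the odds-ratio scale (odds ratio times the standard error of the log odds ratio) and that z and the interval are computed on the log scale.

```python
import math
from statistics import NormalDist

def or_row_to_log_scale(odds_ratio, se_or, level=0.95):
    """Recover z and the confidence interval from an odds ratio and its
    standard error on the odds-ratio scale (se_or = odds_ratio * se of log OR)."""
    log_or = math.log(odds_ratio)
    se_log = se_or / odds_ratio
    z = log_or / se_log
    crit = NormalDist().inv_cdf(0.5 + level / 2)

    def safe_exp(x):
        # math.exp raises OverflowError for very large x; report infinity instead
        try:
            return math.exp(x)
        except OverflowError:
            return math.inf

    return z, safe_exp(log_or - crit * se_log), safe_exp(log_or + crit * se_log)

# Odds ratios and standard errors copied from the clogit output
print(or_row_to_log_scale(2.527388, 2.077396))   # v1_bm_res
print(or_row_to_log_scale(3.52e+07, 8.00e+10))   # hiv_pcr
```

    For hiv_pcr the log odds ratio is about 17 with a standard error above 2000, so the interval runs from effectively 0 to a value too large to represent. That is consistent with the 0 and the missing upper bound in the output.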

    I understand that HIV infection does not work in the model because there are no infections among the controls. Should I therefore still include HIV infection in the model? Is it correct to quote only the univariate odds ratio for breast milk infection as significantly associated with being a case, even though the table suggests that both HIV infection and breast milk infection may be linked to being a case? How can I go about analysing this matched case-control data in Stata?

    I do hope my questions are clear and I am happy to provide more clarity. Thank you very much.

  • #2
    You didn't get a quick answer. You'll increase your chances of a useful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output (fixed spacing fonts help), and sample data using dataex. You switch between written variable names (HIV infected) and actual variable names (hiv_pcr) leaving the reader not absolutely sure these are the same things. I'm not sure what univariate analysis gave you the first table.

    It is seldom a good idea to look at univariate associations and use them to choose what to include in a multivariate model. An insignificant univariate association does not imply an insignificant association in a multivariate model. And your model should include the variables your theory says should be included.

    I don't work in this space, so take my suggestions with caution. If HIV matters, and you don't have any HIV-positive infants among the controls, then the controls are not really doing what you need. The second set of results is obviously wrong - the odds ratio is ridiculous. I suspect this means all 5 of your HIV infected had disease. If that is the case, I don't see how you can estimate the model in the way you want to. Stata gave you a lot more information in the clogit output than you're giving us.
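
    An enormous odds ratio of this kind is what one would expect when an exposure occurs only among cases: the conditional likelihood keeps increasing as the coefficient grows, so no finite maximum exists. The following is a minimal Python sketch of that behaviour, using hypothetical toy matched sets (not the poster's data) and a hypothetical function name.

```python
import math

def conditional_loglik(beta, matched_sets):
    """Conditional log-likelihood for 1:M matched sets with one binary covariate.
    Each set is (case_x, [control_x, ...])."""
    total = 0.0
    for case_x, control_xs in matched_sets:
        denom = math.exp(beta * case_x) + sum(math.exp(beta * x) for x in control_xs)
        total += beta * case_x - math.log(denom)
    return total

# Hypothetical toy data: 5 sets where only the case is exposed,
# 10 sets where nobody is exposed (these do not depend on beta).
toy_sets = [(1, [0, 0])] * 5 + [(0, [0, 0])] * 10

for beta in (0, 2, 5, 10, 20):
    print(beta, conditional_loglik(beta, toy_sets))
```

    The log-likelihood rises with every increase in beta and only flattens out, so an optimiser stops at some very large coefficient with a huge standard error rather than at a meaningful estimate.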

    Could you get new data on controls that included matching on HIV infection? If not, then I suspect all you can do is the univariate comparisons.


    • #3
      Thank you very much Phil for your response. I will go through the FAQ again on asking questions in this forum and revise my question.
      Briefly, in response to your comments: the univariate analysis was done by univariate conditional logistic regression, and I used a p-value of less than 0.2 to select covariates for the multivariable conditional logistic regression models. Unfortunately I cannot reassign controls, as I initially matched cases and controls on HIV exposure (which is the mother's HIV infection status).

      However I will repost the question with all of the Stata code and output and the variable names and what they mean.

