  • Test for clustering

    In analysing data from a cohort of 6500 children recruited from 3500 families, I would like to examine whether the event (developing typhoid) is clustered within households. While multiple events per child are possible, the data is set up as one record per child and only one event per child is included. If clustering within households is present, I will probably use a GEE for analysis else a logistic regression.
    Ideally I would like a way of quantifying ICC when the outcome is number of events.
    Could someone please provide me an indication of where to look for answers.
    Thanks!

    Jacob

  • #2
    Originally posted by Jacob John
    . . . only one event per child is included.
    . . . Ideally I would like a way of quantifying ICC when the outcome is number of events.
    The outcome, I think, would not be the number of events, but rather the proportion of children with one or more episodes (relapse or re-infection) of typhoid fever, wouldn't it? Anyway, the output below illustrates how to get the intraclass correlation coefficient from the setup that you describe.

    . . . I will probably use a GEE for analysis else a logistic regression.
    I show both below so that you can get an idea of the potential for differences in the magnitude of the household-specific and population-average estimates.

    .
    . version 16.0

    .
    . clear *

    .
    . set seed `=strreverse("1523998")'

    . quietly set obs 3500

    . generate int fid = _n

    . generate double fid_u = rnormal()

    .
    . generate byte epd = 1 + rpoisson(6500 / 3500 - 1)

    . summarize epd, meanonly

    . if r(sum) < 6500 quietly replace epd = epd + 1 in 1/`=6500 - r(sum)'

    . else {
    .         gsort -epd
    .         quietly replace epd = epd - 1 in `1/r(sum) - 6500'
    .         assert epd
    . }

    . quietly expand epd

    .
    . // ca. 5% chance children developing one or more episodes typhoid fever
    . generate double xb = fid_u + logit(0.05)

    . generate byte tph = rbinomial(1, invlogit(xb))

    .
    . *
    . * Begin here
    . *
    . melogit tph || fid: , nolog

    Mixed-effects logistic regression               Number of obs     =      6,500
    Group variable:             fid                 Number of groups  =      3,500

                                                    Obs per group:
                                                                  min =          1
                                                                  avg =        1.9
                                                                  max =          7

    Integration method: mvaghermite                 Integration pts.  =          7

                                                    Wald chi2(0)      =          .
    Log likelihood = -1654.9072                     Prob > chi2       =          .
    ------------------------------------------------------------------------------
             tph |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _cons |  -3.007752   .1186507   -25.35   0.000    -3.240303   -2.775201
    -------------+----------------------------------------------------------------
    fid          |
       var(_cons)|    1.09515   .2886757                      .6532831    1.835886
    ------------------------------------------------------------------------------
    LR test vs. logistic model: chibar2(01) = 23.59       Prob >= chibar2 = 0.0000

    . estat icc

    Intraclass correlation

    ------------------------------------------------------------------------------
                           Level |        ICC   Std. Err.     [95% Conf. Interval]
    -----------------------------+------------------------------------------------
                             fid |   .2497481   .0493908      .1656754     .358169
    ------------------------------------------------------------------------------

    .
    . xtgee tph, i(fid) family(binomial) link(logit) corr(exchangeable) nolog

    GEE population-averaged model                   Number of obs     =      6,500
    Group variable:                        fid      Number of groups  =      3,500
    Link:                                logit      Obs per group:
    Family:                           binomial                    min =          1
    Correlation:                  exchangeable                    avg =        1.9
                                                                  max =          7
                                                    Wald chi2(0)      =          .
    Scale parameter:                         1      Prob > chi2       =          .

    ------------------------------------------------------------------------------
             tph |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _cons |  -2.571459   .0509405   -50.48   0.000    -2.671301   -2.471618
    ------------------------------------------------------------------------------

    .
    . exit

    end of do-file
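
    The two sets of numbers in the log can be tied together by hand. The following is only a sketch, assuming the usual latent-response formulation of a random-intercept logit model (level-1 variance fixed at \pi^2/3) and a commonly used approximation for how a normally distributed random intercept attenuates a household-specific (SS) coefficient into a population-averaged (PA) one. The symbols \sigma_u^2, \beta_{SS}, \beta_{PA} and c are notation introduced here, not part of the Stata output; the numbers are the melogit estimates above.

    ```latex
    \begin{align*}
    % ICC on the latent-response scale, from var(_cons)
    \rho &= \frac{\sigma_u^2}{\sigma_u^2 + \pi^2/3}
          = \frac{1.09515}{1.09515 + 3.28987} \approx 0.2497 \\[4pt]
    % Approximate attenuation of the conditional intercept
    \beta_{PA} &\approx \frac{\beta_{SS}}{\sqrt{1 + c^2 \sigma_u^2}},
          \qquad c = \frac{16\sqrt{3}}{15\pi}, \quad c^2 \approx 0.3458 \\[4pt]
    \beta_{PA} &\approx \frac{-3.007752}{\sqrt{1 + 0.3458 \times 1.09515}}
          \approx \frac{-3.007752}{1.1742} \approx -2.56 \\[4pt]
    % The two intercepts on the probability scale
    \operatorname{invlogit}(-3.007752) &\approx 0.047, \qquad
    \operatorname{invlogit}(-2.571459) \approx 0.071
    \end{align*}
    ```

    The first line reproduces the .2497481 reported by estat icc. The approximate population-averaged intercept of about -2.56 is close to the -2.571459 that xtgee reports. On the probability scale, the melogit intercept describes a household whose random effect is zero (about 4.7%), whereas the GEE intercept describes the average over all households (about 7.1%).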


    .



    • #3
      I wrote
      Code:
      quietly replace epd = epd - 1 in `1/r(sum) - 6500'
      Good thing that it never got there.

      Replace that with
      Code:
      quietly replace epd = epd - 1 in 1/`=r(sum) - 6500'
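
      The difference between the two lines is where the macro quotes sit. In the first, the quotes wrap the whole range, so Stata reads it as a local-macro reference instead of evaluating anything; in the second, `=exp' evaluates the expression and substitutes the result as the upper end of the range. A minimal sketch on toy data (three made-up households and a target total of 10, both chosen only for illustration):

      ```stata
      * Toy data: household sizes summing to 12, with a hypothetical target of 10
      clear
      input byte epd
      5
      4
      3
      end

      * Largest households first, so the reduction falls on them
      gsort -epd

      * r(sum) is 12 here, so `=r(sum) - 10' expands to 2
      summarize epd, meanonly
      display "in 1/`=r(sum) - 10'"

      * Remove one child from each of the first two households
      quietly replace epd = epd - 1 in 1/`=r(sum) - 10'

      * The total should now equal the target
      summarize epd, meanonly
      assert r(sum) == 10
      ```

      Here summarize is run immediately before the line that uses r(sum), so the stored result is known to be current when the range is expanded.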



      • #4
        Dear Joseph,
        Thank you! Very useful and enlightening. I was travelling to an area without internet and did not have a chance to review this. You are right, I did not mean number of events but the proportion positive.
        My choice had to be logistic vs mixed effects logistic vs GEE.
        Since in my data the ICC is

        famid | .5262349 .0498382 (.4287955 .6217163)

        I would assume that a simple logistic regression is not an option and that either melogit or GEE would be appropriate. Any suggestions as to which would make more sense? I understand that mixed-effects models assess risk at the individual level and GEE at the population level, but what is the meaningful difference if my goal is to predict incidence in a population using these risk factors?
        Last edited by Jacob John; 12 Nov 2019, 20:53.
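
        As a rough illustration of how much the two kinds of estimates could differ with an ICC of this size, the sketch below inverts the latent-response ICC formula and applies the same commonly used attenuation approximation for a normally distributed random intercept. It assumes that the .5262349 comes from estat icc after a random-intercept melogit fit, which the post does not show; \sigma_u^2, \beta_{SS}, \beta_{PA} and c are notation introduced here.

        ```latex
        \begin{align*}
        % Household-level variance implied by the reported ICC
        \sigma_u^2 &= \frac{\rho}{1 - \rho} \cdot \frac{\pi^2}{3}
                    = \frac{0.5262}{0.4738} \times 3.2899 \approx 3.65 \\[4pt]
        % Approximate ratio of household-specific to population-averaged coefficients
        \frac{\beta_{SS}}{\beta_{PA}} &\approx \sqrt{1 + c^2 \sigma_u^2}
                    = \sqrt{1 + 0.3458 \times 3.65} \approx 1.50,
                    \qquad c = \frac{16\sqrt{3}}{15\pi}
        \end{align*}
        ```

        Under these assumptions, population-averaged log odds ratios would be roughly two-thirds the size of the household-specific ones, so the two approaches would not be interchangeable for these data.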

