Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Simulating an interaction from a logit with correlated predictors

    Hi all,

    I'd like to simulate some data (for a power analysis) that performs a logit with an interaction between two dummies/dichotomous predictors ('group' and 'task'). Below is a minimal reproducible example. Task is a repeated measure variable (i.e., each participant complete both tasks).

    Currently my simulation is set up so that performance on the dv is uncorrelated across the two tasks, but I'd like to set it up so that the two are correlated (at 0.40). I couldn't figure out how to do this, and would be grateful for any suggestions.


    Code:
    set seed 98765
    program define myprog, rclass
        clear
        set obs 400
        gen id = _n
        gen group = runiformint(0,1)
        gen dv1 = rbinomial(1,.625) if group == 0
        replace dv1 = rbinomial(1,.325) if group == 1
        gen dv2 = rbinomial(1,.58)
        reshape long dv, i(id) j(task)
        logit dv i.group##i.task, cluster(id)
        margins task, dydx(group) pwcompare(effects) post
        lincom _b[1.group:1.task]
        return scalar diffdiff = r(estimate)
        return scalar p = r(p)
    end
    simulate diffdiff = r(diffdiff) p = r(p), reps(10000) nodots: myprog

  • #2
    Originally posted by David Tannenbaum View Post
    . . . I'd like to set it up so that the two are correlated (at 0.40). I couldn't figure out how to do this, and would be grateful for any suggestions.
    You can use the user-written command ovbd to generate correlated binomial variables. It's on SSC. (It requires the user-written command ridder available from StataCorp's STB website.) I've shown below how ovbd can be used in your task.

    For your purpose, it's better to create a pool of correlated binomial outcome data and have the simulation program draw from that pool (actually two pools, each with the specified proportions, 0.625-versus-0.58 and 0.325-versus-0.58, both with a correlation coefficient of 0.40). This can be done by passing the pools (temporary datasets) and information about the current draw's location is in each of the pool datasets. (The row offset differs for the two groups, because you have made the counts in each group to be a random binomial rather than a fixed 50%.)

    The simulation program below looks a little complicated, but most of that is for returning the realized correlation (I've used xtgee with a specified working correlation matrix in order to get that) and realized proportions.

    Do-file and its log file are attached if you're interested.
    Code:
    version 18.0
    
    clear *
    
    // seedem
    set seed 344200108
    
    local count 4000000
    
    timer clear
    timer on 1
    
    // 0.625-versus-0.58 samples (Group 0)
    matrix define Corr = (1, 0.40 \ 0.40, 1)
    matrix define Means = (0.625, 0.58)
    ovbd out1 out2, means(Means) corr(Corr) n(`count') clear
    correlate out?
    summarize
    
    generate long pid = _n
    generate byte grp = 0
    tempfile group0
    quietly save `group0'
    
    // 0.325-versus-0.58 samples (Group 1)
    drop _all
    matrix define Means = (0.325, 0.58)
    ovbd out1 out2, means(Means) corr(Corr) n(`count') clear
    correlate out?
    summarize
    
    generate long pid = `count' + _n
    generate byte grp = 1
    tempfile group1
    quietly save `group1'
    
    // Keeping track of the row offsets
    tempname RowOffsets
    matrix input `RowOffsets' = (1 1)
    
    timer off 1
    
    program define simEm, rclass
        version 18.0
        syntax , rowoffsets(name) group0(string) group1(string) [n(integer 400)]
    
        // Counts in each group
        local grp0 = rbinomial(`n', 0.5)
        local grp1 = `n' - `grp0'
    
        // Get the correlated outcome data
        drop _all
        
        local begin0 = `rowoffsets'[1, 1]
        local end0 = `begin0' + `grp0'
        use in `begin0'/`end0' using `group0'
        
        tempfile hold
        save `hold'
    
    
        drop _all
    
        local begin1 = `rowoffsets'[1, 2]
        local end1 = `begin1' + `grp1'
        use in `begin1'/`end1' using `group1'
        append using `hold'
    
        // Housekeeping
        matrix define `rowoffsets' = ///
            (`end0' + 1, `end1' + 1)
    
        // Wrap up
        reshape long out, i(pid) j(tsk)
    
        xtgee out i.grp##i.tsk, i(pid) family(binomial) link(log) corr(exchangeable)
        margins grp#tsk
        tempname g0t1 g0t2 g1t1 g1t2 did
        scalar define `g0t1' = r(table)["b", "0.grp#1.tsk"]
        scalar define `g0t2' = r(table)["b", "0.grp#1.tsk"]
        scalar define `g1t1' = r(table)["b", "1.grp#1.tsk"]
        scalar define `g1t2' = r(table)["b", "1.grp#2.tsk"]
    
        nlcom did: ///
            ( invlogit(_b[_cons] + _b[1.grp] + _b[2.tsk] + _b[1.grp#2.tsk]) - ///
                invlogit(_b[_cons] + _b[1.grp]) ) - ///
            ( invlogit(_b[_cons] + _b[2.tsk]) - invlogit(_b[_cons]) )
        scalar define `did' = r(table)["pvalue", "did"]
    
        estat wcorrelation
        return scalar rho = r(R)[2, 1]
    
        foreach g in g0 g1 {
            foreach t in t1 t2 {
                return scalar `g'`t' = ``g'`t''
            }
        }
        return scalar did = `did'
    end
    
    local list rho = r(rho) did = r(did)
    foreach g in g0 g1 {
        foreach t in t1 t2 {
            local list `list' `g'`t' = r(`g'`t')
        }
    }
    
    timer on 2
    
    quietly simulate `list', reps(300): simEm , ///
        rowoffsets(`RowOffsets') group0(`group0') group1(`group1')
    
    timer off 2
    
    // Realized correlation
    quietly summarize rho, meanonly
    display in smcl as text "Correlation between outcome variables = " ///
        as result %04.2f r(mean)
    
    // Realized cell proportions
    foreach g in g0 g1 {
        foreach t in t1 t2 {
            summarize `g'`t', meanonly
            local G : subinstr local g "g" "Group "
            local T : subinstr local t "t" "Task "
            display in smcl as text "Proportion `G' `T' = " ///
                as result %05.3f r(mean)
        }
    }
    
    // Power
    generate byte pos = did < 0.01
    summarize pos, meanonly
    display in smcl as text "Power for group × task interaction = " ///
        as result %04.2f r(mean)
    
    timer list
    
    exit
    In this example, the mean realized proportions and mean realized correlation coefficient all happen by chance to match exactly what you specified (see attached log file).

    Note that my program returns the difference between groups in their differences of proportions between tasks. It's different from what yours returns, which is the difference between groups on Task 1 alone.
    Attached Files

    Comment


    • #3
      Quick sidenote: Thank you, Joseph, for writing -ovbd-. It has been extremely helpful in several of our projects.

      Comment


      • #4
        Originally posted by Tiago Pereira View Post
        Quick sidenote: Thank you, Joseph, for writing -ovbd-. It has been extremely helpful in several of our projects.
        It's certainly gratifying to hear that. I have been meaning to update and improve ovbd for some time now, but was under the impression that it hadn't seen enough use to make it worthwhile. Knowing this, I'll redouble my efforts.

        Comment


        • #5
          Joseph, this is amazing! Thank you so much for your help — definitely over and above in providing such detailed code!

          I wouldn't have thought to create separate pools to draw from for the simulation, so this is extremely helpful. I really, really appreciate all of your help. Also let me echo what Tiago said, and thanks for writing -ovbd- and all you do for the Stata community!

          Comment

          Working...
          X