Simulating an interaction from a logit with correlated predictors

David Tannenbaum

Join Date: Jul 2020

Posts: 20
#1

Simulating an interaction from a logit with correlated predictors

14 Jun 2024, 21:44

Hi all,

I'd like to simulate some data (for a power analysis) that performs a logit with an interaction between two dummies/dichotomous predictors ('group' and 'task'). Below is a minimal reproducible example. Task is a repeated measure variable (i.e., each participant complete both tasks).

Currently my simulation is set up so that performance on the dv is uncorrelated across the two tasks, but I'd like to set it up so that the two are correlated (at 0.40). I couldn't figure out how to do this, and would be grateful for any suggestions.

Code:

set seed 98765 program define myprog, rclass clear set obs 400 gen id = _n gen group = runiformint(0,1) gen dv1 = rbinomial(1,.625) if group == 0 replace dv1 = rbinomial(1,.325) if group == 1 gen dv2 = rbinomial(1,.58) reshape long dv, i(id) j(task) logit dv i.group##i.task, cluster(id) margins task, dydx(group) pwcompare(effects) post lincom _b[1.group:1.task] return scalar diffdiff = r(estimate) return scalar p = r(p) end simulate diffdiff = r(diffdiff) p = r(p), reps(10000) nodots: myprog
Tags: None

Joseph Coveney

Join Date: Apr 2014
Posts: 4410

15 Jun 2024, 21:30

Originally posted by David Tannenbaum View Post

. . . I'd like to set it up so that the two are correlated (at 0.40). I couldn't figure out how to do this, and would be grateful for any suggestions.

You can use the user-written command ovbd to generate correlated binomial variables. It's on SSC. (It requires the user-written command ridder available from StataCorp's STB website.) I've shown below how ovbd can be used in your task.

For your purpose, it's better to create a pool of correlated binomial outcome data and have the simulation program draw from that pool (actually two pools, each with the specified proportions, 0.625-versus-0.58 and 0.325-versus-0.58, both with a correlation coefficient of 0.40). This can be done by passing the pools (temporary datasets) and information about the current draw's location is in each of the pool datasets. (The row offset differs for the two groups, because you have made the counts in each group to be a random binomial rather than a fixed 50%.)

The simulation program below looks a little complicated, but most of that is for returning the realized correlation (I've used xtgee with a specified working correlation matrix in order to get that) and realized proportions.

Do-file and its log file are attached if you're interested.

Code:

version 18.0

clear *

// seedem
set seed 344200108

local count 4000000

timer clear
timer on 1

// 0.625-versus-0.58 samples (Group 0)
matrix define Corr = (1, 0.40 \ 0.40, 1)
matrix define Means = (0.625, 0.58)
ovbd out1 out2, means(Means) corr(Corr) n(`count') clear
correlate out?
summarize

generate long pid = _n
generate byte grp = 0
tempfile group0
quietly save `group0'

// 0.325-versus-0.58 samples (Group 1)
drop _all
matrix define Means = (0.325, 0.58)
ovbd out1 out2, means(Means) corr(Corr) n(`count') clear
correlate out?
summarize

generate long pid = `count' + _n
generate byte grp = 1
tempfile group1
quietly save `group1'

// Keeping track of the row offsets
tempname RowOffsets
matrix input `RowOffsets' = (1 1)

timer off 1

program define simEm, rclass
    version 18.0
    syntax , rowoffsets(name) group0(string) group1(string) [n(integer 400)]

    // Counts in each group
    local grp0 = rbinomial(`n', 0.5)
    local grp1 = `n' - `grp0'

    // Get the correlated outcome data
    drop _all
    
    local begin0 = `rowoffsets'[1, 1]
    local end0 = `begin0' + `grp0'
    use in `begin0'/`end0' using `group0'
    
    tempfile hold
    save `hold'


    drop _all

    local begin1 = `rowoffsets'[1, 2]
    local end1 = `begin1' + `grp1'
    use in `begin1'/`end1' using `group1'
    append using `hold'

    // Housekeeping
    matrix define `rowoffsets' = ///
        (`end0' + 1, `end1' + 1)

    // Wrap up
    reshape long out, i(pid) j(tsk)

    xtgee out i.grp##i.tsk, i(pid) family(binomial) link(log) corr(exchangeable)
    margins grp#tsk
    tempname g0t1 g0t2 g1t1 g1t2 did
    scalar define `g0t1' = r(table)["b", "0.grp#1.tsk"]
    scalar define `g0t2' = r(table)["b", "0.grp#1.tsk"]
    scalar define `g1t1' = r(table)["b", "1.grp#1.tsk"]
    scalar define `g1t2' = r(table)["b", "1.grp#2.tsk"]

    nlcom did: ///
        ( invlogit(_b[_cons] + _b[1.grp] + _b[2.tsk] + _b[1.grp#2.tsk]) - ///
            invlogit(_b[_cons] + _b[1.grp]) ) - ///
        ( invlogit(_b[_cons] + _b[2.tsk]) - invlogit(_b[_cons]) )
    scalar define `did' = r(table)["pvalue", "did"]

    estat wcorrelation
    return scalar rho = r(R)[2, 1]

    foreach g in g0 g1 {
        foreach t in t1 t2 {
            return scalar `g'`t' = ``g'`t''
        }
    }
    return scalar did = `did'
end

local list rho = r(rho) did = r(did)
foreach g in g0 g1 {
    foreach t in t1 t2 {
        local list `list' `g'`t' = r(`g'`t')
    }
}

timer on 2

quietly simulate `list', reps(300): simEm , ///
    rowoffsets(`RowOffsets') group0(`group0') group1(`group1')

timer off 2

// Realized correlation
quietly summarize rho, meanonly
display in smcl as text "Correlation between outcome variables = " ///
    as result %04.2f r(mean)

// Realized cell proportions
foreach g in g0 g1 {
    foreach t in t1 t2 {
        summarize `g'`t', meanonly
        local G : subinstr local g "g" "Group "
        local T : subinstr local t "t" "Task "
        display in smcl as text "Proportion `G' `T' = " ///
            as result %05.3f r(mean)
    }
}

// Power
generate byte pos = did < 0.01
summarize pos, meanonly
display in smcl as text "Power for group × task interaction = " ///
    as result %04.2f r(mean)

timer list

exit

In this example, the mean realized proportions and mean realized correlation coefficient all happen by chance to match exactly what you specified (see attached log file).

Note that my program returns the difference between groups in their differences of proportions between tasks. It's different from what yours returns, which is the difference between groups on Task 1 alone.

Attached Files

Comment

Tiago Pereira

Join Date: Jan 2016

Posts: 389
#3

16 Jun 2024, 01:03

Quick sidenote: Thank you, Joseph, for writing -ovbd-. It has been extremely helpful in several of our projects.
1 like
Comment
Joseph Coveney

Join Date: Apr 2014

Posts: 4410
#4

16 Jun 2024, 02:37

Originally posted by Tiago Pereira View Post

Quick sidenote: Thank you, Joseph, for writing -ovbd-. It has been extremely helpful in several of our projects.

It's certainly gratifying to hear that. I have been meaning to update and improve ovbd for some time now, but was under the impression that it hadn't seen enough use to make it worthwhile. Knowing this, I'll redouble my efforts.
1 like
Comment
David Tannenbaum

Join Date: Jul 2020

Posts: 20
#5

16 Jun 2024, 11:28

Joseph, this is amazing! Thank you so much for your help — definitely over and above in providing such detailed code!

I wouldn't have thought to create separate pools to draw from for the simulation, so this is extremely helpful. I really, really appreciate all of your help. Also let me echo what Tiago said, and thanks for writing -ovbd- and all you do for the Stata community!
Comment

Announcement