E-values in linear regression loops: post-estimation commands problems?

Ila Costa

Join Date: Oct 2022
Posts: 9

E-values in linear regression loops: post-estimation commands problems?

26 Jun 2024, 05:28

Hi all,
As sensitivity analysis for my project I want to estimate e-values for my final statistical models (both in complete case and imputed dataset) for every regression models I have run – and there are several. The script works fine when I run in separate linear regressions using margins to obtain the SE to use to calculate the pooled standard deviation (sdp), use esizeregi to get the SMD and input that into the evalue command. However, because I have several models I have tried to put these steps into a loop. I have tried to debug this and it says that I do not have n2 (is missing or 0) but this is not true. I do not receive any error message but then the temporary dataset is empty – thus the loop is not working properly.

My script is:

Code:

* Always use this dataset from now on for the analyses
use "dataset.dta", clear

* Define the outcomes and independent variables
local outcomes "outcome1 outcome2 outcome3 outcome4 outcome5 outcome6 outcome7"
local indep_vars "exp cov1 cov2 cov3 cov4 cov5 cov6 cov7 cov8 cov9 cov10 cov11 cov12 cov13 cov14 cov15 cov16"

* Initialize an empty dataset to store the results
tempfile results_file
capture postclose results
postfile results str32 outcome float(beta pooled_sd n1 n2 smd se_smd evalue) using `results_file'

* Loop through each outcome
foreach outcome in `outcomes' {
    display "Processing outcome: `outcome'"

    * Drop residuals variable if it exists to prevent conflicts
    capture drop residuals
    
    * Run regression for the outcome
    regress `outcome' `indep_vars' if model7out == 1
    if _rc {
        display as error "Regression failed for outcome: `outcome'"
        continue
    }
    
    * Store the coefficient for `anydeb16`
    local beta = _b[anydeb16]
    
    * Calculate pooled SD from residuals
    predict residuals, residuals
    if _rc {
        display as error "Predict failed for outcome: `outcome'"
        continue
    }
    su residuals
    local pooled_sd = r(sd)

    * Validate pooled_sd
    if missing(`pooled_sd') | `pooled_sd' == 0 {
        display as error "Pooled SD is missing or zero for outcome: `outcome'"
        continue
    }
    
    * Get the number of observations in each group
    tabulate anydeb16 if model7out == 1
    matrix N = r(N)
    local n1 = N[1,1]
    local n2 = N[2,1]

    * Validate group sizes
    if missing(`n1') | missing(`n2') | `n1' == 0 | `n2' == 0 {
        display as error "Group sizes are missing or zero for outcome: `outcome'"
        continue
    }

    * Compute SMD
    esizeregi `beta', sdp(`pooled_sd') n1(`n1') n2(`n2')
    if _rc {
        display as error "esizeregi failed for outcome: `outcome'"
        continue
    }
    local smd = r(smd)
    local se_smd = r(se)

    * Compute E-value
    evalue smd `d', se(`se') true(0) fig
    if _rc {
        display as error "evalue failed for outcome: `outcome'"
        continue
    }
    local evalue = r(evalue)

    * Post the results
    post results ("`outcome'") (`beta') (`pooled_sd') (`n1') (`n2') (`d') (`se') (`evalue')

    * Clean up
    drop residuals
}

* Close the postfile
postclose results

* Use the results dataset and list
use `results_file', clear
list

I am using STATA v17 for Windows.

Also, I have more of a statistical question that a STATA question too. I have used this to obtain evalue for SMD

HTML Code:

https://journals.sagepub.com/doi/full/10.1177/1536867X20909696#bibr25-1536867X20909696

as guidance even though sdy does not work anymore (at least for me) and it requires sdp which I obtain as by guidance in the stata evalue help file. I am asked to inpute n1 and n2 for the two groups of the exposure/independent variable however it does not specify what to do in case of missing data. Should I input the numbers of the groups as by completeness in the regression model not the full sample? Do you know if these commands will work too in my imputed datasets?

Sorry if I am asking something that is very obvious, but I cannot find an easy answer anywhere. Any help would be really appreciated! Thanks!!

Tags: None

Ila Costa

Join Date: Oct 2022
Posts: 9

15 Jul 2024, 09:52

Hi all,

Just in case it may be helpful to anyone, I think I have solved my problem and I am now able to calculate E-values correctly and automatically across a number of regression models.

HTML Code:

* Define the outcomes and independent variables
local outcomes "outcome1 outcome2 outcome3 outcome4 outcome5 outcome6 outcome7"
local indep_vars "exp cov1 cov2 cov3 cov4 cov5 cov6 cov7 cov8 cov9 cov10 cov11 cov12 cov13 cov14 cov15 cov16"

* Initialize an empty dataset to store the results
tempfile results_file
capture postclose results
postfile results str32 outcome float(beta pooled_sd n1 n2 smd se_smd evalue evalue_ci) using `results_file'

* Loop through each outcome
foreach outcome in `outcomes' {
    display "Processing outcome: `outcome'"

    * Drop residuals variable if it exists to prevent conflicts
    capture drop residuals

    * Check number of observations for `model7out'
    count if model7out == 1
    local model7out_count = r(N)
    display "Number of observations with complete data (model7out == 1): `model7out_count'"

    if `model7out_count' == 0 {
        display as error "No observations with complete data for outcome: `outcome'"
        continue
    }

    * Run regression for the outcome
    regress `outcome' `indep_vars' if model7out == 1
    if _rc {
        display as error "Regression failed for outcome: `outcome'"
        continue
    }
    
    * Store the coefficient for `anydeb16'
    local beta = _b[anydeb16]

    * Calculate pooled SD from residuals
    predict residuals, residuals
    if _rc {
        display as error "Predict failed for outcome: `outcome'"
        continue
    }
    su residuals
    local pooled_sd = r(sd)

    * Validate pooled_sd
    if missing(`pooled_sd') | `pooled_sd' == 0 {
        display as error "Pooled SD is missing or zero for outcome: `outcome'"
        continue
    }

    * Get the number of observations in each group
    quietly: tabulate anydeb16 if model7out == 1, matcell(N)
    matrix list N
    local n1 = N[1,1]
    local n2 = N[2,1]

    * Debugging lines to ensure values are extracted correctly
    display "Group size n1: `n1'"
    display "Group size n2: `n2'"

    * Validate group sizes
    if missing(`n1') | missing(`n2') | `n1' == 0 | `n2' == 0 {
        display as error "Group sizes are missing or zero for outcome: `outcome'"
        continue
    }

    * Compute SMD using esizeregi
    esizeregi `beta', sdp(`pooled_sd') n1(`n1') n2(`n2')
    if _rc {
        display as error "esizeregi failed for outcome: `outcome'"
        continue
    }
    local smd = r(d)
    local se_smd = r(se)

    * Compute E-value
    evalue smd `smd', se(`se_smd') true(0) fig
    if _rc {
        display as error "evalue failed for outcome: `outcome'"
        continue
    }
    local evalue = r(eval_est)
    local evalue_ci = r(eval_ci)

    * Save the figure as PNG with specified title
    graph export "Evalue_figure_`outcome'.png", replace

    * Post the results
    post results ("`outcome'") (`beta') (`pooled_sd') (`n1') (`n2') (`smd') (`se_smd') (`evalue') (`evalue_ci')

    * Clean up
    drop residuals
}

* Close the postfile
postclose results

* Use the results dataset and list
use `results_file', clear
list

* Save the results dataset
save evalues, replace

However, I am still uncertain on how to best apply (if possible) this sensitivity analysis to regressions performed in multiply imputed datasets.. Any help would be very appreciated! Thanks!

Comment

Ila Costa

Join Date: Oct 2022
Posts: 9

02 Oct 2024, 08:13

Hi,

Is anyone able to help? Fundamentally (whether I can get to this by using a loop or fitting each outcome separately), I want to adapt the following script I ran in the complete case dataset for the analyses in the imputed sample and estimate there the E values.
I want to adapt the E value calculation for the output of linear regressions (continuous outcome) found here: https://www.hsph.harvard.edu/tyler-v...ta_Article.pdf

Code:

regress outcome exposure cov1 cov2 cov3 cov4 cov5

*Use margins commans as you need sdp (pooled standard deviation): margins command without any predictor variables, and then converting the pooled standard error to the pooled standard deviation by multipying it by the square-root of N (standard error * sqrt(N));
margins 

*The pooled SE (sep) is ... "x"
*CALCULATE SDP
gen sdp=sep * sqrt(709)
su sdp

*then I tabulate the exposure as I need to get the number of observations in each group
tabulate anydeb16
*n1=4,262 , n2=691

*now get the SMD
esizeregi beta, sdp(sdp) n1(n1) n2(n2)
*SMD/Cohen's d is -.0225121, se: .0410428

*Next, we plug the estimate and standard error into evalue smd to compute the E-values.
evalue smd beta, se(0.0410428) true(0) fig

Please any help in doing this in imputed data would be massively appreciated!

Comment

Erik Ruzek

Join Date: Oct 2017

Posts: 429
#4

02 Oct 2024, 12:25

Where is it breaking down with mi data? If margins, you can use daniel klein's fantastic mimrgns command to get marginal effects after mi estimate: regress.

Code:

net describe mimrgns, from(http://fmwww.bc.edu/RePEc/bocode/m)

If it is esizeregi or evalue, you could try using mi estimate, cmdok: but make sure that it actually is ok.

By the way, you can get the standard error and N from the margins call to plug into your sdp formula by pulling the values from the requisite matrices after margins using

Code:

margins, post gen sdp = (r(table)["se", "_cons"]) * (sqrt(r(_N)[1,1])) su sdp

You can probably take this approach for some of the subsequent calculations, particularly those using the results from a previous command. You need to look at what Stata stores after estimation with

Code:

return list
Comment

Ila Costa

Join Date: Oct 2022
Posts: 9

07 Oct 2024, 08:57

Thanks for your response! I have indeed moved to mimrgns (create coomand!) and am now able to estimate E values in imputed dataset using the following code (just in case of any help to anyone):

Code:

mi estimate: regress depvar indepvar cov1 cov2 cov3

*Use margins commans as you need sdp (pooled standard deviation): margins command without any predictor variables, and then converting the pooled standard error to the pooled standard deviation by multipying it by the square-root of N (standard error * sqrt(N));
mimrgns

*The pooled SE is 0.05

*CALCULATE SDP
mi passive: gen sdp=0.05 * sqrt(3500) // sample size
su sdp
*3.30

*then I tabulate the exposure as I need to get the number of observations in each group
mi estimate: prop indepvar
*n1=3000 , n2=500

*now get the SMD
esizeregi 0.43, sdp(3.30) n1(3000) n2(500)
*SMD/Cohen's d is xx, se: .yy

*Next, we plug the estimate and standard error into evalue smd to compute the E-values.
evalue smd est, se(yy) true(0) fig


I am plugging all the numbers manually, which is a bit of a pain and I guess error-prone. I am still trying to sort out how to incorporate this in a loop but I am receiving errors when running the following:

* Define the outcomes and independent variables
local outcomes "depvar1 depvar2 depvar3 depvar4"
local indep_vars "anydeb16 cov1 cov2 cov3 cov4 cov5 cov6 cov7"

* Initialize a temporary file to store results
tempfile results_file

capture postclose results

postfile results str32 outcome float(beta pooled_sd n1 n2 smd se_smd evalue evalue_ci) using `results_file'

* Loop through each outcome
foreach outcome of local outcomes {
display "Processing outcome: `outcome'"

* Run regression for the outcome with multiple imputation
mi estimate: regress `outcome' `indep_vars'
if _rc {
display as error "Regression failed for outcome: `outcome'"
continue
}

* Store the coefficient for anydeb16
matrix b = e(b)
local beta = b[1, 1] // Extract the coefficient for `anydeb16`

* Calculate the pooled standard error using mimrgns
mimrgns

* Extract the standard error from the mimrgns output
matrix list r(table) // Display the table matrix for extraction
local pooled_se = r(table)[1, 2] // The standard error is in the first row, second column

* Calculate the pooled standard deviation (SDP) directly without creating a new variable
local sdp = `pooled_se' * sqrt(e(N)) // Multiply SE by square root of N

if missing(`sdp') | `sdp' == 0 {
display as error "Pooled SD is missing or zero for outcome: `outcome'"
continue
}

* Get the number of observations and group sizes for anydeb16
quietly: tabulate anydeb16, matcell(N)
local n1 = N[1, 1]
local n2 = N[2, 1]

* Ensure group sizes are valid
if missing(`n1') | missing(`n2') | `n1' == 0 | `n2' == 0 {
display as error "Group sizes are missing or zero for outcome: `outcome'"
continue
}

display "Group size n1: `n1'"
display "Group size n2: `n2'"

* Compute SMD using esizeregi
esizeregi `beta', sdp(`sdp') n1(`n1') n2(`n2')
if _rc {
display as error "esizeregi failed for outcome: `outcome'"
continue
}
local smd = r(d)
local se_smd = r(se)

* Compute E-value
evalue smd `smd', se(`se_smd') true(0) fig
if _rc {
display as error "E-value failed for outcome: `outcome'"
continue
}
local evalue = r(eval_est)
local evalue_ci = r(eval_ci)

* Save the figure as PNG with specified title
graph export "Evalue_ImputedAnyDEB_`outcome'.png", replace title("E-values for the association between any disordered eating behavior at age 16 and `outcome' at age 18")

* Post the results into the dataset
post results ("`outcome'") (`beta') (`pooled_sd') (`n1') (`n2') (`smd') (`se_smd') (`evalue') (`evalue_ci')
}

Would you know what is going wrong? I cannot generate sdp as if I want to loop for each outcome then the loop would get stuck after the first sdp generated.

Thank you very much for your help!

Comment

Erik Ruzek

Join Date: Oct 2017

Posts: 429
#6

07 Oct 2024, 09:20

Ila,

My guess is that you will need to update your code so that the sdp variable is generated uniquely for each of the variables. That is, from the beginning to the end of the loop, replace all instances of sdp with something like sdp_`outcome'.
Comment

Ila Costa

Join Date: Oct 2022
Posts: 9

10 Oct 2024, 07:08

Thanks Erik!

For everyone who might be looking for a similar script, I was able to sort out the loop, and this should now be running okay.

Code:

* Define the outcomes and independent variables
local outcomes "out1 out2 out3 out3 out4 out5 out6 out7"
local indep_vars "exp cov1 cov2 cov3 cov4 cov5 cov6 cov7 cov8 cov9"

* Initialize an empty dataset to store the results
tempfile results_file
capture postclose results
postfile results str32 outcome float(beta pooled_sd n1 n2 smd se_smd evalue evalue_ci) using `results_file'
 
* Loop through each outcome
foreach outcome of local outcomes {
  display "Processing outcome: `outcome'"
  
* Run regression for the outcome with multiple imputation
mi estimate: regress `outcome' `indep_vars'
     if _rc {
         display as error "Regression failed for outcome: `outcome'"
         continue
     }
     * Extract the coefficient for 'anydeb16' (it's in the first column, row 1)
     matrix beta_matrix = e(b_mi)
     local beta = beta_matrix[1, 1]

     * Display the extracted beta coefficient
     display "Value of beta coefficient for exp: `beta'"

 * Calculate the pooled standard error using mimrgns
     mimrgns
 
    * Extract the standard error from the mimrgns output
    matrix list r(table)  // Display the table matrix for extraction
    local pooled_se = r(table)[2, 1]  // The standard error is in the second row, first column

    * Calculate the pooled standard deviation (SDP) directly without creating a new variable
    //gen sdp_outcome' = (r(table)["se", "_cons"]) * (sqrt(r(_N)[1,1]))
    gen sdp_`outcome' = (`pooled_se') * (sqrt(4535)) // Multiply SE by square root of N
    * Summarize and extract the mean of sdp for this outcome
    su sdp_`outcome'
    
    * Capture the mean value in a local macro
    *local sdp_mean_outcome' = round(r(mean), 0.000001)       
    local sdp_mean = round(r(mean), 0.000001)
    display "Mean of sdp: `sdp_mean'"

    * Get the number of observations and group sizes for exp - change n1 and n2 for other exposures
     local n1 3912    
     local n2 623

    * Display and check that all values are valid before running esizeregi
    display "Beta: `beta'"
    display "SDP Mean: `sdp_mean'"
    display "Group sizes: n1=n1', n2=n2'"
    * Ensure that none of the variables are missing or invalid before proceeding
    if missing(`beta') | missing(`sdp_mean') | missing(`n1') | missing(`n2') {
    display as error "Missing values detected for outcome: `outcome'"
    continue
    }

   * Compute SMD using esizeregi (ensure beta and sdp_mean are properly referenced)
   esizeregi `beta', sdp(`sdp_mean') n1(`n1') n2(`n2')
     if _rc {
     display as error "esizeregi failed for outcome: `outcome'"
     continue
 }
    * Compute SMD using esizeregi
        esizeregi `beta', sdp(`sdp_mean') n1(`n1') n2(`n2')
        if _rc {
        display as error "esizeregi failed for outcome: `outcome'"
        continue
     }
      local smd = r(d)
      local se_smd = r(se)
 
   * Compute E-value
    evalue smd `smd', se(`se_smd') true(0) fig
    if _rc {
        display as error "E-value failed for outcome: `outcome'"
        continue
     }
     local evalue = r(eval_est)
     local evalue_ci = r(eval_ci)
 
    * Save the figure as PNG with specified title
     graph export "Evalue_Imputed_`outcome'.png"

    * Post the results into the dataset (beta pooled_sd n1 n2 smd se_smd evalue evalue_ci)
    post results ("`outcome'") (`beta') (`sdp_mean') (`n1') (`n2') (`smd') (`se_smd') (`evalue') (`evalue_ci')

}

* Close the postfile
postclose results

* Use the results dataset and list
use `results_file', clear
list

* Save the results dataset
save evalues_exp, replace

There are for sure things that could be improve of this code (e.g., the extraction of the sample size) but this at least works without too much manual changing!

Announcement

E-values in linear regression loops: post-estimation commands problems?

Comment

Comment

Comment

Comment

Comment

Comment