Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • E-values in linear regression loops: post-estimation commands problems?

    Hi all,
    As sensitivity analysis for my project I want to estimate e-values for my final statistical models (both in complete case and imputed dataset) for every regression models I have run – and there are several. The script works fine when I run in separate linear regressions using margins to obtain the SE to use to calculate the pooled standard deviation (sdp), use esizeregi to get the SMD and input that into the evalue command. However, because I have several models I have tried to put these steps into a loop. I have tried to debug this and it says that I do not have n2 (is missing or 0) but this is not true. I do not receive any error message but then the temporary dataset is empty – thus the loop is not working properly.

    My script is:

    Code:
    * Always use this dataset from now on for the analyses
    use "dataset.dta", clear
    
    * Define the outcomes and independent variables
    local outcomes "outcome1 outcome2 outcome3 outcome4 outcome5 outcome6 outcome7"
    local indep_vars "exp cov1 cov2 cov3 cov4 cov5 cov6 cov7 cov8 cov9 cov10 cov11 cov12 cov13 cov14 cov15 cov16"
    
    * Initialize an empty dataset to store the results
    tempfile results_file
    capture postclose results
    postfile results str32 outcome float(beta pooled_sd n1 n2 smd se_smd evalue) using `results_file'
    
    * Loop through each outcome
    foreach outcome in `outcomes' {
        display "Processing outcome: `outcome'"
    
        * Drop residuals variable if it exists to prevent conflicts
        capture drop residuals
        
        * Run regression for the outcome
        regress `outcome' `indep_vars' if model7out == 1
        if _rc {
            display as error "Regression failed for outcome: `outcome'"
            continue
        }
        
        * Store the coefficient for `anydeb16`
        local beta = _b[anydeb16]
        
        * Calculate pooled SD from residuals
        predict residuals, residuals
        if _rc {
            display as error "Predict failed for outcome: `outcome'"
            continue
        }
        su residuals
        local pooled_sd = r(sd)
    
        * Validate pooled_sd
        if missing(`pooled_sd') | `pooled_sd' == 0 {
            display as error "Pooled SD is missing or zero for outcome: `outcome'"
            continue
        }
        
        * Get the number of observations in each group
        tabulate anydeb16 if model7out == 1
        matrix N = r(N)
        local n1 = N[1,1]
        local n2 = N[2,1]
    
        * Validate group sizes
        if missing(`n1') | missing(`n2') | `n1' == 0 | `n2' == 0 {
            display as error "Group sizes are missing or zero for outcome: `outcome'"
            continue
        }
    
        * Compute SMD
        esizeregi `beta', sdp(`pooled_sd') n1(`n1') n2(`n2')
        if _rc {
            display as error "esizeregi failed for outcome: `outcome'"
            continue
        }
        local smd = r(smd)
        local se_smd = r(se)
    
        * Compute E-value
        evalue smd `d', se(`se') true(0) fig
        if _rc {
            display as error "evalue failed for outcome: `outcome'"
            continue
        }
        local evalue = r(evalue)
    
        * Post the results
        post results ("`outcome'") (`beta') (`pooled_sd') (`n1') (`n2') (`d') (`se') (`evalue')
    
        * Clean up
        drop residuals
    }
    
    * Close the postfile
    postclose results
    
    * Use the results dataset and list
    use `results_file', clear
    list
    I am using STATA v17 for Windows.

    Also, I have more of a statistical question that a STATA question too. I have used this to obtain evalue for SMD
    HTML Code:
    https://journals.sagepub.com/doi/full/10.1177/1536867X20909696#bibr25-1536867X20909696
    as guidance even though sdy does not work anymore (at least for me) and it requires sdp which I obtain as by guidance in the stata evalue help file. I am asked to inpute n1 and n2 for the two groups of the exposure/independent variable however it does not specify what to do in case of missing data. Should I input the numbers of the groups as by completeness in the regression model not the full sample? Do you know if these commands will work too in my imputed datasets?

    Sorry if I am asking something that is very obvious, but I cannot find an easy answer anywhere. Any help would be really appreciated! Thanks!!


  • #2
    Hi all,

    Just in case it may be helpful to anyone, I think I have solved my problem and I am now able to calculate E-values correctly and automatically across a number of regression models.

    HTML Code:
    * Define the outcomes and independent variables
    local outcomes "outcome1 outcome2 outcome3 outcome4 outcome5 outcome6 outcome7"
    local indep_vars "exp cov1 cov2 cov3 cov4 cov5 cov6 cov7 cov8 cov9 cov10 cov11 cov12 cov13 cov14 cov15 cov16"
    
    * Initialize an empty dataset to store the results
    tempfile results_file
    capture postclose results
    postfile results str32 outcome float(beta pooled_sd n1 n2 smd se_smd evalue evalue_ci) using `results_file'
    
    * Loop through each outcome
    foreach outcome in `outcomes' {
        display "Processing outcome: `outcome'"
    
        * Drop residuals variable if it exists to prevent conflicts
        capture drop residuals
    
        * Check number of observations for `model7out'
        count if model7out == 1
        local model7out_count = r(N)
        display "Number of observations with complete data (model7out == 1): `model7out_count'"
    
        if `model7out_count' == 0 {
            display as error "No observations with complete data for outcome: `outcome'"
            continue
        }
    
        * Run regression for the outcome
        regress `outcome' `indep_vars' if model7out == 1
        if _rc {
            display as error "Regression failed for outcome: `outcome'"
            continue
        }
        
        * Store the coefficient for `anydeb16'
        local beta = _b[anydeb16]
    
        * Calculate pooled SD from residuals
        predict residuals, residuals
        if _rc {
            display as error "Predict failed for outcome: `outcome'"
            continue
        }
        su residuals
        local pooled_sd = r(sd)
    
        * Validate pooled_sd
        if missing(`pooled_sd') | `pooled_sd' == 0 {
            display as error "Pooled SD is missing or zero for outcome: `outcome'"
            continue
        }
    
        * Get the number of observations in each group
        quietly: tabulate anydeb16 if model7out == 1, matcell(N)
        matrix list N
        local n1 = N[1,1]
        local n2 = N[2,1]
    
        * Debugging lines to ensure values are extracted correctly
        display "Group size n1: `n1'"
        display "Group size n2: `n2'"
    
        * Validate group sizes
        if missing(`n1') | missing(`n2') | `n1' == 0 | `n2' == 0 {
            display as error "Group sizes are missing or zero for outcome: `outcome'"
            continue
        }
    
        * Compute SMD using esizeregi
        esizeregi `beta', sdp(`pooled_sd') n1(`n1') n2(`n2')
        if _rc {
            display as error "esizeregi failed for outcome: `outcome'"
            continue
        }
        local smd = r(d)
        local se_smd = r(se)
    
        * Compute E-value
        evalue smd `smd', se(`se_smd') true(0) fig
        if _rc {
            display as error "evalue failed for outcome: `outcome'"
            continue
        }
        local evalue = r(eval_est)
        local evalue_ci = r(eval_ci)
    
        * Save the figure as PNG with specified title
        graph export "Evalue_figure_`outcome'.png", replace
    
        * Post the results
        post results ("`outcome'") (`beta') (`pooled_sd') (`n1') (`n2') (`smd') (`se_smd') (`evalue') (`evalue_ci')
    
        * Clean up
        drop residuals
    }
    
    * Close the postfile
    postclose results
    
    * Use the results dataset and list
    use `results_file', clear
    list
    
    * Save the results dataset
    save evalues, replace
    However, I am still uncertain on how to best apply (if possible) this sensitivity analysis to regressions performed in multiply imputed datasets.. Any help would be very appreciated! Thanks!

    Comment


    • #3
      Hi,

      Is anyone able to help? Fundamentally (whether I can get to this by using a loop or fitting each outcome separately), I want to adapt the following script I ran in the complete case dataset for the analyses in the imputed sample and estimate there the E values.
      I want to adapt the E value calculation for the output of linear regressions (continuous outcome) found here: https://www.hsph.harvard.edu/tyler-v...ta_Article.pdf

      Code:
      regress outcome exposure cov1 cov2 cov3 cov4 cov5
      
      *Use margins commans as you need sdp (pooled standard deviation): margins command without any predictor variables, and then converting the pooled standard error to the pooled standard deviation by multipying it by the square-root of N (standard error * sqrt(N));
      margins 
      
      *The pooled SE (sep) is ... "x"
      *CALCULATE SDP
      gen sdp=sep * sqrt(709)
      su sdp
      
      *then I tabulate the exposure as I need to get the number of observations in each group
      tabulate anydeb16
      *n1=4,262 , n2=691
      
      *now get the SMD
      esizeregi beta, sdp(sdp) n1(n1) n2(n2)
      *SMD/Cohen's d is -.0225121, se: .0410428
      
      *Next, we plug the estimate and standard error into evalue smd to compute the E-values.
      evalue smd beta, se(0.0410428) true(0) fig
      Please any help in doing this in imputed data would be massively appreciated!

      Comment


      • #4
        Where is it breaking down with mi data? If margins, you can use daniel klein's fantastic mimrgns command to get marginal effects after mi estimate: regress.
        Code:
        net describe mimrgns, from(http://fmwww.bc.edu/RePEc/bocode/m)
        If it is esizeregi or evalue, you could try using mi estimate, cmdok: but make sure that it actually is ok.

        By the way, you can get the standard error and N from the margins call to plug into your sdp formula by pulling the values from the requisite matrices after margins using
        Code:
        margins, post 
        gen sdp = (r(table)["se", "_cons"]) * (sqrt(r(_N)[1,1]))
        su sdp
        You can probably take this approach for some of the subsequent calculations, particularly those using the results from a previous command. You need to look at what Stata stores after estimation with
        Code:
        return list

        Comment


        • #5
          Thanks for your response! I have indeed moved to mimrgns (create coomand!) and am now able to estimate E values in imputed dataset using the following code (just in case of any help to anyone):

          Code:
          mi estimate: regress depvar indepvar cov1 cov2 cov3
          
          *Use margins commans as you need sdp (pooled standard deviation): margins command without any predictor variables, and then converting the pooled standard error to the pooled standard deviation by multipying it by the square-root of N (standard error * sqrt(N));
          mimrgns
          
          *The pooled SE is 0.05
          
          *CALCULATE SDP
          mi passive: gen sdp=0.05 * sqrt(3500) // sample size
          su sdp
          *3.30
          
          *then I tabulate the exposure as I need to get the number of observations in each group
          mi estimate: prop indepvar
          *n1=3000 , n2=500
          
          *now get the SMD
          esizeregi 0.43, sdp(3.30) n1(3000) n2(500)
          *SMD/Cohen's d is xx, se: .yy
          
          *Next, we plug the estimate and standard error into evalue smd to compute the E-values.
          evalue smd est, se(yy) true(0) fig
          
          
          I am plugging all the numbers manually, which is a bit of a pain and I guess error-prone. I am still trying to sort out how to incorporate this in a loop but I am receiving errors when running the following:
          
          * Define the outcomes and independent variables
          local outcomes "depvar1 depvar2 depvar3 depvar4"
          local indep_vars "anydeb16 cov1 cov2 cov3 cov4 cov5 cov6 cov7"
          
          * Initialize a temporary file to store results
          tempfile results_file
          
          capture postclose results
          
          postfile results str32 outcome float(beta pooled_sd n1 n2 smd se_smd evalue evalue_ci) using `results_file'
          
          * Loop through each outcome
          foreach outcome of local outcomes {
          display "Processing outcome: `outcome'"
          
          * Run regression for the outcome with multiple imputation
          mi estimate: regress `outcome' `indep_vars'
          if _rc {
          display as error "Regression failed for outcome: `outcome'"
          continue
          }
          
          * Store the coefficient for anydeb16
          matrix b = e(b)
          local beta = b[1, 1] // Extract the coefficient for `anydeb16`
          
          * Calculate the pooled standard error using mimrgns
          mimrgns
          
          * Extract the standard error from the mimrgns output
          matrix list r(table) // Display the table matrix for extraction
          local pooled_se = r(table)[1, 2] // The standard error is in the first row, second column
          
          * Calculate the pooled standard deviation (SDP) directly without creating a new variable
          local sdp = `pooled_se' * sqrt(e(N)) // Multiply SE by square root of N
          
          if missing(`sdp') | `sdp' == 0 {
          display as error "Pooled SD is missing or zero for outcome: `outcome'"
          continue
          }
          
          * Get the number of observations and group sizes for anydeb16
          quietly: tabulate anydeb16, matcell(N)
          local n1 = N[1, 1]
          local n2 = N[2, 1]
          
          * Ensure group sizes are valid
          if missing(`n1') | missing(`n2') | `n1' == 0 | `n2' == 0 {
          display as error "Group sizes are missing or zero for outcome: `outcome'"
          continue
          }
          
          display "Group size n1: `n1'"
          display "Group size n2: `n2'"
          
          * Compute SMD using esizeregi
          esizeregi `beta', sdp(`sdp') n1(`n1') n2(`n2')
          if _rc {
          display as error "esizeregi failed for outcome: `outcome'"
          continue
          }
          local smd = r(d)
          local se_smd = r(se)
          
          * Compute E-value
          evalue smd `smd', se(`se_smd') true(0) fig
          if _rc {
          display as error "E-value failed for outcome: `outcome'"
          continue
          }
          local evalue = r(eval_est)
          local evalue_ci = r(eval_ci)
          
          * Save the figure as PNG with specified title
          graph export "Evalue_ImputedAnyDEB_`outcome'.png", replace title("E-values for the association between any disordered eating behavior at age 16 and `outcome' at age 18")
          
          * Post the results into the dataset
          post results ("`outcome'") (`beta') (`pooled_sd') (`n1') (`n2') (`smd') (`se_smd') (`evalue') (`evalue_ci')
          }
          Would you know what is going wrong? I cannot generate sdp as if I want to loop for each outcome then the loop would get stuck after the first sdp generated.

          Thank you very much for your help!

          Comment


          • #6
            Ila,

            My guess is that you will need to update your code so that the sdp variable is generated uniquely for each of the variables. That is, from the beginning to the end of the loop, replace all instances of sdp with something like sdp_`outcome'.

            Comment


            • #7
              Thanks Erik!

              For everyone who might be looking for a similar script, I was able to sort out the loop, and this should now be running okay.


              Code:
              * Define the outcomes and independent variables
              local outcomes "out1 out2 out3 out3 out4 out5 out6 out7"
              local indep_vars "exp cov1 cov2 cov3 cov4 cov5 cov6 cov7 cov8 cov9"
              
              * Initialize an empty dataset to store the results
              tempfile results_file
              capture postclose results
              postfile results str32 outcome float(beta pooled_sd n1 n2 smd se_smd evalue evalue_ci) using `results_file'
               
              * Loop through each outcome
              foreach outcome of local outcomes {
                display "Processing outcome: `outcome'"
                
              * Run regression for the outcome with multiple imputation
              mi estimate: regress `outcome' `indep_vars'
                   if _rc {
                       display as error "Regression failed for outcome: `outcome'"
                       continue
                   }
                   * Extract the coefficient for 'anydeb16' (it's in the first column, row 1)
                   matrix beta_matrix = e(b_mi)
                   local beta = beta_matrix[1, 1]
              
                   * Display the extracted beta coefficient
                   display "Value of beta coefficient for exp: `beta'"
              
               * Calculate the pooled standard error using mimrgns
                   mimrgns
               
                  * Extract the standard error from the mimrgns output
                  matrix list r(table)  // Display the table matrix for extraction
                  local pooled_se = r(table)[2, 1]  // The standard error is in the second row, first column
              
                  * Calculate the pooled standard deviation (SDP) directly without creating a new variable
                  //gen sdp_outcome' = (r(table)["se", "_cons"]) * (sqrt(r(_N)[1,1]))
                  gen sdp_`outcome' = (`pooled_se') * (sqrt(4535)) // Multiply SE by square root of N
                  * Summarize and extract the mean of sdp for this outcome
                  su sdp_`outcome'
                  
                  * Capture the mean value in a local macro
                  *local sdp_mean_outcome' = round(r(mean), 0.000001)       
                  local sdp_mean = round(r(mean), 0.000001)
                  display "Mean of sdp: `sdp_mean'"
              
                  * Get the number of observations and group sizes for exp - change n1 and n2 for other exposures
                   local n1 3912    
                   local n2 623
              
                  * Display and check that all values are valid before running esizeregi
                  display "Beta: `beta'"
                  display "SDP Mean: `sdp_mean'"
                  display "Group sizes: n1=n1', n2=n2'"
                  * Ensure that none of the variables are missing or invalid before proceeding
                  if missing(`beta') | missing(`sdp_mean') | missing(`n1') | missing(`n2') {
                  display as error "Missing values detected for outcome: `outcome'"
                  continue
                  }
              
                 * Compute SMD using esizeregi (ensure beta and sdp_mean are properly referenced)
                 esizeregi `beta', sdp(`sdp_mean') n1(`n1') n2(`n2')
                   if _rc {
                   display as error "esizeregi failed for outcome: `outcome'"
                   continue
               }
                  * Compute SMD using esizeregi
                      esizeregi `beta', sdp(`sdp_mean') n1(`n1') n2(`n2')
                      if _rc {
                      display as error "esizeregi failed for outcome: `outcome'"
                      continue
                   }
                    local smd = r(d)
                    local se_smd = r(se)
               
                 * Compute E-value
                  evalue smd `smd', se(`se_smd') true(0) fig
                  if _rc {
                      display as error "E-value failed for outcome: `outcome'"
                      continue
                   }
                   local evalue = r(eval_est)
                   local evalue_ci = r(eval_ci)
               
                  * Save the figure as PNG with specified title
                   graph export "Evalue_Imputed_`outcome'.png"
              
                  * Post the results into the dataset (beta pooled_sd n1 n2 smd se_smd evalue evalue_ci)
                  post results ("`outcome'") (`beta') (`sdp_mean') (`n1') (`n2') (`smd') (`se_smd') (`evalue') (`evalue_ci')
              
              }
              
              * Close the postfile
              postclose results
              
              * Use the results dataset and list
              use `results_file', clear
              list
              
              * Save the results dataset
              save evalues_exp, replace
              There are for sure things that could be improve of this code (e.g., the extraction of the sample size) but this at least works without too much manual changing!

              Comment

              Working...
              X