  • "Variable Not Found" Error in Looped Stata Code

    I am trying to code a regression routine that loops over several hundred data files with identical variable lists. The code is given below:

    Code:
    * Define input and output directories
    local input_dir "/Users/muhammadkhan/Library/CloudStorage/OneDrive-Personal/WBES Data/Stata Data Files"
    local output_file "/Users/muhammadkhan/Library/CloudStorage/OneDrive-Personal/WBES Data/Stata Data Files/regression_results.xlsx"
    
    * Create a temporary dataset to store all results
    clear
    input str100 file_name float coef_1 se_1 coef_2 se_2 coef_3 se_3 underid_pvalue weakid_fstat sargan_pvalue
    end
    
    * List all .dta files in the input directory and store them in a local macro
    local files : dir "`input_dir'" files "*.dta"
    
    * Initialize row index for storing results in the temporary dataset
    local row = 1
    
    * Loop through each file
    foreach file of local files {
    * Display file being processed
    display "Processing file: `file'"
    
    * Load the dataset
    use "`input_dir'/`file'", clear
    
    capture{
    
    REGRESSION COMMAND (SUPPRESSED FOR BREVITY HERE)
    
    }
    
    
    * Check if regression was successful (error code)
    if (_rc != 0) {
    display "Skipping file `file' due to regression error"
    continue // Skip this file and go to the next one
    }
    
    * Extract the regression coefficients and standard errors
    matrix b = e(b) // Coefficients
    matrix se = e(V) // Variance-covariance matrix
    
    * Store the coefficients and standard errors
    scalar coef_1 = b[1, 1]
    scalar se_1 = sqrt(se[1, 1])
    scalar coef_2 = b[1, 2]
    scalar se_2 = sqrt(se[2, 2])
    scalar coef_3 = b[1, 3]
    scalar se_3 = sqrt(se[3, 3])
    
    * Extract the post-estimation results:
    * Underidentification test (first stage F-statistic)
    scalar underid_pvalue = e(underid_p) // Underidentification p-value
    * Weak identification test (F-statistic)
    scalar weakid_fstat = e(weakid_fstat) // Weak identification F-statistic
    * Sargan test (overidentification test)
    scalar sargan_pvalue = e(sargan_p) // Sargan test p-value
    
    * Append the results to the temporary dataset
    replace file_name = "`file'" in `row'
    replace coef_1 = coef_1 in `row'
    replace se_1 = se_1 in `row'
    replace coef_2 = coef_2 in `row'
    replace se_2 = se_2 in `row'
    replace coef_3 = coef_3 in `row'
    replace se_3 = se_3 in `row'
    replace underid_pvalue = underid_pvalue in `row'
    replace weakid_fstat = weakid_fstat in `row'
    replace sargan_pvalue = sargan_pvalue in `row'
    
    * Increment row for next dataset
    local row = `row' + 1
    }
    
    * Save the results to an Excel file
    export excel file_name coef_1 se_1 coef_2 se_2 coef_3 se_3 underid_pvalue weakid_fstat sargan_pvalue using `output_file', firstrow(variables) replace
    
    display "Regression results saved to `output_file'"

    When I run the code with set trace on, the error occurs at the following piece of code:

    * Append the results to the temporary dataset
    replace file_name = "`file'" in `row'

    Apparently, Stata is unable to find the `file' string, even though it displays this string when it starts processing each file. This is perplexing. Is something happening between the first time the file name shows up and this part? I am not sure, and I need to solve it for this code to work. The regression part executes properly, so I am not including it here.
    Last edited by Muhammad Khan; 05 Jan 2025, 10:32.

  • #2
    I think you are misunderstanding the error message and its source. The string stored in local macro file is still available. The problem is that the temporary data set you want to update at this point in the code is no longer in memory. Once you enter the loop, the -use "`input_dir'/`file'", clear- command eliminates it. Since it was not preserved or saved in a tempfile or a new frame, it is lost and gone forever at that point. The error message arises because the data in memory does not contain a variable named file_name.
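
    A minimal sketch of what happens, using Stata's bundled auto data in place of the real files and a shortened variable list (both are stand-ins for illustration only):

    ```stata
    * Minimal reproduction; auto.dta stands in for the poster's data files
    clear
    input str100 file_name float coef_1
    end
    describe, short          // the empty results dataset is in memory here

    sysuse auto, clear       // like -use ..., clear-: replaces the data in memory
    capture noisily replace file_name = "auto.dta" in 1
    display "return code: " _rc   // non-zero: file_name is no longer in memory
    ```

    The local macro is untouched throughout; only the dataset holding file_name has been replaced.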

    The simplest solution to this is to use a separate frame for holding the results.

    Code:
    * Define input and output directories
    local input_dir "/Users/muhammadkhan/Library/CloudStorage/OneDrive-Personal/WBES Data/Stata Data Files"
    local output_file "/Users/muhammadkhan/Library/CloudStorage/OneDrive-Personal/WBES Data/Stata Data Files/regression_results.xlsx"
    
    * Create a frame to store all results
    frame create results str100 file_name float coef_1 se_1 coef_2 se_2 coef_3 se_3 underid_pvalue weakid_fstat sargan_pvalue
    
    * List all .dta files in the input directory and store them in a local macro
    local files : dir "`input_dir'" files "*.dta"
    
    * Loop through each file
    foreach file of local files {
        * Display file being processed
        display "Processing file: `file'"
    
        * Load the dataset
        use "`input_dir'/`file'", clear
    
        capture{
    
        REGRESSION COMMAND (SUPPRESSED FOR BREVITY HERE)
    
        }
    
    
        * Check if regression was successful (error code)
        if (_rc != 0) {
            display "Skipping file `file' due to regression error"
            continue // Skip this file and go to the next one
        }
    
        * Extract the regression coefficients and standard errors
        matrix b = e(b) // Coefficients
        matrix se = e(V) // Variance-covariance matrix
    
        * Store the coefficients and standard errors
        scalar coef_1 = b[1, 1]
        scalar se_1 = sqrt(se[1, 1])
        scalar coef_2 = b[1, 2]
        scalar se_2 = sqrt(se[2, 2])
        scalar coef_3 = b[1, 3]
        scalar se_3 = sqrt(se[3, 3])
    
        * Extract the post-estimation results:
        * Underidentification test (first stage F-statistic)
        scalar underid_pvalue = e(underid_p) // Underidentification p-value
        * Weak identification test (F-statistic)
        scalar weakid_fstat = e(weakid_fstat) // Weak identification F-statistic
        * Sargan test (overidentification test)
        scalar sargan_pvalue = e(sargan_p) // Sargan test p-value
    
        * Post the results as a new observation in the results frame
        frame post results (`"`file'"') (coef_1) (se_1) (coef_2) (se_2) (coef_3) (se_3) ///
            (underid_pvalue) (weakid_fstat) (sargan_pvalue)
    
    }
    
    * Save the results to an Excel file
    frame results: export excel file_name coef_1 se_1 coef_2 se_2 coef_3 se_3 underid_pvalue weakid_fstat sargan_pvalue using `output_file', firstrow(variables) replace
    
    display "Regression results saved to `output_file'"
    Note the elimination of the local macro row and its subsequent updating--it serves no purpose in this approach. Note also the elimination of all the -replace- commands following the regression, their function being taken over by the -frame post- command.
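
    For anyone who wants to try the pattern in isolation, here is a small self-contained sketch. It assumes Stata 16 or later (frames are not available before that), and it uses the bundled auto data with -regress- as stand-ins for the real files and model; the frame and variable names are made up for the example.

    ```stata
    * Sketch only: auto data and -regress- stand in for the real files and model
    capture frame drop results          // allow the example to be rerun
    frame create results str20 xvar float coef se

    sysuse auto, clear
    foreach x in weight length turn {
        capture regress price `x'
        if (_rc != 0) {
            display "Skipping `x' due to regression error"
            continue
        }
        * Each -frame post- appends one new observation to the results frame
        frame post results ("`x'") (_b[`x']) (_se[`x'])
    }

    frame results: list, clean
    ```

    Because -frame post- adds a new observation each time it runs, the results frame can start out empty and no row counter is needed.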



    • #3
      Thanks for the reply. I tried it and the loop worked just fine. However, there is a new error message when trying to save the results to the Excel file:

      Code:
      . * Save the results to an Excel file
      . frame results: export excel file_name coef_1 se_1 coef_2 se_2 coef_3 se_3 underid_pvalue weakid_fstat sargan_
      > pvalue using `output_file', firstrow(variables) replace
      invalid 'Data'
      r(198);
      Last edited by Muhammad Khan; 05 Jan 2025, 19:53.



      • #4
        Ah, yes, I didn't notice that little problem. Your output file's full pathname contains spaces, so it must be wrapped in quotes.

        Code:
        frame results: export excel file_name coef_1 se_1 coef_2 se_2 coef_3 se_3 underid_pvalue weakid_fstat sargan_pvalue using `"`output_file'"', firstrow(variables) replace
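
        The same behavior can be seen in a small sketch. The file name below is made up, the file is written to the current working directory, and the auto data stands in for the results frame:

        ```stata
        * Sketch: writes to the current working directory; the file name is made up
        sysuse auto, clear
        local out "regression results.xlsx"

        * Unquoted: the space splits the path into two tokens, so this fails
        capture noisily export excel make price using `out', firstrow(variables) replace
        display "return code without quotes: " _rc

        * Compound double quotes keep the whole path together
        export excel make price using `"`out'"', firstrow(variables) replace
        ```

        Compound double quotes are the safer habit here because they also work if the path itself happens to contain a double quote character.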



        • #5
          Thanks for the troubleshooting. After running the code, I see that it executed correctly and the output was stored in the destination Excel file, but the post-estimation test results from ivreg2 (underid_pvalue, weakid_fstat, and sargan_pvalue) are missing. What could be responsible for that?



          • #6
            I don't know. You never showed the regression command itself, and it may be that you have omitted some options needed to get those results. Even that is just speculation on my part, as I do not use and do not know anything about -ivreg2-. But my first suggestion would be to run the regression command on just one of the data sets (not in a loop) and immediately follow it with -ereturn list-. You may find that those statistics simply aren't calculated, or perhaps their names are different from what you used in your code.
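
            A sketch of that check, assuming ivreg2 is installed (it is a community-contributed command, available via -ssc install ivreg2-). The model on the auto data is arbitrary and only serves to produce stored results. The e() names in the last three lines are assumptions based on the ivreg2 help file and should be confirmed against the -ereturn list- output.

            ```stata
            * Assumes ivreg2 is installed (ssc install ivreg2); the model is arbitrary
            sysuse auto, clear
            ivreg2 price foreign (mpg = weight length)
            ereturn list                       // shows every e() result actually stored

            * A name that does not exist evaluates to missing without raising an error
            display e(sargan_p)

            * Assumed ivreg2 names; confirm them against the -ereturn list- output
            display e(idp)        // underidentification test p-value
            display e(widstat)    // weak identification F statistic
            display e(sarganp)    // Sargan test p-value (non-robust estimation)
            ```

            Since a misspelled e() name quietly returns missing, a loop can run to completion and still write missing values for those columns.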



            • #7
              Thanks, all problems solved.
