"Variable Not Found" Error in Looped Stata Code

Muhammad Khan

Join Date: Jan 2025
Posts: 4

05 Jan 2025, 09:58

I am currently trying to code a regression routine that loops over several hundred data files with identical variable lists. The code is given below:

Code:

* Define input and output directories
local input_dir "/Users/muhammadkhan/Library/CloudStorage/OneDrive-Personal/WBES Data/Stata Data Files"
local output_file "/Users/muhammadkhan/Library/CloudStorage/OneDrive-Personal/WBES Data/Stata Data Files/regression_results.xlsx"

* Create a temporary dataset to store all results
clear
input str100 file_name float coef_1 se_1 coef_2 se_2 coef_3 se_3 underid_pvalue weakid_fstat sargan_pvalue
end

* List all .dta files in the input directory and store them in a local macro
local files : dir "`input_dir'" files "*.dta"

* Initialize row index for storing results in the temporary dataset
local row = 1

* Loop through each file
foreach file of local files {
    * Display file being processed
    display "Processing file: `file'"

    * Load the dataset
    use "`input_dir'/`file'", clear

    capture {

        * REGRESSION COMMAND (SUPPRESSED HERE FOR BREVITY)

    }


    * Check if regression was successful (error code)
    if (_rc != 0) {
        display "Skipping file `file' due to regression error"
        continue // Skip this file and go to the next one
    }

    * Extract the regression coefficients and standard errors
    matrix b = e(b) // Coefficients
    matrix se = e(V) // Variance-covariance matrix

    * Store the coefficients and standard errors
    scalar coef_1 = b[1, 1]
    scalar se_1 = sqrt(se[1, 1])
    scalar coef_2 = b[1, 2]
    scalar se_2 = sqrt(se[2, 2])
    scalar coef_3 = b[1, 3]
    scalar se_3 = sqrt(se[3, 3])

    * Extract the post-estimation results:
    * Underidentification test (first stage F-statistic)
    scalar underid_pvalue = e(underid_p) // Underidentification p-value
    * Weak identification test (F-statistic)
    scalar weakid_fstat = e(weakid_fstat) // Weak identification F-statistic
    * Sargan test (overidentification test)
    scalar sargan_pvalue = e(sargan_p) // Sargan test p-value

    * Append the results to the temporary dataset
    replace file_name = "`file'" in `row'
    replace coef_1 = coef_1 in `row'
    replace se_1 = se_1 in `row'
    replace coef_2 = coef_2 in `row'
    replace se_2 = se_2 in `row'
    replace coef_3 = coef_3 in `row'
    replace se_3 = se_3 in `row'
    replace underid_pvalue = underid_pvalue in `row'
    replace weakid_fstat = weakid_fstat in `row'
    replace sargan_pvalue = sargan_pvalue in `row'

    * Increment row for next dataset
    local row = `row' + 1
}

* Save the results to an Excel file
export excel file_name coef_1 se_1 coef_2 se_2 coef_3 se_3 underid_pvalue weakid_fstat sargan_pvalue using `output_file', firstrow(variables) replace

display "Regression results saved to `output_file'"

When I run the code with -set trace on-, the error occurs at the following lines:

* Append the results to the temporary dataset
replace file_name = "`file'" in `row'

Apparently, Stata cannot find the `file' string, even though it displays that string when it starts processing each file (the -display "Processing file: `file'"- line). This is perplexing: is something happening between the first time the file name appears and this point? I am not sure, and I need to solve this for the code to work. The regression part executes properly, so I have not included it here.

Last edited by Muhammad Khan; 05 Jan 2025, 10:32.

Tags: foreach, loop, string, syntax

Clyde Schechter

Join Date: Apr 2014
Posts: 29691

05 Jan 2025, 14:03

I think you are misunderstanding the error message and its source. The string stored in local macro file is still available. The problem is that the temporary data set you want to update at this point in the code is not in memory. Once you enter the loop, the -use "`input_dir'/`file'", clear- command eliminates it. Since it was not preserved, saved in a tempfile, or put in a separate frame, it is lost and gone forever at that point. The error message arises because the data in memory do not contain a variable named file_name.
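The diagnosis can be checked with a tiny reproduction. This is a sketch, not the poster's code: the built-in auto data (loaded with -sysuse-) stands in for one of the WBES files, and a one-observation dataset stands in for the results dataset built with -input-.

```stata
* Hypothetical reproduction: auto.dta stands in for one of the WBES files
clear
set obs 1
generate str100 file_name = ""        // the "results" dataset, held only in memory

* Like -use "`input_dir'/`file'", clear-, this replaces everything in memory
sysuse auto, clear

* The results variable no longer exists, so -replace- fails
capture replace file_name = "auto.dta" in 1
local rc = _rc
display "return code: `rc'"          // 111 = variable not found
describe, short                       // only auto's variables remain
```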

The simplest solution to this is to use a separate frame for holding the results.
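A minimal runnable version of the frame approach, under stated assumptions: Stata 16 or later (frames do not exist in earlier versions), the built-in auto data reloaded on each pass in place of the WBES files, and -regress- on two subsets in place of the suppressed regression command. The frame name results matches the thread; the variable names subset, coef_mpg and se_mpg are hypothetical.

```stata
* Sketch: results accumulate in their own frame, so -sysuse ..., clear-
* in the default frame cannot erase them (requires Stata 16+)
clear all
frame create results str20 subset double coef_mpg double se_mpg

forvalues g = 0/1 {
    * Stand-in for -use "`input_dir'/`file'", clear-
    sysuse auto, clear

    * Stand-in for the suppressed regression command
    capture noisily regress price mpg if foreign == `g'
    if _rc {
        display "Skipping subset foreign == `g' due to regression error"
        continue
    }

    * Post one row per pass; the expressions are evaluated in the current frame
    frame post results ("foreign == `g'") (_b[mpg]) (_se[mpg])
}

frame results: list, noobs
```

Using _b[mpg] and _se[mpg] picks results by coefficient name rather than by column position in e(b) and e(V), so the code does not depend on the order in which regressors are stored.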

Code:

* Define input and output directories
local input_dir "/Users/muhammadkhan/Library/CloudStorage/OneDrive-Personal/WBES Data/Stata Data Files"
local output_file "/Users/muhammadkhan/Library/CloudStorage/OneDrive-Personal/WBES Data/Stata Data Files/regression_results.xlsx"

* Create a frame to store all results
frame create results str100 file_name float coef_1 se_1 coef_2 se_2 coef_3 se_3 underid_pvalue weakid_fstat sargan_pvalue

* List all .dta files in the input directory and store them in a local macro
local files : dir "`input_dir'" files "*.dta"

* Loop through each file
foreach file of local files {
    * Display file being processed
    display "Processing file: `file'"

    * Load the dataset
    use "`input_dir'/`file'", clear

    capture {

        * REGRESSION COMMAND (SUPPRESSED HERE FOR BREVITY)

    }


    * Check if regression was successful (error code)
    if (_rc != 0) {
        display "Skipping file `file' due to regression error"
        continue // Skip this file and go to the next one
    }

    * Extract the regression coefficients and standard errors
    matrix b = e(b) // Coefficients
    matrix se = e(V) // Variance-covariance matrix

    * Store the coefficients and standard errors
    scalar coef_1 = b[1, 1]
    scalar se_1 = sqrt(se[1, 1])
    scalar coef_2 = b[1, 2]
    scalar se_2 = sqrt(se[2, 2])
    scalar coef_3 = b[1, 3]
    scalar se_3 = sqrt(se[3, 3])

    * Extract the post-estimation results:
    * Underidentification test (first stage F-statistic)
    scalar underid_pvalue = e(underid_p) // Underidentification p-value
    * Weak identification test (F-statistic)
    scalar weakid_fstat = e(weakid_fstat) // Weak identification F-statistic
    * Sargan test (overidentification test)
    scalar sargan_pvalue = e(sargan_p) // Sargan test p-value

    * Post the results to the results frame
    frame post results (`"`file'"') (coef_1) (se_1) (coef_2) (se_2) (coef_3) (se_3) ///
        (underid_pvalue) (weakid_fstat) (sargan_pvalue)

}

* Save the results to an Excel file
frame results: export excel file_name coef_1 se_1 coef_2 se_2 coef_3 se_3 underid_pvalue weakid_fstat sargan_pvalue using `output_file', firstrow(variables) replace

display "Regression results saved to `output_file'"

Note the elimination of the local macro row and its subsequent updating; it serves no purpose in this approach. Note also the elimination of all the -replace- commands following the regression: their function is taken over by the -frame post- command.
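There is one more reason the original -replace- commands would have misbehaved even with the results dataset in memory: when a scalar and a variable share a name, Stata's expression evaluator uses the variable. So -replace coef_1 = coef_1 in `row'- copies the variable onto itself rather than storing the scalar. The sketch below borrows the name coef_1 from the thread; the scalar() function forces the scalar to be used.

```stata
* Sketch: a variable and a scalar that share the name coef_1
clear
set obs 1
generate double coef_1 = .
scalar coef_1 = 0.5

* The variable takes precedence, so this copies the (missing) variable onto itself
replace coef_1 = coef_1 in 1
list

* scalar() forces Stata to use the scalar instead
replace coef_1 = scalar(coef_1) in 1
list
```

The same caution applies to -frame post-, which evaluates (coef_1) against the data in the current frame. If a WBES file happened to contain a variable named coef_1 (or one that coef_1 abbreviates), writing (scalar(coef_1)) would avoid picking it up by accident.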

Muhammad Khan

#3

05 Jan 2025, 19:40

Thanks for the reply. I tried it and the loop worked just fine. However, there is a new error message when I try to save the results to the Excel file:

Code:

. * Save the results to an Excel file
. frame results: export excel file_name coef_1 se_1 coef_2 se_2 coef_3 se_3 underid_pvalue weakid_fstat sargan_
> pvalue using `output_file', firstrow(variables) replace
invalid 'Data'
r(198);

Last edited by Muhammad Khan; 05 Jan 2025, 19:53.
Clyde Schechter

#4

05 Jan 2025, 21:06

Ah, yes, I didn't notice that little problem. Your output file's full pathname contains spaces, so it must be wrapped in quotes.

Code:

frame results: export excel file_name coef_1 se_1 coef_2 se_2 coef_3 se_3 underid_pvalue weakid_fstat sargan_pvalue using `"`output_file'"', firstrow(variables) replace
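A self-contained check of the quoting fix, assuming a writable temporary directory (Stata's c(tmpdir)). The file name regression results.xlsx is hypothetical and deliberately contains a space, like the WBES Data folder in the thread. Without the quotes, Stata takes only the text up to the first space as the file name and then rejects the remainder, which is what the invalid 'Data' message above reflects. Plain double quotes would also work here; compound double quotes additionally survive a path that itself contains a double-quote character.

```stata
* Sketch: export to a path that contains a space (hypothetical file name)
sysuse auto, clear
local output_file "`c(tmpdir)'/regression results.xlsx"

* Compound double quotes keep the whole path together as one file name
export excel make price mpg using `"`output_file'"', firstrow(variables) replace

* Confirm the file was written where expected
confirm file `"`output_file'"'
display "Saved to `output_file'"
```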
Muhammad Khan

#5

06 Jan 2025, 02:06

Thanks for the troubleshooting. After running the code, I see that it executed correctly and the output was stored in the destination Excel file, but the post-estimation test results from ivreg2 (underid_p, sargan_p, and weakid_fstat) are missing. What could be responsible for that?
Clyde Schechter

#6

06 Jan 2025, 11:36

I don't know. You never showed the regression command itself, and it may be that you have omitted some options needed to get those results. Even that is just speculation on my part, as I do not use and do not know anything about -ivreg2-. But my first suggestion would be to run the regression command on just one of the data sets (not in a loop) and immediately follow it with -ereturn list-. You may find that those statistics simply aren't calculated, or that their names differ from the ones you used in your code.
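That suggestion can be sketched with built-in data. Here -regress- on the auto data stands in for the poster's -ivreg2- command, which is user-written (installed from SSC) and is not run here. The sketch also shows why the spreadsheet columns came out empty without any error: referring to an e() result that a command never stored does not fail; the expression simply evaluates to missing.

```stata
* Sketch: -regress- stands in for the suppressed ivreg2 command
sysuse auto, clear
regress price mpg weight

* List every e() result the command actually stored
ereturn list

* e(underid_p) was never stored, yet this line runs without error
scalar underid_pvalue = e(underid_p)
display "underid_pvalue = " scalar(underid_pvalue)    // missing (.)

* A simple guard that flags results that were never stored
if missing(scalar(underid_pvalue)) {
    display as error "e(underid_p) not found: check the names shown by -ereturn list-"
}
```

With -ivreg2- itself, the stored-results section of its help file is the place to look. Names such as e(idp), e(cdf), e(widstat) and e(jp) are what one would expect to find there, but treat them as assumptions to confirm against -ereturn list-, not as verified names.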
Muhammad Khan

#7

06 Jan 2025, 19:19

Thanks, all problems solved.