
  • Several hundred regressions and Stata memory

    Hello dear Stata users,

I am trying to run several regressions in a loop (Stata 17), but the memory does not allow me to do so. I tried "set maxvar 10000, permanently" before running my code in the do-file.

Code:
forval i = 1/300 {
    use sortedUzb.dta, clear
    bsample 1477
    qui probit owns_dwelling male age_group high_ed married divorced separated widowed log_monspending hh_size capital
    estimates store Uzb`i'
    qui probit owns_dwelling male age_group high_ed married divorced separated widowed log_monspending hh_size capital gen_trust
    estimates store UzbT`i'
}

How can I deal with this? I might need more regressions within the loop, meaning there would be around 1,000 probit regressions and their estimated coefficients.

    Thank you,
    Farogat.


  • #2
    How do you know the memory is the problem?



    • #3
      Dear Jared,

Kindly find the image attached. It reported an r(1000) error message.



      • #4
        Hi Farogat,

You have likely bumped into the limit on how many models can be stored by -estimates store- (see -help limits-). The limit is 300 stored estimates in any version of Stata, which fits what you describe: your loop stores two models per iteration, so Stata stopped after 150 iterations.

In any case, you will need to reconfigure how you store those estimates. One way is to -post- the estimates of interest from each model (see -help postfile-, or from Stata 16 onward, -frame post-) if you are not interested in retaining the full model objects. That way you don't need to store the estimated models at all; or if you do need them, you need only keep the two for the current iteration of the loop and drop them with -estimates drop-. For that matter, -simulate- may be of interest; -help simulate- gives some examples in that direction.
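For instance, a minimal -postfile- sketch along those lines (the covariate list is shortened here, and the posted statistics, the coefficient and standard error on male, are just placeholders for whatever you actually need):

```stata
* Sketch only: shortened covariate list; post whichever statistics you need.
tempname memhold
tempfile results
postfile `memhold' rep b_male se_male using `results'

forval i = 1/300 {
    use sortedUzb.dta, clear
    bsample 1477
    qui probit owns_dwelling male age_group high_ed
    post `memhold' (`i') (_b[male]) (_se[male])
}

postclose `memhold'
use `results', clear    // one row per replication, ready to summarize
```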



        • #5
          Please take a few moments to review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. It is particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using code delimiters [CODE] and [/CODE], and to use the dataex command to provide sample data, as described in section 12 of the FAQ. In general screen shots are not considered useful.

          The more you help others understand your problem, the more likely others are to be able to help you solve your problem.

          I am trying to run several regressions in the loop (Stata 17), but the memory does not allow me to do so
Section 12.1 of the FAQ is particularly pertinent:

          12.1 What to say about your commands and your problem

          Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!
          I started writing before you wrote post #3. Having seen that, if you were to scroll back in your results window, you would see that the list of models you show was preceded by
          Code:
          system limit exceeded
          you need to drop one or more models
          And if you clicked on "r(1000)" you would see that it is described as "system limit exceeded -- see limits." and if you clicked on "limits" you would read
          Code:
          Maximum size limits
          
                                                                        Stata/MP and
                                                       Stata/BE         Stata/SE
             estimates store
                 # of stored estimation results           300                300
          which is the source of your problem - there is a limit (that is not well documented) of 300 on the number of models that can be retained by estimates store, and your loop runs 300 times storing two estimates each time through the loop - a total of 600 models. Stata stopped after 150 times through the loop - 300 estimates.

Note that this has nothing to do with memory. This is why the FAQ tells you to say exactly what Stata typed: we need to see what Stata told you, not what you think Stata meant by what it told you.

          You can use estimates save and estimates use (rather than estimates store and estimates restore) to instead retain an arbitrary number of estimates as files on disk.
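For example (a sketch with a shortened covariate list; each model lands in its own .ster file on disk, so the 300-model in-memory limit does not apply):

```stata
forval i = 1/300 {
    use sortedUzb.dta, clear
    bsample 1477
    qui probit owns_dwelling male age_group high_ed
    estimates save Uzb`i', replace    // writes Uzb`i'.ster to the working directory
}

estimates use Uzb7    // later: reload any single replication from disk
```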

Added in edit: Crossed with post #4 from Leonardo Guizzetti, who gives an excellent alternative to creating countless sets of stored estimates.
          Last edited by William Lisowski; 03 Feb 2022, 14:44.



          • #6
            I think you are not running out of memory. Rather, Stata has a limit of 300 stored estimation results. See -help limits-.

            You cannot get around that limit. But I also cannot help thinking that you cannot actually make use of 600, let alone 1,000, complete sets of estimation results. It's too large to be put into a table and reviewed by human eyes. So I imagine that ultimately you plan to somehow summarize or condense all this information in some way to produce some human-usable tables or graphs.* To that end, it probably makes more sense to use either a frame (version 16 or later) or tempfile (earlier Stata versions) to post the important statistics you actually need from each regression into a new data set instead of using -estimates store-, and work with that afterward.

            *Since you are sampling with replacement, perhaps you are actually trying to bootstrap these regressions. If so, see the -bootstrap- command which will manage the sampling, iteration, and compilation of results all for you in one fairly simple command. -help bootstrap-
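A sketch of what that might look like (shortened covariate list, arbitrary seed):

```stata
use sortedUzb.dta, clear
bootstrap _b, reps(200) seed(12345): ///
    probit owns_dwelling male age_group high_ed
```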



            • #7
              Originally posted by Clyde Schechter View Post
              I think you are not running out of memory. Rather, Stata has a limit of 300 stored estimation results. See -help limits-.

              You cannot get around that limit. But I also cannot help thinking that you cannot actually make use of 600, let alone 1,000, complete sets of estimation results. It's too large to be put into a table and reviewed by human eyes. So I imagine that ultimately you plan to somehow summarize or condense all this information in some way to produce some human-usable tables or graphs.* To that end, it probably makes more sense to use either a frame (version 16 or later) or tempfile (earlier Stata versions) to post the important statistics you actually need from each regression into a new data set instead of using -estimates store-, and work with that afterward.

              *Since you are sampling with replacement, perhaps you are actually trying to bootstrap these regressions. If so, see the -bootstrap- command which will manage the sampling, iteration, and compilation of results all for you in one fairly simple command. -help bootstrap-
              Yes Clyde, I want to do this:
              1. Create a sample of 1477 observations with replacement from a base country and store coefficients (dataset for base country is of the size 1477)
2. Create another sample of 1477 observations from a comparison country and apply the stored coefficients (the comparison country dataset has fewer observations, so my supervisor advised resampling with the same sample size as the base country)
              3. Store yhats, average them over 1477 observations
              4. Find Covariate effect: base rate (fixed, I created a loop) - (3)
              5. Find Coefficient effects: (3) - rate in comparison country

And repeat that 200 times, then find 95% confidence intervals for (4) and (5).



              • #8
                Clyde Schechter Do you see any benefit between using tempfiles vs. datagrams? I am sure they solve the problem here, but I'm almost certain that frames are better likely for reasons under the hood.

                I agree, however, you don't want to run your model 300 times literally, what you likely want is to bootstrap 300 times Farogat WIUT



                • #9
                  Originally posted by Jared Greathouse View Post

                  I agree, however, you don't want to run your model 300 times literally, what you likely want is to bootstrap 300 times Farogat WIUT
But does -bootstrap- work if I want to resample 200 times and run a probit regression each time?



                  • #10
                    Originally posted by Jared Greathouse View Post
                    Clyde Schechter Do you see any benefit between using tempfiles vs. datagrams? I am sure they solve the problem here, but I'm almost certain that frames are better likely for reasons under the hood.

                    I agree, however, you don't want to run your model 300 times literally, what you likely want is to bootstrap 300 times Farogat WIUT
Frames should outperform -postfile- simply because frames are held in memory. But the relative speed gain may be very small on modern hardware, such as SSDs.



                    • #11
                      Jared Greathouse Sorry, I don't understand datagram in this context. As between tempfiles and frames, frames will clearly be more efficient as they will avoid thrashing the disk, if they are available to O.P.

                      Farogat WIUT I'm not clear on exactly what you want to do here. The regressions you show in #1 are not conditioned on data from different countries. They differ in the inclusion or exclusion of a variable gen_trust. And I don't understand parts 4. and 5. of your explanation in #7. It does seem like you want to do more than just get bootstrapped estimates of the regression models shown in #1, but I can't tell just what exactly you need.

                      Added: Crossed with #9 and 10. And agree with #10 that the gain in efficiency on an SSD would be barely detectable. But the gain could be noticeable if, for example, the program is being run over a network with the tempfile on a remote server.
                      Last edited by Clyde Schechter; 03 Feb 2022, 15:33.



                      • #12
                        Clyde Schechter my apologies, I meant to say dataframe



                        • #13
                          Originally posted by Clyde Schechter View Post

                          Farogat WIUT I'm not clear on exactly what you want to do here. The regressions you show in #1 are not conditioned on data from different countries. They differ in the inclusion or exclusion of a variable gen_trust. And I don't understand parts 4. and 5. of your explanation in #7. It does seem like you want to do more than just get bootstrapped estimates of the regression models shown in #1, but I can't tell just what exactly you need.
                          @Clyde Schechter
Base country: Uzb; comparison country: Kaz. I think the code will be clearer than words; basically, I am struggling just because I have to repeat the process 300 times. If it's doable with -bootstrap-, then I would definitely do it.

Code:
******************** Counterfactual, COVAR & COEFFICIENT effects ********************
forval i = 1/300 {
    use sortedUzb.dta, clear
    bsample 1477
    qui probit owns_dwelling male age_group high_ed married divorced separated widowed log_monspending hh_size capital
    estimates store Uzb`i'
    qui probit owns_dwelling male age_group high_ed married divorced separated widowed log_monspending hh_size capital gen_trust
    estimates store UzbT`i'
}

quietly des using Kaz.dta
local N = r(N)            // 971 observations in Kaz.dta

frame create results
frame results {
    set obs 300
    gen results = .
}

frame create results1
frame results1 {
    set obs 300
    gen results1 = .
}

local base_rate = 0.9824
local compar_rate = 0.898

forval i = 1/300 {
    clear
    set obs 1477
    gen obs_id = runiformint(1, `N')    // draw ids from 1 to _N of Kaz.dta, not 1 to 1477,
                                        // so that keep(match) does not silently drop draws
    merge m:1 obs_id using Kaz.dta, keep(match) nogenerate
    estimates restore Uzb`i'
    predict yhat
    qui sum yhat
    frame results: replace results = `base_rate' - r(mean) in `i'
    frame results1: replace results1 = r(mean) - `compar_rate' in `i'
}

frame results: list in 1/300, sep(0)
frame results1: list in 1/300, sep(0)

frame results: ci means results
frame results1: ci means results1
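One way to avoid the 300-model limit entirely is to wrap a single replication (steps 1-3 of post #7) in an rclass program that returns the two effects, then let -simulate- (mentioned in #4) run it 200 times and collect the results. A sketch only, with a shortened covariate list, the two rates hard-coded as above, and the assumption that Kaz.dta contains an obs_id variable and the model covariates:

```stata
capture program drop one_rep
program define one_rep, rclass
    * step 1: resample the base country and fit the probit
    use sortedUzb.dta, clear
    bsample 1477
    qui probit owns_dwelling male age_group high_ed
    * step 2: draw 1477 observations with replacement from the comparison country
    clear
    set obs 1477
    gen obs_id = runiformint(1, 971)
    merge m:1 obs_id using Kaz.dta, keep(match) nogenerate
    * step 3: predict from the stored base-country model and average
    predict yhat
    qui sum yhat
    return scalar covar = 0.9824 - r(mean)   // step 4: covariate effect
    return scalar coef  = r(mean) - 0.898    // step 5: coefficient effect
end

simulate covar=r(covar) coef=r(coef), reps(200) seed(12345): one_rep
ci means covar coef    // 95% confidence intervals for (4) and (5)
```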



                          • #14
                            Hi Farogat
                            I think you are trying to do something that may not be necessary.
So let me reformulate the question: how would you do this if you only wanted to do 10 iterations?
                            If you can show me that, I may be able to help.



                            • #15
                              Originally posted by FernandoRios View Post
                              Hi Farogat
                              I think you are trying to do something that may not be necessary.
So let me reformulate the question: how would you do this if you only wanted to do 10 iterations?
                              If you can show me that, I may be able to help.
Hi Fernando, instead of 300, I would just have 10 in the loops.

