
  • Several hundred regressions and Stata memory

    Hello dear Stata users,

I am trying to run several regressions in a loop (Stata 17), but the memory does not allow me to do so. I tried "set maxvar 10000, permanently" before running my code in the do-file.

Code:
forval i = 1/300 {
    use sortedUzb.dta, clear
    bsample 1477
    qui probit owns_dwelling male age_group high_ed married divorced separated widowed log_monspending hh_size capital
    estimates store Uzb`i'
    qui probit owns_dwelling male age_group high_ed married divorced separated widowed log_monspending hh_size capital gen_trust
    estimates store UzbT`i'
}

How can I deal with this? I might need more regressions within the loop, meaning there would be around 1,000 probit regressions and their estimated coefficients.

    Thank you,
    Farogat.


  • #2
    How do you know the memory is the problem?



    • #3
      Dear Jared,

Kindly find the image attached. It reported an r(1000) error message.



      • #4
        Hi Farogat,

You have likely bumped into the limit on how many models can be stored by -estimates store- (see -help limits-). The limit is 300 stored estimates in any version of Stata, which fits what you describe: your loop stores two models per iteration, so Stata stopped after 150 iterations.

In any case, you will need to reconfigure how you store those estimates. One way is to -post- the estimates of interest from each model (see -help postfile-, or from Stata 16 onward, -frame post-) if you are not interested in retaining the full model objects. That way you don't need to store the estimated models at all; or if you do need them, you need only keep the two for the current iteration of the loop and drop them with -estimates drop-. For that matter, -simulate- may be of interest; -help simulate- gives some examples in that direction.
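For instance, a minimal -postfile- sketch along those lines (the covariate list is shortened here, and the posted statistics, the coefficient and standard error on male, are just placeholders for whatever you actually need):

```stata
* Sketch only: shortened covariate list; post whichever statistics you need.
tempname memhold
tempfile results
postfile `memhold' rep b_male se_male using `results'

forval i = 1/300 {
    use sortedUzb.dta, clear
    bsample 1477
    qui probit owns_dwelling male age_group high_ed
    post `memhold' (`i') (_b[male]) (_se[male])
}

postclose `memhold'
use `results', clear    // one row per replication, ready to summarize
```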



        • #5
          Please take a few moments to review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. It is particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using code delimiters [CODE] and [/CODE], and to use the dataex command to provide sample data, as described in section 12 of the FAQ. In general screen shots are not considered useful.

          The more you help others understand your problem, the more likely others are to be able to help you solve your problem.

          I am trying to run several regressions in the loop (Stata 17), but the memory does not allow me to do so
Section 12.1 of the FAQ is particularly pertinent:

          12.1 What to say about your commands and your problem

          Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!
          I started writing before you wrote post #3. Having seen that, if you were to scroll back in your results window, you would see that the list of models you show was preceded by
          Code:
          system limit exceeded
          you need to drop one or more models
          And if you clicked on "r(1000)" you would see that it is described as "system limit exceeded -- see limits." and if you clicked on "limits" you would read
          Code:
          Maximum size limits
          
                                                                        Stata/MP and
                                                       Stata/BE         Stata/SE
             estimates store
                 # of stored estimation results           300                300
          which is the source of your problem - there is a limit (that is not well documented) of 300 on the number of models that can be retained by estimates store, and your loop runs 300 times storing two estimates each time through the loop - a total of 600 models. Stata stopped after 150 times through the loop - 300 estimates.

Note that this has nothing to do with memory. This is why the FAQ tells you to say exactly what Stata typed: we need to see what Stata told you, not what you think Stata meant by what it told you.

          You can use estimates save and estimates use (rather than estimates store and estimates restore) to instead retain an arbitrary number of estimates as files on disk.
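For example (a sketch with a shortened covariate list; each model lands in its own .ster file on disk, so the 300-model in-memory limit does not apply):

```stata
forval i = 1/300 {
    use sortedUzb.dta, clear
    bsample 1477
    qui probit owns_dwelling male age_group high_ed
    estimates save Uzb`i', replace    // writes Uzb`i'.ster to the working directory
}

estimates use Uzb7    // later: reload any single replication from disk
```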

Added in edit: Crossed with post #4 from Leonardo Guizzetti, who gives an excellent alternative to creating countless sets of stored estimates.
          Last edited by William Lisowski; 03 Feb 2022, 14:44.



          • #6
            I think you are not running out of memory. Rather, Stata has a limit of 300 stored estimation results. See -help limits-.

            You cannot get around that limit. But I also cannot help thinking that you cannot actually make use of 600, let alone 1,000, complete sets of estimation results. It's too large to be put into a table and reviewed by human eyes. So I imagine that ultimately you plan to somehow summarize or condense all this information in some way to produce some human-usable tables or graphs.* To that end, it probably makes more sense to use either a frame (version 16 or later) or tempfile (earlier Stata versions) to post the important statistics you actually need from each regression into a new data set instead of using -estimates store-, and work with that afterward.

            *Since you are sampling with replacement, perhaps you are actually trying to bootstrap these regressions. If so, see the -bootstrap- command which will manage the sampling, iteration, and compilation of results all for you in one fairly simple command. -help bootstrap-
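A sketch of what that might look like (shortened covariate list, arbitrary seed):

```stata
use sortedUzb.dta, clear
bootstrap _b, reps(200) seed(12345): ///
    probit owns_dwelling male age_group high_ed
```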



            • #7
              Originally posted by Clyde Schechter View Post
              I think you are not running out of memory. Rather, Stata has a limit of 300 stored estimation results. See -help limits-.

              You cannot get around that limit. But I also cannot help thinking that you cannot actually make use of 600, let alone 1,000, complete sets of estimation results. It's too large to be put into a table and reviewed by human eyes. So I imagine that ultimately you plan to somehow summarize or condense all this information in some way to produce some human-usable tables or graphs.* To that end, it probably makes more sense to use either a frame (version 16 or later) or tempfile (earlier Stata versions) to post the important statistics you actually need from each regression into a new data set instead of using -estimates store-, and work with that afterward.

              *Since you are sampling with replacement, perhaps you are actually trying to bootstrap these regressions. If so, see the -bootstrap- command which will manage the sampling, iteration, and compilation of results all for you in one fairly simple command. -help bootstrap-
              Yes Clyde, I want to do this:
              1. Create a sample of 1477 observations with replacement from a base country and store coefficients (dataset for base country is of the size 1477)
2. Create another sample of 1477 observations from a comparison country and apply the stored coefficients (the comparison country dataset has fewer observations, so my supervisor advised resampling with the same sample size as the base country)
              3. Store yhats, average them over 1477 observations
              4. Find Covariate effect: base rate (fixed, I created a loop) - (3)
              5. Find Coefficient effects: (3) - rate in comparison country

And repeat that 200 times, then find 95% confidence intervals for (4) and (5).



              • #8
                Clyde Schechter Do you see any benefit between using tempfiles vs. datagrams? I am sure they solve the problem here, but I'm almost certain that frames are better likely for reasons under the hood.

                I agree, however, you don't want to run your model 300 times literally, what you likely want is to bootstrap 300 times Farogat WIUT



                • #9
                  Originally posted by Jared Greathouse View Post

                  I agree, however, you don't want to run your model 300 times literally, what you likely want is to bootstrap 300 times Farogat WIUT
But does -bootstrap- work if I want to resample 200 times and run a probit regression each time?



                  • #10
                    Originally posted by Jared Greathouse View Post
                    Clyde Schechter Do you see any benefit between using tempfiles vs. datagrams? I am sure they solve the problem here, but I'm almost certain that frames are better likely for reasons under the hood.

                    I agree, however, you don't want to run your model 300 times literally, what you likely want is to bootstrap 300 times Farogat WIUT
Frames should outperform -postfile- simply because frames are held in memory. But the relative speed gain may be very small on modern hardware, such as SSDs.



                    • #11
                      Jared Greathouse Sorry, I don't understand datagram in this context. As between tempfiles and frames, frames will clearly be more efficient as they will avoid thrashing the disk, if they are available to O.P.

                      Farogat WIUT I'm not clear on exactly what you want to do here. The regressions you show in #1 are not conditioned on data from different countries. They differ in the inclusion or exclusion of a variable gen_trust. And I don't understand parts 4. and 5. of your explanation in #7. It does seem like you want to do more than just get bootstrapped estimates of the regression models shown in #1, but I can't tell just what exactly you need.

                      Added: Crossed with #9 and 10. And agree with #10 that the gain in efficiency on an SSD would be barely detectable. But the gain could be noticeable if, for example, the program is being run over a network with the tempfile on a remote server.
                      Last edited by Clyde Schechter; 03 Feb 2022, 15:33.



                      • #12
                        Clyde Schechter my apologies, I meant to say dataframe



                        • #13
                          Originally posted by Clyde Schechter View Post

                          Farogat WIUT I'm not clear on exactly what you want to do here. The regressions you show in #1 are not conditioned on data from different countries. They differ in the inclusion or exclusion of a variable gen_trust. And I don't understand parts 4. and 5. of your explanation in #7. It does seem like you want to do more than just get bootstrapped estimates of the regression models shown in #1, but I can't tell just what exactly you need.
                          @Clyde Schechter
Base country: Uzb; comparison country: Kaz. I think the code will be clearer than words; basically, I am struggling just because I have to repeat the process 300 times. If it's doable with -bootstrap-, then I would definitely do it.

Code:
******************** Counterfactual, COVAR & COEFFICIENT effects ********************
forval i = 1/300 {
    use sortedUzb.dta, clear
    bsample 1477
    qui probit owns_dwelling male age_group high_ed married divorced separated widowed log_monspending hh_size capital
    estimates store Uzb`i'
    qui probit owns_dwelling male age_group high_ed married divorced separated widowed log_monspending hh_size capital gen_trust
    estimates store UzbT`i'
}

quietly des using Kaz.dta
local N = r(N)            // 971 observations in Kaz.dta

frame create results
frame results {
    set obs 300
    gen results = .
}

frame create results1
frame results1 {
    set obs 300
    gen results1 = .
}

local base_rate = 0.9824
local compar_rate = 0.898

forval i = 1/300 {
    clear
    set obs 1477
    gen obs_id = runiformint(1, `N')    // draw ids from 1 to _N of Kaz.dta, not 1 to 1477,
                                        // so that keep(match) does not silently drop draws
    merge m:1 obs_id using Kaz.dta, keep(match) nogenerate
    estimates restore Uzb`i'
    predict yhat
    qui sum yhat
    frame results: replace results = `base_rate' - r(mean) in `i'
    frame results1: replace results1 = r(mean) - `compar_rate' in `i'
}

frame results: list in 1/300, sep(0)
frame results1: list in 1/300, sep(0)

frame results: ci means results
frame results1: ci means results1
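One way to avoid the 300-model limit entirely is to wrap a single replication (steps 1-3 of post #7) in an rclass program that returns the two effects, then let -simulate- (mentioned in #4) run it 200 times and collect the results. A sketch only, with a shortened covariate list, the two rates hard-coded as above, and the assumption that Kaz.dta contains an obs_id variable and the model covariates:

```stata
capture program drop one_rep
program define one_rep, rclass
    * step 1: resample the base country and fit the probit
    use sortedUzb.dta, clear
    bsample 1477
    qui probit owns_dwelling male age_group high_ed
    * step 2: draw 1477 observations with replacement from the comparison country
    clear
    set obs 1477
    gen obs_id = runiformint(1, 971)
    merge m:1 obs_id using Kaz.dta, keep(match) nogenerate
    * step 3: predict from the stored base-country model and average
    predict yhat
    qui sum yhat
    return scalar covar = 0.9824 - r(mean)   // step 4: covariate effect
    return scalar coef  = r(mean) - 0.898    // step 5: coefficient effect
end

simulate covar=r(covar) coef=r(coef), reps(200) seed(12345): one_rep
ci means covar coef    // 95% confidence intervals for (4) and (5)
```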



                          • #14
                            Hi Farogat
                            I think you are trying to do something that may not be necessary.
So let me reformulate the question: how would you do this if you only wanted to do 10 iterations?
                            If you can show me that, I may be able to help.



                            • #15
                              Originally posted by FernandoRios View Post
                              Hi Farogat
                              I think you are trying to do something that may not be necessary.
So let me reformulate the question: how would you do this if you only wanted to do 10 iterations?
                              If you can show me that, I may be able to help.
Hi Fernando, instead of 300, I would just have 10 in the loops.

