  • Save command with modify option

    Hello,

    I have data set and I want to do a simulation for 1,000,000 times. For each iteration, I randomly pick one observation out of that data set and save that single observation (with many variables) in new data set for later use. However, if I use "save" command in Stata format, that does not have "modify" option, which add other row to save the new observation that I pick. The only option is replace, which I do not want to because I want to keep all 1,000,000 observations from 1,000,000 iterations that I run.

    Do you have any suggestions on how to save in Stata format with something like a "modify" option, or is there another command that can solve this problem?

    P.S. I tried "putexcel", but I need to run even more iterations at a time, and the results may exceed the maximum number of rows in an Excel worksheet (1,048,576).

    Thank you!
    Last edited by Davis Nguyen; 26 Jun 2023, 10:32.

  • #2
    Look into "append", which adds the observations of a dataset saved on disk to the data in memory.

    • #3
      I thought about "append", but it does not seem to work. I run a loop with 1,000,000 iterations: I want the first iteration to be saved into a new file, and each subsequent iteration to keep adding to that new dataset. I don't want to save 1,000,000 separate files and then append them.

      That is my understanding; I don't know whether there is another way to use "append".

      • #4
        You need something like this pseudo-code:

        Code:
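        * Start with an empty file that will accumulate the kept observations
        clear
        save kept_obs, replace emptyok
        
        forval i = 1/1000000 {
            * ... do stuff ...
            * -keep- below drops every other observation, so reload the full
            * source data each iteration (my_data is a hypothetical file name)
            use my_data, clear
            local keep_obs_num = runiformint(1, _N)
            keep in `keep_obs_num'
            * add the observations kept so far, then overwrite the file
            append using kept_obs
            save kept_obs, replace
        }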
        Last edited by Hemanshu Kumar; 26 Jun 2023, 11:07.

        • #5
          It works. Thank you very much!

          • #6
            Hemanshu Kumar has provided a working answer to your request. However, this is likely to be fairly inefficient because of the constant need to read and write datasets to disk. I wonder if you can’t use -frame post- to build your dataset as your loop executes and then only save it out at the end. I can’t tell if that’s possible from the description given above, so I think it best to see some code if you want to consider that approach.
            Last edited by Leonardo Guizzetti; 26 Jun 2023, 12:41.
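A minimal sketch of the -frame post- idea above, assuming the source data are already in memory (the auto data stand in for OP's dataset here) and that only a few variables need to be kept: make and price are illustrative choices, and kept_obs is a hypothetical file name. Each iteration posts one randomly chosen row to a separate frame, and the result is written to disk only once, after the loop.

```stata
* Sketch: build the sample in a separate frame, save once at the end
clear all
set seed 18
sysuse auto, clear   // stand-in for the real source dataset

* Result frame; variable names and types are illustrative assumptions
frame create kept str18 make double price

local N = _N
forvalues i = 1/1000000 {
    * pick one row at random (with replacement) from the current frame
    local j = runiformint(1, `N')
    * expressions are evaluated in the current frame, posted to -kept-
    frame post kept (make[`j']) (price[`j'])
}

* write the accumulated sample to disk a single time
frame kept: save kept_obs, replace
```

With many variables, each one has to be listed in both -frame create- and -frame post-, which is one reason a non-loop approach can be more convenient.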

            • #7
              On a second read of OP's post, there is a more direct approach that doesn't need a loop. This is essentially a simple random sampling issue. Here's an outline of what to do, with an optional expansion of the sampled data at the end.

              Code:
              clear *
              cls
              set seed 18
              
              tempfile data
              mkf Data
              cwf Data
              sysuse auto, clear
              gen `c(obs_t)' row = _n // assign a row number to the existing data
              local Nobs = _N
              save `data', replace
              
              mkf Sample
              cwf Sample
              set obs 1000000
              gen long row = runiformint(1, `Nobs')
              contract row
              merge m:1 row using `data', keep(match) nogen noreport
              /*
              * Optional step: if you really need all 1,000,000 observations, expand the data.
              * The variable _freq can also be used as frequency weights, if applicable.
              expand _freq
              drop _freq
              */
              
              * save out your sample dataset
              save /path/for/my_sample.dta
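On the remark that _freq can serve as frequency weights: a minimal sketch, again using the auto data as a stand-in and price as an illustrative variable, that draws the sample as in the code above and compares a weighted -summarize- on the contracted data with the same statistic after expanding to one row per draw.

```stata
* Sketch: _freq from -contract- as frequency weights vs. -expand-
clear all
set seed 18
sysuse auto, clear   // stand-in for the real source dataset
gen long row = _n
local Nobs = _N
tempfile data
save `data'

* 1,000,000 draws with replacement, stored compactly as row + _freq
clear
set obs 1000000
gen long row = runiformint(1, `Nobs')
contract row
merge 1:1 row using `data', keep(match) nogen noreport

* weighted mean: each sampled row counts _freq times
summarize price [fweight = _freq]
local wmean = r(mean)

* same mean after physically creating one row per draw
expand _freq
summarize price
local emean = r(mean)
display "weighted: " `wmean' "   expanded: " `emean'
```

When the commands you need accept frequency weights, keeping the contracted data with [fweight = _freq] avoids storing a million rows.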
