
  • parallel bootstrapping

    It is said that George G. Vega Yon and Brian Quistorff's parallel can bootstrap, and I can reproduce the examples provided with parallel ( https://github.com/gvegayon/parallel ) on Stata 15.1. However, I'm having trouble using it with some commands, for instance summarize; see below:

    Code:
    . set obs 100
    number of observations (_N) was 0, now 100
    
    . gen n=_n
    
    . parallel setclusters 4
    N Clusters: 4
    Stata dir:  /opt/stata15/stata-mp
    
    . bs r=r(mean): sum n
    (running summarize on estimation sample)
    
    Warning:  Because summarize is not an estimation command or does not set e(sample), bootstrap has no way to
              determine which observations are used in calculating the statistics and so assumes that all
              observations are used.  This means that no observations will be excluded from the resampling
              because of missing values or other reasons.
    
              If the assumption is not true, press Break, save the data, and drop the observations that are to
              be excluded.  Be sure that the dataset in memory contains only the relevant data.
    
    Bootstrap replications (50)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50
    
    Bootstrap results                               Number of obs     =        100
                                                    Replications      =         50
    
          command:  summarize n
                r:  r(mean)
    
    ------------------------------------------------------------------------------
                 |   Observed   Bootstrap                         Normal-based
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               r |       50.5   3.193037    15.82   0.000     44.24176    56.75824
    ------------------------------------------------------------------------------
    
    . parallel bs r=r(mean): sum n
    --------------------------------------------------------------------------------
    Exporting the following program(s): sum
    program sum not found
    An error has occurred while exporting -programs-invalid syntax
    r(111);
    
    .

  • #2
    That is a problem with bootstrap rather than parallel. bootstrap needs to know which observations are to be excluded (e.g. because of missing values), and it does so using e(sample). summarize does not return e(sample); that would not make sense given what summarize does. That is what the warning message is about.

    You can do what the warning says, and that is probably the easiest solution. Alternatively, you can wrap summarize for one variable in an eclass program that also returns e(sample).

    Code:
    clear all
    sysuse auto
    
    program toboot, eclass
        syntax varname [if] [in]
        marksample touse
        sum `varlist' if `touse'
        ereturn post, esample(`touse')
        ereturn scalar m = r(mean)
    end
    
    bootstrap m=e(m) : toboot weight
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------
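
    The first option in #2, doing what the warning says, could look like the minimal sketch below. It is not from the original exchange: it assumes the auto data and uses rep78 (which has missing values) purely for illustration, and the reps() and seed() values are arbitrary.

    ```stata
    sysuse auto, clear
    * keep only the observations that summarize would actually use
    drop if missing(rep78)
    * resample the cleaned data; summarize leaves the mean in r(mean)
    bootstrap m=r(mean), reps(50) seed(12345): summarize rep78
    ```

    bootstrap will still print the warning, because summarize never sets e(sample), but its assumption that all observations are used now holds.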


    • #3
      Thanks Maarten. Here is the output; it looks like it got a step further, with this new twist:

      (running toboot on estimation sample)
      expression list required
      r(100);
      Code:
      . parallel bootstrap m=e(m) : toboot n
      --------------------------------------------------------------------------------
      Exporting the following program(s): toboot
      
      toboot, eclass:
        1.     syntax varname [if] [in]
        2.     marksample touse
        3.     sum `varlist' if `touse'
        4.     ereturn post, esample(`touse')
        5.     ereturn scalar m = r(mean)
      --------------------------------------------------------------------------------
      --------------------------------------------------------------------------------
      Parallel Computing with Stata
      Clusters   : 2
      pll_id     : s2mhrpb315
      Running at : /home/jfalken
      Randtype   : datetime
      
      Waiting for the clusters to finish...
      cluster 0001 Exited with error -100- while running the command/dofile (view log)...
      cluster 0002 Exited with error -100- while running the command/dofile (view log)...
      --------------------------------------------------------------------------------
      Enter -parallel printlog #- to checkout logfiles.
      --------------------------------------------------------------------------------
      2 child processes encountered errors. Throwing last error.
      r(100);
      
      . parallel printlog 1
      --------------------------------------------------------------------------------
                    beginning of file -/tmp/__plls2mhrpb315_do0001.log-              
      --------------------------------------------------------------------------------
      
      
      . capture {
      clear
      set processors 1
      cd "/home/jfalken/"
      sysdir set PERSONAL "~/ado/personal/"
      sysdir set PLUS "~/ado/plus/"
      global S_ADO = `"BASE;SITE;.;PERSONAL;PLUS;OLDPLACE"'
      mata: mata mlib index
      mata: mata set matalibs "lmatabase;lmatasem;lmatamixlog;lmatapss;lmatasp;lmat
      > asvy;lmataado;lmatapath;lmatagsem;lmatafc;lmatatab;lmataerm;lmatamcmc;lmataop
      > t;lmatapostest;lparallel;l__plls2mhrpb315_mlib"
      set seed 18916
      noi di "{hline 80}"
      -------------------------------------------------------------------------------
      > -
      noi di "Parallel computing with Stata"
      Parallel computing with Stata
      noi di "{hline 80}"
      -------------------------------------------------------------------------------
      > -
      noi di `"cmd/dofile   : "__plls2mhrpb315_bs_simul.do""'
      cmd/dofile   : "__plls2mhrpb315_bs_simul.do"
      noi di "pll_id       : s2mhrpb315"
      pll_id       : s2mhrpb315
      noi di "pll_instance : 1/2"
      pll_instance : 1/2
      noi di "tmpdir       : `c(tmpdir)'"
      tmpdir       : /tmp/__plls2mhrpb315_tmpdir0001
      noi di "date-time    : `c(current_time)' `c(current_date)'"
      date-time    : 09:59:17  5 Jun 2018
      noi di "seed         : `c(seed)'"
      seed         : XAA00000000000049e4079e632ed66919158ffc16e8a923d3b37a60719e43560
      > 52042df919d4adf45d1abea03084ef17595ac35e09ceeaa949166331d37b1400ade27b1666c75
      > 718a3bad5d67d1f0df9168710f8f535b2225acb70f3d553f6272747f46c773a1c4a8caf7651dd
      > 5817560bc4c8d695dffcbc2a198b93cebd37e962f0674aa8bbdf1b8f9a96a762ae32e0ad6393b
      > e88503db1356c49882a9252810318267e714ad2da6de7a053e5384d978c1f4f57fc284e179d6f
      > d59aaf55c0c1588a85834a7fab6be871d9f05351a0c787a69cc94ba1af7b38c035ba759fcae82
      > e4b71841addc501e30e420c63de42a1ce7e0f23e3c807030327ff53091d425c7bc28ded36365b
      > bdf597b278b3d80a069c6f1fdbd5ed57f511da5503f0acf152469a06ea28eda0f7bf342147275
      > b27a5882572f20be79a98c2dc5d5f8de24f377e334444f0c8ca43288c5563b73f7165a11ebf96
      > ae04a16c354701d31b6d68b7f7650918e866f047117540586bb11afcb0bce8731027672a6f014
      > d6c5340268e5e6a2e7a4128d5b839ee4c34f25292c869dc892c99c95920bf878dedf683bb4a05
      > 2b338a5420e985bcb5195421c3e4dd98a03fabb33748232a5a2d810a9880bc2385ad920fe032d
      > ea6a92ad7c5a8aa44a08eda7ca0def6df9fc6b376d47dac5730e7dd853e0d484aadf3001fcea8
      > 74c2b435b4fa14ca3d8d8056489e5cc4a9bf4f3eebe2df3e4c81cca9d4f46b7ea71c69619c2de
      > d32ba75446788c8b7d83f68a91e31a6478e6e3cb9469435c5ceaedb2d089d5450d57231e03ce7
      > 720187fa3c57442a9d67cf102bf3392110aacda6f6a13a6cdb0127319144795f8de15f49fcb6c
      > 161c59f3550854a2e5ddc6686ebd3ec7fc2d2a87117b44c23f1b4a7f8a1bd53cc11aaefc01993
      > fdc0e05fa72af6adb84c2e623bfcc831286a24f66563ca33cb41a10aa4401bdf652a09b87ec41
      > 50fabf9bb05eb9b3a0d68f5ac92dd919c6aa20f4a319e2e78747258dedc24880315c07ea65534
      > 4103ca83b4844ce1b7ab55a331a6ea1d2e0b9a9265d0eed8256aa960b6e32667e283d56209c64
      > d3356db7832cd103dfa9551be176334547b6bfd8acf286924f4ab7bc74ec4cf84a8580a8a039f
      > e94cfd925b7f23c460373ef952dd678fd9db2bdb6e1baab45d8ed66a50f27ef72ad15706dff17
      > 3e5c156dc71e0a577c9ccb26de70f96463ab341326cd17c478672bdc2c6aa80fded6ae0b72903
      > 56ba01b51eb60e3a8c915f01e6a9905a5c420d91ffc8fc6cfd797d97518e3779f51758bd1edfb
      > 8218dc305309116134b5833d7c6a196381eed6005c01edab2dbb1da2122a6f228177131d706f5
      > b5e904abf9053620611fd919bf5bc6ec7e503fa765df205c6d90d0af02d1b4388612ad90a911a
      > e79c4966c9caf99a32a05f3c1735425dcfa91e11bf1cbee498cf6e5d05cf12bc60ae63cca3d57
      > d89d17c2036a188d461e59305e9d53820db927e9b58883d506deb2f517badf51bc2603c436a4d
      > fc17a5e86f5407673c0aa81663c1d49585effc453efadaaa1e3125d741573ffc1d53b98f1f1d0
      > 72c5521d4a187e4326654b2624d65ee7705b2a96dbdb6d99b69c01531d7c5708384b0129b3e91
      > afc425e1b997c821232b7b736ffd3369086d675dbaa58fcf42fb5741171eb74f4294d8dc442a0
      > f821c22f87488bece0d679aa3f47dca1e75438707fbe96d77636cc9925eb9cbb23ec6cccfeef0
      > 1d2c353a5f868e50f01e20471fd0b4aad380451973ac995078b8666cd6b62133a6ca874c4295a
      > 4673ed2f5179ba5015f13fd288237fa5f5229d08b18df06700a606ebbc79388a86394197606d3
      > 9837dca710e46021aef4a914465f21e2a8d7b3747b1b2c49deb81cffa2ddd13e5dc0bfe2a2fe8
      > d768425b3104a82b3935cf09812f396e92b29b361aa33aff979c6a7b713c68b7e4d77467ce2fe
      > 40c6a01ae878553232a451a6e34cf12bd58a4cea2837c7e70ed4855eddd3092a11265bc72e407
      > 5d1b92edd34e1f1b942b704df11c7e208165594dda843139959c2511de9380b0885790e3e0851
      > 4a6393bd424fdc00b8300f6a0e85b4e83c2b9a62ab3aee974484dfd30ff38f53ad07f7d4a8939
      > 0877db314c3279b5fd4b4a3d0b1f5b04caeba7488a17e593fff1509dde9f9d66ec342e89cfd54
      > e1c011599778e2acc8c0b39218726dbe25c40111064a2e8b8ae09fbf21c10d2986e6d656cdc29
      > b5382f657e2e000fdab2cdb1252a78f138645bd6803b26e197e0d78c7f4900554beb39c310639
      > bbd6c9c6199a7d531f0d7b8ce52bc345424d1784e7efe59f99e95bbc7b47e087dcdf7338dec18
      > 7fa4f73f29f92002f50d2ee92a45bdc8be796ea82cae03b12d05057ab539cd1ce618ce53076f6
      > 94dbcc2fb89098125125d7d79b4a54981e8234f1e24d62f45c8240767a3c0e82c1b063544b79b
      > 317dbf45f9f247e12e935e9d5d9cd1044c492a15a78ae650b96255472d9adec7c62bbcc795500
      > 8802d6b6e593696ca9272d42f16a2ca21a729a52953e57b5dab43bf0e34e63dd290328f5c7a42
      > e2462b45e6eaf6dfbb90f5432484b82204e32c9c56be5ac922091c9ae917fe779f972d79d96de
      > 497642827f2c28bfae33d039618f6c4dd434c400caec3bcb4a0038350896bb11703111a8e99bd
      > 2ba077667b588c5d0ca6436da63aaebea6ec2c55bbb7b604770c95bb1a02d4d9d258b5cf02336
      > b87a9dbad1660fb5cd6fe34b7ea849c9924a7e0d71148e11971a45be6a9292d3c0da14d77e68b
      > 75be540d2fc60833ba81914b22dd449bb94b173132f199485c956a84e2f2551837f2048ae3546
      > c2225201198f5addff144a1a701de9bc46ebb48f99e76b217c5a3c4a2a100c815469ebccd4faf
      > 05d27fd3e2b76f071f744aedd1576b660c62951ef2da82be59f43f009e56663f49f61e3a0fc9f
      > e2a1d762ed79adba9092d68c4682123aaec081012f1e6ba541700933b08751cba04594f385ae1
      > aab54005e8232ca2849f9be2709657fb488ce522d69be315540e6dab88b6651a202a6d4969a91
      > 3ed3530db81478d0607f8c83069e4b205034b2bb2b7f3712c8c530eeebd2d7094075ace6de317
      > fad52be0052f6d0406f08b33e4f5eb6130345a5a86f407a21e21cb32e6a10ff2bdcb7cd3ee4d6
      > ce7c306bfb2b2afa67ee49e06cee4c0863af5ba55a15fb5d265f3810d46338da8d2b4bfebbb91
      > f1b7087ddf5206f0728e17bb0e0dafe572f2401e7074edf68bd9d02bf5139de66cb3541351cb3
      > b4b3dba360bf9caa141061ca60b13ce9f57872294fb091542e67e3008dd5982ec3115ecccc55d
      > 478394ee627395dcc8b45087970a2737f229915b2ab17788ba9d5abda504576553666dd729756
      > 4cbece382fea143cf032e4b340a3f5c293a120c0f2e8139c6a1e28fb5c9e9c97cb243351e0186
      > 29092932e83e8895be9cb98a548f7ef115a7106d85f411671d71494b47dcb77b0bf782948d2e1
      > 39ea7bf79eca3293563d14d74c30d20b53f1c32bd499e2f7b827d75cd7d2f074ae8ee44a920cc
      > ff50001000001383575
      noi di "{hline 80}"
      -------------------------------------------------------------------------------
      > -
      local pll_instance 1
      local pll_id s2mhrpb315
      global pll_instance 1
      global pll_id s2mhrpb315
      mata: for(i=1;i<=1;i++) PLL_QUIET = st_tempname()
      }
      
      . local result = _rc
      
      . if (c(rc)) {
      . cd "/home/jfalken/"
      . mata: parallel_write_diagnosis(strofreal(c("rc")),"/home/jfalken/__plls2mhrpb
      > 315_finito0001","while setting memory")
      . clear
      . exit
      . }
      
      .
      . * Loading Programs *
      
      . capture {
      run "/home/jfalken/__plls2mhrpb315_prog.do"
      }
      
      . local result = _rc
      
      . if (c(rc)) {
      . cd "/home/jfalken/"
      . mata: parallel_write_diagnosis(strofreal(c("rc")),"/home/jfalken/__plls2mhrpb
      > 315_finito0001","while loading programs")
      . clear
      . exit
      . }
      
      .
      . * Checking for break *
      
      . mata: parallel_break()
      
      .
      . * Loading Globals *
      
      . capture {
      cap run "/home/jfalken/__plls2mhrpb315_glob.do"
      }
      
      . if (c(rc)) {
      .   cd "/home/jfalken/"
      .   mata: parallel_write_diagnosis(strofreal(c("rc")),"/home/jfalken/__plls2mhr
      > pb315_finito0001","while loading globals")
      .   clear
      .   exit
      . }
      
      .
      . * Checking for break *
      
      . mata: parallel_break()
      
      . capture {
        noisily {
      .
      . * Checking for break *
      . mata: parallel_break()
      .     use __plls2mhrpb315_bs_dta.dta, clear
      .     if (`pll_instance'==$PLL_CLUSTERS) local reps = 25
      .     else local reps = 25
      .     local pll_instance : di %04.0f `pll_instance'
      .     bs , sav(__pll`pll_id'_bs_eststore`pll_instance', replace  )  rep(`reps')
      > : toboot n
      (running toboot on estimation sample)
      expression list required
      r(100);
      .   }
      }
      
      . if (c(rc)) {
      .   cd "/home/jfalken/"
      /home/jfalken
      .   mata: parallel_write_diagnosis(strofreal(c("rc")),"/home/jfalken/__plls2mhr
      > pb315_finito0001","while running the command/dofile")
      .   clear
      .   exit
      --------------------------------------------------------------------------------
                       end of file -/tmp/__plls2mhrpb315_do0001.log-                  
      --------------------------------------------------------------------------------
      
      .
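
      In the child log above, the command each cluster actually ran was "bs , sav(...) rep(25) : toboot n", i.e. without the m=e(m) expression list, which matches the "expression list required" error. A possible workaround, sketched here under the assumption that parallel bs takes the expression through its expression() option (the form used in #6), and not verified against the parallel version used in this post:

      ```stata
      * assumes toboot from #2 is already defined and the variable n is in memory
      parallel setclusters 2
      * pass the statistic via expression() instead of before the colon
      parallel bs, expression(e(m)) reps(50): toboot n
      ```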


      • #4
        Hi Jerome,
        Did you ever solve your problem? I get a similar error message for parallel do and am not quite sure why.


        • #5
          I haven't...


          • #6
            I think it works in the current version:


            Code:
            sysuse auto, clear
            version 16.1
            parallel initialize 4
            
            cap program drop mwrapper
            program define mwrapper, rclass
                syntax varlist [if] [in]
                marksample touse
                summarize `varlist' if `touse', meanonly
                return scalar m = r(mean)
            end
            
            
            parallel bs, expression(r(m)) reps(400): mwrapper price
            Best wishes

            (Stata 16.1 MP)


            • #7
              Over the last 2-3 years I have tried a couple of times to run Stata simulations with the -parallel sim- command, and it never worked on at least 3 different laptops and at least 3 versions of Stata.

              Most recently, I downloaded a test program from the parallel GitHub repository, and the test program generated the same error message as my own programs.

              I wrote to the authors of parallel with the detailed error messages and the details of my system. They were very kind to reply, but they advised a complicated procedure in which I would need to go to their GitHub and do some elaborate things there...

              My conclusion was that it would be easier and cheaper for me to figure out how to do this manually myself than to play along with the authors.

              So I figured it out, and I typed up a note for the Stata Journal. You can find the note attached; the procedure I explain there is what ultimately worked for me.


              • #8
                Joro Kolev, have you tried my simulate2/psimulate2 package? It parallelises the simulate command and is available from SSC, from GitHub at https://github.com/JanDitzen/simulate2, or from within Stata via https://janditzen.github.io/simulate2.
                Let me know if that works or if you require any help.
                Jan


                • #9
                  JanDitzen, are there any namespace collisions that one needs to be cautious of when using multiple instances of Stata? For example, with temp files, temp names, or frames? In other words, does one instance of Stata play nicely in a sandbox with other instances?


                  • #10
                    The only problem I am aware of arises if one saves temporary files named with tempname. The local containing the name of the tempfile is different across parallel instances. However, if you use tempname within a program that is called by psimulate2, then the name might be the same and collisions are possible.
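
                    One way to sidestep such collisions is to build file names from something that differs per instance. The sketch below is only an illustration for -parallel-, whose child log in #3 shows the globals pll_id and pll_instance being set; whether psimulate2 exposes an equivalent is not stated in this thread. The program name savepart and the file-name pattern are hypothetical.

                    ```stata
                    * hypothetical helper: save the data in memory under an instance-specific name
                    capture program drop savepart
                    program define savepart
                        * pll_id and pll_instance are globals set by -parallel- in each child
                        * (see the child log in #3); outside -parallel- they are empty
                        local id   "$pll_id"
                        local inst "$pll_instance"
                        if "`inst'" == "" local inst 0
                        save "part_`id'_`inst'.dta", replace
                    end
                    ```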


                    • #11
                      Thanks very much.


                      • #12
                        You are welcome and please let me know if there is anything I can help with!
