Problem with parallel: parallelize a loop

Matthew Alexander

Join Date: Feb 2021

Posts: 58
#16

26 Dec 2021, 10:19

Hi Lucien,
I don't know why this problem is occurring - initialize is a standard parallel command. My last suggestion on this topic is to check that you have the latest version of parallel installed.
As for coefficients, Stata certainly does calculate coefficients - your program above saves them to a dataset. And you certainly can graph the coefficient estimates contained in that dataset. I don't know exactly what kind of graph you want, but I suspect that you may be looking for a combination of twoway scatter (for coefficient estimates) and twoway rspike (for CI estimates).
Something like

Code:

graph twoway rspike ub_90 lb_90 i_pos || scatter beta i_pos

All the best,
Matt

Last edited by Matthew Alexander; 26 Dec 2021, 10:29.

Lucien AHOUANGBE

Join Date: Mar 2019
Posts: 15

#17

26 Dec 2021, 11:14

Matthew, you are right, I just reinstalled the latest version of the program as you suggested.
And a big thank you for the graphics code, it works and I think it's better than what I was doing.
When I run the program now it seems to work without errors. But there is a problem with the output of the results.

With an ordinary loop of 10 iterations, here is the command and the result I expect.

Code:

preserve
                
                drop beta se i_pos lb_* ub_*
                g i_pos = .
                g beta = .
                g se = .
                g lb_90 = .
                g lb_95 = .
                g lb_99 = .
                g ub_90 = .
                g ub_95 = .
                g ub_99 = .
                
                qui {
                    sum rp_partner_rank, d
                    
                    forvalues i = `r(min)' (1) 10 {
                            
                                xi: reg rp_avg_pc_epi_gap_abs lag_fdi_in_all $contrlsorder i.year i.pccountry i.rpcountry if rp_partner_rank <= `i', vce(cl id)
                                *eststo
                                lincom _b[lag_fdi_in_all], l(95)
                                replace beta = r(estimate) in `i'
                                replace se = r(se) in `i'
                                replace i_pos = `i' in `i'
                                replace lb_95 = r(lb) in `i'
                                replace ub_95 = r(ub) in `i'
                                
                                lincom _b[lag_fdi_in_all], l(90)
                                replace lb_90 = r(lb) in `i'
                                replace ub_90 = r(ub) in `i'
                                
                                lincom _b[lag_fdi_in_all], l(99)
                                replace lb_99 = r(lb) in `i'
                                replace ub_99 = r(ub) in `i'
                            
                    
                    }
                } 
            
                
                *twoway rarea ub_90 lb_90 i_pos , astyle(ci) || ///
                *line beta i_pos
                
                graph twoway rspike ub_90 lb_90 i_pos || scatter beta i_pos

                save "$DataCreated\asup", replace
            restore

[Image: Graph3.png]

Normally the variable i_pos should have length 10.

With the parallel version of the program (without preserve, with 4 clusters), the length is 40. The variable i_pos reads 1 ... 10, 1 ... 10, 1 ... 10, 1 ... 10. Moreover, the betas differ for the same i_pos (1 1 1 1). I have the impression that Stata creates subsamples (4 clusters) and runs the 10 iterations within each subsample (cluster).

Code:

*parallel setclusters 4
            parallel initialize 4, force s("C:\Stata 16 MP\StataMP-64.exe")
            
            capture program drop savereg
            program define savereg 
                qui {
                    sum rp_partner_rank, d
                    
                    forvalues i = `r(min)' (1) 10 {
                            
                                xi: reg rp_avg_pc_epi_gap_abs lag_fdi_in_all $contrlsorder i.year i.pccountry i.rpcountry if rp_partner_rank <= `i', vce(cl id)
                                *eststo
                                lincom _b[lag_fdi_in_all], l(95)
                                replace beta = r(estimate) in `i'
                                replace se = r(se) in `i'
                                replace i_pos = `i' in `i'
                                replace lb_95 = r(lb) in `i'
                                replace ub_95 = r(ub) in `i'
                                
                                lincom _b[lag_fdi_in_all], l(90)
                                replace lb_90 = r(lb) in `i'
                                replace ub_90 = r(ub) in `i'
                                
                                lincom _b[lag_fdi_in_all], l(99)
                                replace lb_99 = r(lb) in `i'
                                replace ub_99 = r(ub) in `i'
                            
                    
                    }
                } 
            
            end
            
            parallel, prog(savereg): savereg

[Image: Graph4.png]


Matthew Alexander

Join Date: Feb 2021

Posts: 58
#18

26 Dec 2021, 11:37

Hi Lucien,
Happy to help.
Personally, I do not use the setclusters syntax. This does not mean it is not appropriate in your case, though I note it is not included in the help file here: https://github.com/gvegayon/parallel
If you are having issues then I advise you to drop

Code:

setclusters 4

and simply use the following

Code:

parallel initialize 4, force s("C:\Stata 16 MP\StataMP-64.exe")

If this new problem still persists, try explaining it again in as clear a manner as possible - I didn't fully understand the explanation above.
All the best,
Matt

Last edited by Matthew Alexander; 26 Dec 2021, 11:42.
Matthew Alexander

Join Date: Feb 2021

Posts: 58
#19

26 Dec 2021, 11:47

I'll add too that you are correct about how parallel functions.
Stata splits the dataset by the number of cores specified - and then simultaneously iterates over each of these subsets.
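A minimal sketch of that splitting behavior, assuming the parallel package is installed and using the auto dataset shipped with Stata rather than the data from this thread (the program name show_chunk and the variable chunk_n are made up for illustration):

```stata
* Sketch only: illustrates that each child process sees a chunk, not the full data
sysuse auto, clear              // small example dataset shipped with Stata
parallel initialize 2           // two child processes

capture program drop show_chunk
program define show_chunk
    * Inside a child, _N is the size of the chunk that child received
    gen chunk_n = _N
end

parallel, prog(show_chunk): show_chunk

* After parallel appends the chunks back together, chunk_n should show
* the chunk sizes rather than the size of the full dataset
tabulate chunk_n
```

Any estimation command placed inside the program sees the same chunk, which is why a regression run this way uses only a fraction of the observations.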
Lucien AHOUANGBE

Join Date: Mar 2019

Posts: 15
#20

26 Dec 2021, 12:09

Hi Matthew,
I tried both versions (setclusters and initialize)... both give the same result.
I'll give up and try to find alternatives in R. It should do the job without trouble, although my director is more familiar with Stata.
By the way, I have been in contact with Mr. George Vega Yon, the author of the parallel command, and I sent him this discussion page so that he can follow along and weigh in. But I think he is currently offline. Let's wait and see what he thinks.
Anyway, I want to thank you very much for all your help. It is really wonderful to have people like you. I wish you success in your projects. And I hope that Stata will improve its parallelization commands in the future. Thanks a lot
Lucien
Matthew Alexander

Join Date: Feb 2021

Posts: 58
#21

26 Dec 2021, 12:13

Hi Lucien,
It is somewhat disappointing to hear that you have decided to give up on Stata for this particular analysis.
I am certain that you are but one step away from achieving what you want.
If you explain one last time this new problem related to the range of the variable - i_pos - then I do believe we can "crack" the case.
If not, then I'm happy to have helped regardless.
All the very best,
Matt
Matthew Alexander

Join Date: Feb 2021

Posts: 58
#22

26 Dec 2021, 12:28

In fact, I've gone over the above. And the issue, I think, is that you are not telling parallel to do what you want.
The issue is that you are running the regression within parallel.
Parallel first splits the dataset into 4 subsets. So when you include an estimation command within parallel, you estimate a separate regression in each of the 4 subsets at every value between 1 and 10, which gives 40 sets of estimates instead of 10.
I assume that you want to run the regression on the full population - to do this you will need to insert the regression command outside of the parallel program.
For your needs, one option is to write a separate regression loop and save the estimates.
That is

Code:

forvalues i = 1/10 {
    xi: reg rp_avg_pc_epi_gap_abs lag_fdi_in_all $contrlsorder i.year i.pccountry i.rpcountry if rp_partner_rank <= `i', vce(cl id)
    estimates save est`i'
}

Then you should set your parallel program to run after the initial regression loop. Of course, I suspect that the whole reason you want to use parallel is to speed up the regressions themselves rather than post-estimation via lincom etc., and the above code no longer does that.
A broader point is that it is somewhat unusual to estimate regressions by restricting the sample using if i_pos < i. Assuming i_pos is some kind of group identifier, estimates from such regressions are not directly comparable between groups - rather estimates represent the effect within that particular group. A more standard method would be to estimate the full model just once and then get marginal predictions/effects by values of i_pos using the excellent and in-built - margins - command and its - by - option. The margins documentation is very clear and helpful, and you can easily save the results and thus produce the sort of plots seen above.
Hope this helps,
Matt

Last edited by Matthew Alexander; 26 Dec 2021, 13:15.
Lucien AHOUANGBE

Join Date: Mar 2019

Posts: 15
#23

26 Dec 2021, 15:20

I didn't understand this in post #21:
"If you explain one last time this new problem related to the range of the variable - i_pos -"
But by the way, the variable i_pos gives the position of the partners for a country. If it is 1, it is the first partner; 2, the second; and n, the n-th partner.
I want to see the effect of the lag_fdi_in_all variable on the dependent variable, depending on the partners considered.

Yes, I understand what you mean. But what I'm looking to do is much more complicated. There are still several other regressions that I have to run, between 14 and 20 (very long to run), and for each of the variables of interest entering the regression (e.g. lag_fdi_in_all in the code), I have to compare the coefficients of interest according to the position of the investor partners (rp_partner_rank), commercial partners (rp_trade_rank), and other partners....
But for a single regression, the position of the partners can go from 1 to 200. And above all I was looking to save time via parallelization with 12 cores, if possible.
But alas, it's going to be complicated to get this time saving.
I never used the command for marginal predictions/effects. I will read the documentation on it and try to use it to see what happens.
Thanks again for the suggestion.
Many thanks Matthew. I'll keep you posted on what happens next.
Thanks again.

Last edited by Lucien AHOUANGBE; 26 Dec 2021, 15:34.
Matthew Alexander

Join Date: Feb 2021

Posts: 58
#24

26 Dec 2021, 15:41

Well, Lucien, that certainly does sound like a rather complicated endeavour. Make sure that what you are doing is really what you want to do before investing what I imagine will be a lot of time.
I assume that when you say you want to estimate "the effect depending on the partners considered", what you mean is that you want to estimate the effect for each partner. If so, then it seems to me that a more efficient, natural and accessible method would be to estimate a single regression model, and then calculate Average Marginal Effects for each predictor using - margins - at each value of i_pos, i_com and so forth using option -by- or -over-.
Specifically, the syntax would be something like

Code:

reg rp_avg_pc_epi_gap_abs lag_fdi_in_all $contrlsorder i.year i.pccountry i.rpcountry, vce(cl id)
margins, dydx(lag_fdi_in_all) over(rp_partner_rank) atmeans

Food for thought at the very least.
I wish you all the best, feel free to update me on the project.
Matt

Last edited by Matthew Alexander; 26 Dec 2021, 16:07.
Lucien AHOUANGBE

Join Date: Mar 2019

Posts: 15
#25

26 Dec 2021, 17:27

"But by the way the variable i_pos gives the position of the partners for a country. If it is 1, it is the first partner; 2, the second; and n the n-th partner."

No, sorry, I made a mistake above.
In fact, we want to take into account the effect of the size of partner countries, not individually as with the code you sent me, but by taking into account their importance in trade with the host country.
The variable i_pos gives the position of the partners for a country. If it is 1, it is the first and largest partner; 2, the second-largest partner; and n, the n-th partner.

So, for example, i_pos <= 15 means that we want to take into account the 15 largest partners, and i_pos <= 100 means that we want to take into account the 100 largest partners.

I was trying to understand your code. I think it only takes the effect for each partner individually, according to its position in the variable i_pos.
I was trying to see whether we can take the first 15 into account at the same time. I think this must exist in Stata, and I'll keep looking too.
But I don't know if you can understand me; if not, I will try to explain better.

I don't know if it will work, but I have an idea based on your code: combine the subpop option with a forvalues loop, like this

Code:

reg rp_avg_pc_epi_gap_abs lag_fdi_in_all $contrlsorder i.year i.pccountry i.rpcountry, vce(cl id)
forvalues i = 1/10 {
    margins, dydx(lag_fdi_in_all) subpop(if rp_partner_rank <= `i') atmeans
    // store the result
}

But I have started a job again for the moment and the memory is saturated. I wait for the memory to free up a bit and then I run a code like this.
Thanks a lot Matthew

Last edited by Lucien AHOUANGBE; 26 Dec 2021, 17:56.

Matthew Alexander

Join Date: Feb 2021
Posts: 58

#26

26 Dec 2021, 18:23

Hi Lucien,
I now understand what you want to do, and why you want to use parallel. This will certainly be a computationally costly analysis.
That said, I believe we were very close to achieving what you want with the earlier parallel program. The only problem was that parallel produced an estimate for a given i_pos value within each cluster/subset, hence 40 estimates when there should have been just 10.
I now believe I know what the problem was. In short, parallel splits the dataset into four subsets. Therefore, when you replace the ith observation in each subset with the ith coefficient value, for 3 of the 4 subsets the ith observation does not correspond to the ith coefficient. For example, say you used 5 cores/clusters and your loop went from 1/10: the fifth parallel subset will estimate the coefficient where i_pos is a) <= 9 and b) <= 10. But the 9th and 10th observations in the fifth subset will not correspond to i_pos = 9 and i_pos = 10.
The solution, I believe, is to use a postfile rather than directly replacing the values in the subset. Try the following

Code:


parallel initialize 4, force s("C:\Stata 16 MP\StataMP-64.exe")
            
            capture program drop savereg
            program define savereg

            tempname tempf
            postfile `tempf' beta se i_pos lb_95 ub_95 lb_90 ub_90 lb_99 ub_99 using mypostfile, replace

          
                qui {
                    sum rp_partner_rank, d
                    
                    forvalues i = `r(min)' (1) 10 {
                            
                                xi: reg rp_avg_pc_epi_gap_abs lag_fdi_in_all $contrlsorder i.year i.pccountry i.rpcountry if rp_partner_rank <= `i', vce(cl id)
                                *eststo
                                lincom _b[lag_fdi_in_all], l(95)
                                local beta = r(estimate)
                                local se = r(se)
                                local i_pos = `i'
                                local lb_95 = r(lb)
                                local ub_95 = r(ub)
                                
                                lincom _b[lag_fdi_in_all], l(90)
                                local lb_90 = r(lb)
                                local ub_90 = r(ub)
                                
                                lincom _b[lag_fdi_in_all], l(99)
                                local lb_99 = r(lb)
                                local ub_99 = r(ub)

                                post `tempf' (`beta') (`se') (`i_pos') (`lb_95') (`ub_95') (`lb_90') (`ub_90') (`lb_99') (`ub_99')
                            
                    
                    }
                }
            
      postclose `tempf'
            
 end
  
 parallel, prog(savereg): savereg

Then, open the file mypostfile, which is saved within your working directory. Inspect the results. I think they will be what you are looking for.
Matt

Last edited by Matthew Alexander; 26 Dec 2021, 18:27.


Matthew Alexander

Join Date: Feb 2021

Posts: 58
#27

26 Dec 2021, 19:00

If this is still not giving you what you want, my very final suggestion is to run the same program above, but with the parallel -by- option like so

Code:

parallel, prog(savereg) by(rp_partner_rank): savereg

or by(i_pos), I'm not sure which you want.
Best,
Matt

Lucien AHOUANGBE

Join Date: Mar 2019
Posts: 15

#28

27 Dec 2021, 17:40

Hello Matthew,
Sorry for the delay. I had logged out since my last message.
Thank you for the code. It made me learn a lot about new things.
But unfortunately it still doesn't work.
I have improved the code to know the number of observations entering each sub sample.
I normally have 105254 observations, but with the parallelization I end up with about 26314 in each subsample and still 40 betas instead of 10.

Code:

. sum

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        beta |         40     .001144    .0018185  -.0019402   .0059244
          se |         40    .0014964    .0007972   .0005478   .0038555
       i_pos |         40         5.5    2.908872          1         10
       lb_95 |         40   -.0018164    .0012459  -.0044984   .0004264
       ub_95 |         40    .0041044    .0031552   .0004887   .0124325
-------------+---------------------------------------------------------
       lb_90 |         40    -.001335    .0012267  -.0040737   .0011571
       ub_90 |         40     .003623    .0029247   .0001881   .0112049
       lb_99 |         40   -.0027679    .0014181  -.0059822  -.0004713
       ub_99 |         40    .0050559    .0036146   .0010598    .014845
           N |         40     26313.5    .5063697      26313      26314

For this:

parallel, prog(savereg) by(rp_partner_rank): savereg

There is an error when I run it.

Code:

.  parallel, prog(savereg) by(rp_partner_rank): savereg
Data not sorted
r(5);

Thank you very much for the support.
For the moment I'm going to leave this part, I'm going to try to move forward on the results I have now and try to find a solution. I will get back to you as soon as possible.
Lucien

Last edited by Lucien AHOUANGBE; 27 Dec 2021, 17:44.


Matthew Alexander

Join Date: Feb 2021

Posts: 58
#29

28 Dec 2021, 06:06

Hi Lucien,
I looked into the issue further. In short, parallel splits the dataset into n subsets, and so I do not think it is possible to run regressions within parallel on the full sample as you want to do.
Perhaps there is some kind of workaround, though I do not know it. The best solution may be simply to allow your PC time to compute the many regressions you want to run. I know you have a large number of observations, but it is still possible. I also noted that you cluster your standard errors by id. If your data is panel data (that is, repeated observations of clusters) then I strongly advise you to use xtreg to account for unobserved differences between clusters (unobserved heterogeneity). Option - re - will estimate a random effects model, and - fe - will estimate a fixed effects model.
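A minimal sketch of what the xtreg suggestion could look like, assuming that id identifies the panel unit (this is a guess from vce(cl id) and is not confirmed anywhere in the thread). The cutoff of 10 is only an example value, and the country dummies are left out on the assumption that a fixed effect for id would absorb them:

```stata
* Sketch only: assumes id is the panel (cluster) identifier
xtset id

* Fixed effects model with standard errors clustered on id;
* replace fe with re for a random effects model
xtreg rp_avg_pc_epi_gap_abs lag_fdi_in_all $contrlsorder i.year ///
    if rp_partner_rank <= 10, fe vce(cluster id)
```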
Let me know if you would like to know more,
Matt
George Vega

Join Date: May 2014

Posts: 13
#30

30 Dec 2021, 11:26

Hi all,

Lucien, as Matthew points out, the default behavior of parallel is to split the dataset into as many chunks as the number of threads you are using. Nonetheless, parallel is perfectly capable of doing what you are trying to do. What you need to do is:
Write a program that loads the dataset with use ...

Within that program, make use of the parallel macros $PLL_CHILDREN (global, number of threads) and $pll_instance (global, takes values 1 through $PLL_CHILDREN) to control what chunk of the loop is done per thread.

At the end of the program, you can do something like save "iteration_$pll_instance`'.dat", replace to make each thread save a different version of the file.

Then you can use parallel append, do(yourprogram) e("iteration_%g.dat, 1/$PLL_CHILDREN").

More examples on how to use the parallel macros here and here.
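One way the steps above might be put together is sketched below. It is not tested against the data in this thread. The dataset name analysis_data and the file names iter_* and results_all are hypothetical, and instead of parallel append the sketch uses parallel's nodata option (which, as I read the help file, tells parallel not to split or append the data in memory) and stacks the per-thread files with a plain append loop afterwards. It also assumes the globals DataCreated and contrlsorder are defined before parallel is called, since parallel passes globals to the child processes by default.

```stata
* Sketch only: file and dataset names are hypothetical
parallel initialize 4, force s("C:\Stata 16 MP\StataMP-64.exe")

capture program drop savereg
program define savereg
    * Each thread loads the FULL dataset itself, so nothing is split
    use "$DataCreated/analysis_data", clear

    * One results file per thread, named after the thread number
    tempname pf
    postfile `pf' i_pos beta se lb_95 ub_95 ///
        using "$DataCreated/iter_${pll_instance}", replace

    forvalues i = 1/10 {
        * Thread k handles i = k, k + PLL_CHILDREN, k + 2*PLL_CHILDREN, ...
        if mod(`i' - 1, $PLL_CHILDREN) + 1 != $pll_instance {
            continue
        }
        quietly xi: reg rp_avg_pc_epi_gap_abs lag_fdi_in_all $contrlsorder ///
            i.year i.pccountry i.rpcountry if rp_partner_rank <= `i', vce(cl id)
        quietly lincom _b[lag_fdi_in_all], level(95)
        post `pf' (`i') (r(estimate)) (r(se)) (r(lb)) (r(ub))
    }
    postclose `pf'
end

* nodata: do not split or re-append whatever is in memory
parallel, prog(savereg) nodata: savereg

* Stack the per-thread result files (4 = number of threads set above)
clear
forvalues k = 1/4 {
    append using "$DataCreated/iter_`k'"
}
sort i_pos
save "$DataCreated/results_all", replace
```

With this layout every regression runs on the full sample, and only the loop index is divided among the threads, so the combined file should hold one row per value of i rather than one per thread and value.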

HIH

George

Last edited by George Vega; 30 Dec 2021, 11:29. Reason: didn't like how [CODE] looked like
