loop for random draws with specific mean and sd

Federica Di Marcantonio

Join Date: Feb 2016
Posts: 110

loop for random draws with specific mean and sd

22 Apr 2022, 09:43

Dear All

I have a dataset containing different WTPs coeff and sd (I provide an example of the data here below). I need to calculate two random draws with the specific mean and sd (mean: WTP_B_NMF , WTP_B_MF and sd:sd_B_NMF, sd_B_MF ) from each single version. Then calculate the difference of the 10000 (number of draws) new values and retrieve its mean and sd.

Is there anyone that can help on this?
thanks
Federica

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long(country product version) float(WTP_B_NMF sd_B_NMF WTP_B_MF sd_B_MF)
1 3 1  .289713 .043104 -.084623 .045278
1 3 2  .486897 .041785 -.036676 .043752
1 2 1  .065856 .031855  -.10064 .033648
1 2 2 -.025039 .029118 -.285143 .034542
1 1 2  .024029 .029038 -.323775 .042935
end
label values country country
label def country 1 "Germany", modify
label values product product2
label def product2 1 "Cookies", modify
label def product2 2 "Spaghetti", modify
label def product2 3 "Yogurt", modify
label values version version2
label def version2 1 "HU", modify
label def version2 2 "LI", modify

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 29958
#2

22 Apr 2022, 09:51

You don't say what kind of random number distribution you want: there are infinitely many distributions with a given mean and standard deviation. The code below uses the normal (Gaussian) distribution. But if WTP stands for willingness to pay, then normal is probably inappropriate, as it can be produce negative values. Then again our WTP_B_MF variable does have negative means, so maybe that's fine. So you may or may not have to tweak the code to use some other distribution that is more suitable.

Code:

set seed 1234 // OR ANY OTHER RANDOM NUMBER GENERATOR SEED foreach x in MF NMF { gen draw_`x' = rnormal(WTP_B_`x', sd_B_`x') } gen diff = draw_MF - draw_NMF summ diff

I also am not sure where you get the 10,000 from in your problem. Each observation in the original data set yields one draw for MF and one for NMF, so two per observation. Do you mean that your real data set starts with 5000 observations?
Comment
Federica Di Marcantonio

Join Date: Feb 2016

Posts: 110
#3

22 Apr 2022, 10:38

Dear Clyde

thanks you for your answer.
Answering to your questions:
1) I used the inv. norm. distr.
2) the 10000 are the number of random observations I generate (see the attached excel) and then use to create my random variables.
3) when running the command i get the following error message: sd_B_MF not found. But I tries with the data provided in the example.
F.

Attached Files

Example.xlsx (1.06 MB, 1 view)
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29958
#4

22 Apr 2022, 10:47

Thanks for your response. I have not looked at the spreadsheet you attached because, a) I don't download attachments from people I don't know, and b) even from people I do know, I am wary of Microsoft Office files because they can contain active malware.

I can't explain the sd_B_MF not found message. The code runs without error messages in the example data you provided. So somehow your real data set is different from your example data and does not contain the sd_B_MF variable.
Comment

Federica Di Marcantonio

Join Date: Feb 2016
Posts: 110

22 Apr 2022, 11:45

Ok, sorry.

I copy here below what I am trying to do in stata. It is not so cleat to me how to do it.
I hope the example below gives a better understanding of my question.
F

								WTP_B_NMF	sd_B_NMF	WTP_B_MF	sd_B_MF
	MEAN	SD			Germany	Yogurt	HU	0.289713	0.043104	-0.084623	0.045278
DE: WTP_yogurt_HU_B_NMF	0.289713	0.043104			(Euro)	Yogurt	LI	0.486897	0.041785	-0.036676	0.043752
DE: WTP_yogurt_HU_B_MF	-0.08462	0.045278				Spaghetti	HU	0.065856	0.031855	-0.10064	0.033648
						Spaghetti	LI	-0.025039	0.029118	-0.285143	0.034542
						Cookies	LI	0.024029	0.029038	-0.323775	0.042935

=RAND()	=RAND()	=NORMINV(random#1,$B$3,$C$3 )	=NORMINV(random#2,$B$4,$C$4 )					summary statistics difference
random #1	random #2	random coeff	random coeff	difference
0.775263808	0.770272394	0.322312337	-0.0511	0.373441			0.375234	average
0.310099247	0.192430179	0.268351992	-0.124	0.39232			0.062705	stdev
0.275460487	0.22882021	0.264006609	-0.1183	0.382259
0.261080328	0.347177542	0.262125648	-0.1024	0.364541
0.731546626	0.262073828	0.316329604	-0.1135	0.429793
up to 10000	up to 10000	up to 10000	up to 10000	up to 10000

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 29958
#6

22 Apr 2022, 12:09

It is a bit hard to read what you have shown, but as best I can tell, the code in #2 should do what you are showing here. It may need some modification to work with your actual Stata data set, but nothing you have posted helps me figure that out. In any event, I can't work from a spreadsheet: I need to work with a Stata data set, as represented by -dataex-. Is the example you showed in #1 taken from an actual import of your spreadsheet into Stata? If so, I see no reason why it should not run. Perhaps you should post back with a new -dataex- that comes from your actual working data set (if what you show in #1 is not already that) and then also show the exact code you are using and the exact output you are getting from Stata including all error messages or warnings.

Another thing that is not clear to me is whether you want 10,000 random draws for each observation ("row") in your example data in #2, or whether your example data set contains 10,000 rows and you want a set of draws for each row.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29958
#7

22 Apr 2022, 12:27

Looking a bit longer at what you show in #5, it now appears to me that you actually want to do 10,000 draws for each observation (row) in your example data, then calculate the differences between the mf and nmf draws and end up with the mean and standard deviation of the differences. If that's the case, you can do it this way:

Code:

capture program drop one_sample program define one_sample local nmf_mean = WTP_B_NMF[1] local nmf_sd = sd_B_NMF[1] local mf_mean = WTP_B_MF[1] local mf_sd = sd_B_MF[1] expand 10000 foreach x in mf nmf { gen `x' = rnormal(``x'_mean', ``x'_sd') } gen delta = nmf - mf summ delta gen mean_diff = r(mean) gen sd_diff = r(sd) keep in 1 drop delta mf nmf exit end set seed 1234 isid country product version runby one_sample, by(country product version) status

I am assuming here that country product and version together identify unique observations in the data set. This assumption is verified in the -isid- command and the code will stop there with an error message if it is violated.

-runby- is written by Robert Picard and me, and it is available from SSC.

All of that said, it seems you are going to a bit of trouble to get approximate answers from a simulation, when you can get exact ones from simple formulas:

Code:

gen mean_diff = WTP_B_NMF - WTP_B_MF gen sd_diff = sqrt(sd_B_NMF^2 + sd_B_MF^2)

Last edited by Clyde Schechter; 22 Apr 2022, 12:30.
Comment
Federica Di Marcantonio

Join Date: Feb 2016

Posts: 110
#8

25 Apr 2022, 06:43

Dear Clyde,

THANKS a lot! The command you gave reproduce exactly the results in excel.
I also made a cross ref to this post for those that were looking for a solution.
Best
Federica
Comment

Announcement

loop for random draws with specific mean and sd

Comment

Comment

Comment

Comment

Comment

Comment

Comment