Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • loop for random draws with specific mean and sd

    Dear All

    I have a dataset containing different WTPs coeff and sd (I provide an example of the data here below). I need to calculate two random draws with the specific mean and sd (mean: WTP_B_NMF , WTP_B_MF and sd:sd_B_NMF, sd_B_MF ) from each single version. Then calculate the difference of the 10000 (number of draws) new values and retrieve its mean and sd.

    Is there anyone that can help on this?
    thanks
    Federica

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long(country product version) float(WTP_B_NMF sd_B_NMF WTP_B_MF sd_B_MF)
    1 3 1  .289713 .043104 -.084623 .045278
    1 3 2  .486897 .041785 -.036676 .043752
    1 2 1  .065856 .031855  -.10064 .033648
    1 2 2 -.025039 .029118 -.285143 .034542
    1 1 2  .024029 .029038 -.323775 .042935
    end
    label values country country
    label def country 1 "Germany", modify
    label values product product2
    label def product2 1 "Cookies", modify
    label def product2 2 "Spaghetti", modify
    label def product2 3 "Yogurt", modify
    label values version version2
    label def version2 1 "HU", modify
    label def version2 2 "LI", modify

  • #2
    You don't say what kind of random number distribution you want: there are infinitely many distributions with a given mean and standard deviation. The code below uses the normal (Gaussian) distribution. But if WTP stands for willingness to pay, then normal is probably inappropriate, as it can be produce negative values. Then again our WTP_B_MF variable does have negative means, so maybe that's fine. So you may or may not have to tweak the code to use some other distribution that is more suitable.

    Code:
    set seed 1234 // OR ANY OTHER RANDOM NUMBER GENERATOR SEED
    
    foreach x in MF NMF {
        gen draw_`x' = rnormal(WTP_B_`x', sd_B_`x')
    }
    
    gen diff = draw_MF - draw_NMF
    summ diff
    I also am not sure where you get the 10,000 from in your problem. Each observation in the original data set yields one draw for MF and one for NMF, so two per observation. Do you mean that your real data set starts with 5000 observations?

    Comment


    • #3
      Dear Clyde

      thanks you for your answer.
      Answering to your questions:
      1) I used the inv. norm. distr.
      2) the 10000 are the number of random observations I generate (see the attached excel) and then use to create my random variables.
      3) when running the command i get the following error message: sd_B_MF not found. But I tries with the data provided in the example.
      F.

      Attached Files

      Comment


      • #4
        Thanks for your response. I have not looked at the spreadsheet you attached because, a) I don't download attachments from people I don't know, and b) even from people I do know, I am wary of Microsoft Office files because they can contain active malware.

        I can't explain the sd_B_MF not found message. The code runs without error messages in the example data you provided. So somehow your real data set is different from your example data and does not contain the sd_B_MF variable.

        Comment


        • #5
          Ok, sorry.

          I copy here below what I am trying to do in stata. It is not so cleat to me how to do it.
          I hope the example below gives a better understanding of my question.
          F



          WTP_B_NMF sd_B_NMF WTP_B_MF sd_B_MF
          MEAN SD Germany Yogurt HU 0.289713 0.043104 -0.084623 0.045278
          DE: WTP_yogurt_HU_B_NMF 0.289713 0.043104 (Euro) Yogurt LI 0.486897 0.041785 -0.036676 0.043752
          DE: WTP_yogurt_HU_B_MF -0.08462 0.045278 Spaghetti HU 0.065856 0.031855 -0.10064 0.033648
          Spaghetti LI -0.025039 0.029118 -0.285143 0.034542
          Cookies LI 0.024029 0.029038 -0.323775 0.042935
          =RAND() =RAND() =NORMINV(random#1,$B$3,$C$3 ) =NORMINV(random#2,$B$4,$C$4 ) summary statistics difference
          random #1 random #2 random coeff random coeff difference
          0.775263808 0.770272394 0.322312337 -0.0511 0.373441 0.375234 average
          0.310099247 0.192430179 0.268351992 -0.124 0.39232 0.062705 stdev
          0.275460487 0.22882021 0.264006609 -0.1183 0.382259
          0.261080328 0.347177542 0.262125648 -0.1024 0.364541
          0.731546626 0.262073828 0.316329604 -0.1135 0.429793
          up to 10000 up to 10000 up to 10000 up to 10000 up to 10000

          Comment


          • #6
            It is a bit hard to read what you have shown, but as best I can tell, the code in #2 should do what you are showing here. It may need some modification to work with your actual Stata data set, but nothing you have posted helps me figure that out. In any event, I can't work from a spreadsheet: I need to work with a Stata data set, as represented by -dataex-. Is the example you showed in #1 taken from an actual import of your spreadsheet into Stata? If so, I see no reason why it should not run. Perhaps you should post back with a new -dataex- that comes from your actual working data set (if what you show in #1 is not already that) and then also show the exact code you are using and the exact output you are getting from Stata including all error messages or warnings.

            Another thing that is not clear to me is whether you want 10,000 random draws for each observation ("row") in your example data in #2, or whether your example data set contains 10,000 rows and you want a set of draws for each row.

            Comment


            • #7
              Looking a bit longer at what you show in #5, it now appears to me that you actually want to do 10,000 draws for each observation (row) in your example data, then calculate the differences between the mf and nmf draws and end up with the mean and standard deviation of the differences. If that's the case, you can do it this way:

              Code:
              capture program drop one_sample
              program define one_sample
                  local nmf_mean = WTP_B_NMF[1]
                  local nmf_sd = sd_B_NMF[1]
                  local mf_mean = WTP_B_MF[1]
                  local mf_sd = sd_B_MF[1]
                  expand 10000
                  foreach x in mf nmf {
                      gen `x' = rnormal(``x'_mean', ``x'_sd')
                  }
                  gen delta = nmf - mf
                  summ delta
                  gen mean_diff = r(mean)
                  gen sd_diff = r(sd)
                  keep in 1
                  drop delta mf nmf
                  exit
              end
              
              set seed 1234
              isid country product version
              runby one_sample, by(country product version) status
              I am assuming here that country product and version together identify unique observations in the data set. This assumption is verified in the -isid- command and the code will stop there with an error message if it is violated.

              -runby- is written by Robert Picard and me, and it is available from SSC.

              All of that said, it seems you are going to a bit of trouble to get approximate answers from a simulation, when you can get exact ones from simple formulas:

              Code:
              gen mean_diff = WTP_B_NMF - WTP_B_MF
              gen sd_diff = sqrt(sd_B_NMF^2 + sd_B_MF^2)
              Last edited by Clyde Schechter; 22 Apr 2022, 12:30.

              Comment


              • #8
                Dear Clyde,

                THANKS a lot! The command you gave reproduce exactly the results in excel.
                I also made a cross ref to this post for those that were looking for a solution.
                Best
                Federica

                Comment

                Working...
                X