  • Adding Noise to Simulations

    One thing my instructors have told me is that I should use Monte-Carlo simulations to validate new statistical estimators.


    One such synthetic dataset in a paper I found (for free on arXiv, see page 23 if you'd like) generated the data according to the following
    Code:
    clear
    
    set obs 100
    
    egen id = seq(), f(1) t(100) // 100 units
    
    expand 250 // 250 time periods
    
    bys id : g time = _n // time 1-250
    
    set seed 1000
    
    // The synthetic data **!!
    
    qbys id: g y = ln(time)+4*sin(time/_pi)+4*cos(time/_pi)+runiform() 
    
    //**!! above, runiform() should be an additive noise term epsilon_t
    
    xtset id time, g
    The issue is that after the third term [4*cos(time/_pi)], there's an error term epsilon indexed by time. I currently use runiform(), but methinks this isn't the same thing as adding in noise. Specifically, the paper says that epsilon_t
    is i.i.d. Gaussian noise with a mean of zero and variances of 1, 4, 9, 16, and 25.
    Well........ how would I generate Gaussian noise? Or any other kind of noise? How would I specify its variance?

    Presumably there's a simple solution, I've just never made a simulation before. Any ideas how I'd do this?

  • #2
    Code:
    help rnormal()

    • #3
      I’m not sure if I can give you specific advice for your problem. It does appear that the description of epsilon doesn’t match the code. It seems like you want to add

      Code:
      rnormal(0, s) // is parametrized by s as the standard deviation
      In general, Monte-Carlo simulations are based on using a specified data generating model to create fake data, and the specific model is usually informed by the implied models used for analysis. If we consider a simple linear regression, then this model specifically calls for additive effects with an explicit error term (epsilon) following a normal distribution with mean zero and user-selected variance. In real terms you add -rnormal()- to your linear predictor term. On the other extreme is when there is no explicit error term, such as with logistic regression, where the “noise” is probabilistic realizations of the outcome. In hierarchical models, there will be distributional (noise) assumptions at each level.
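
A minimal sketch of how this could look for the data-generating process in #1, assuming one outcome variable per noise level. The names signal and y_var1 through y_var25 are made up for illustration and do not come from the paper. Because rnormal(m, s) takes the standard deviation, variances of 1, 4, 9, 16, and 25 correspond to s = 1, 2, 3, 4, and 5.

```stata
clear
set obs 100
egen id = seq(), f(1) t(100)   // 100 units
expand 250                     // 250 time periods per unit
bys id: gen time = _n          // time runs 1-250 within each unit
set seed 1000

// Deterministic part of the outcome, copied from the code in #1
gen signal = ln(time) + 4*sin(time/_pi) + 4*cos(time/_pi)

// One outcome per noise level (hypothetical variable names).
// rnormal(0, s) draws Gaussian noise with mean 0 and SD s, i.e. variance s^2.
forvalues s = 1/5 {
    local v = `s'^2
    gen y_var`v' = signal + rnormal(0, `s')
}

xtset id time, generic
```

Subtracting signal from any of the outcomes and running summarize on the result is a quick way to confirm that the noise has roughly the intended mean and SD.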

      Edit: crossed with #2 as I was typing this out. Edited typos.
      Last edited by Leonardo Guizzetti; 17 Apr 2022, 09:35.

      • #4
        My estimator is similar to the one in the paper I linked, so the assumptions are pretty much the same. Thank you both so much!

        • #5
          And just for your info, runiform() would have generated random numbers from a uniform distribution on (0, 1). That has a mean of 0.5 and support on the open interval from 0 to 1. Its SD (about 0.29) is less than the SD of a standard normal distribution. So it was actually adding noise, but less noise than you probably wanted to model, and because its mean is 0.5 rather than 0, it was also adding bias. If you were simulating binary data, you could do something like

          Code:
          gen y = runiform() > cutoff
          Where cutoff is whatever the cutoff probability is; as written, y equals 1 with probability 1 - cutoff.
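
A quick way to see the difference between the two noise sources is to draw from both and compare summary statistics. This is only a sketch: the sample size and the cutoff of 0.3 are arbitrary values chosen for illustration.

```stata
clear
set seed 1000
set obs 100000

gen u = runiform()       // uniform on (0, 1)
gen z = rnormal(0, 1)    // standard normal, for comparison
summarize u z            // u: mean near 0.5, SD near 0.29; z: mean near 0, SD near 1

// Binary outcome: runiform() > cutoff equals 1 with probability 1 - cutoff
local cutoff = 0.3       // hypothetical value
gen y = runiform() > `cutoff'
summarize y              // the mean estimates P(y = 1), here about 0.7
```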
          Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

          When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

          • #6
            Thank you. I imagine I could do all kinds of data generating processes with different error terms..... such as, an AR(1) error term?
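
For reference, one possible sketch of an AR(1) error term within each unit, u_t = rho*u_{t-1} + e_t with Gaussian innovations e_t. The values of rho and sigma and the variable names e and u are assumptions made for illustration; they are not taken from the paper. It relies on replace working through the observations in order, so u[_n-1] already holds the updated value.

```stata
clear
set obs 100
egen id = seq(), f(1) t(100)   // 100 units
expand 250                     // 250 time periods per unit
bys id: gen time = _n
set seed 1000

local rho   = 0.5              // hypothetical autocorrelation
local sigma = 1                // hypothetical SD of the innovations

gen e = rnormal(0, `sigma')    // i.i.d. Gaussian innovations

// Start each unit from the stationary distribution, then recurse within unit
sort id time
by id: gen u = e / sqrt(1 - `rho'^2) if _n == 1
by id: replace u = `rho' * u[_n-1] + e if _n > 1

gen y = ln(time) + 4*sin(time/_pi) + 4*cos(time/_pi) + u
```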

            And, presuming that an estimator predicts the treated unit's outcomes well, even under reasonable levels of noise or different kinds of error terms, that's a stronger argument for the validity of the estimator, right, Weiwen Ng?
            Last edited by Jared Greathouse; 17 Apr 2022, 17:17.
