Clarification on choice of random seeds and how often to set seeds

Jack Peters

Join Date: Mar 2020

Posts: 8
#1

10 Aug 2021, 09:45

Stata's help documentation for set seed (https://www.stata.com/manuals/rsetseed.pdf) emphasizes that (A) seeds should be set only once per "problem" and that (B) the number for the seed should be as random as possible. It seems that this advice was devised with simulations in mind. I have several questions about this advice.

1. What is the definition of a "problem"? For example, suppose that I have N specifications (e.g. continuous vs binary variable; various choices of error distributions; etc.) and I want to run a simulation of 10,000 replications for each of them. Does each specification constitute a separate problem, such that it's alright to (A) set seeds for each specification and (B) use the same seed for each specification? This is equivalent to setting the same seed N times.

2. Do recommendations (A) and (B) only pertain to simulations? For example, suppose that I am working with real data and I want to use lasso for covariate selection. I have N dependent variables, some of which are possibly correlated, and want to run the lasso for each one. Can I just use the rseed() option with the same seed for each lasso? This is equivalent to setting the same seed N times.

Thanks in advance!

Last edited by Jack Peters; 10 Aug 2021, 09:53.
Tags: lasso, random number generator, random samples, seed, simulation
Clyde Schechter

Join Date: Apr 2014

Posts: 29961
#2

10 Aug 2021, 10:09

The answer for 1 is that it depends. Sometimes you run multiple simulations and want them to be "parallel universes" so that you can do matched-pair, rather than independent, comparisons of simulated results. The variance reduction that comes from using matched pairs can be appreciable and allow detection of small differences with manageable numbers of runs. For example, if I want to contrast the rate of failures of some system under different assumptions about the distribution of one particular system parameter, all else equal, I can do that more efficiently if I use the same random number stream in each of the simulations. In that case, you would reset the seed to the same starting value before each one.
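To make the matched-pair idea concrete, here is a minimal sketch in Python rather than Stata, using only the standard library. Everything in it is an illustrative assumption and not part of the original question: the "system" is a standard normal load, a failure is a load above a threshold, and the two specifications differ only in that threshold (1.0 versus 1.1). It compares how noisy the estimated difference in failure rates is when both specifications reuse the same stream versus when each gets its own.

```python
import random
import statistics

def failure_rate(threshold, seed, n=1000):
    """Share of n simulated standard normal loads that exceed threshold.
    The system and its threshold are hypothetical, for illustration only."""
    rng = random.Random(seed)  # start a fresh stream from this seed
    return sum(rng.gauss(0, 1) > threshold for _ in range(n)) / n

def rate_differences(reps=200, same_stream=True):
    """Estimated difference in failure rates (threshold 1.0 minus 1.1), one per replication."""
    diffs = []
    for r in range(reps):
        seed_a = r
        # Matched pairs: both specifications see identical draws.
        # Independent: the second specification gets a different seed.
        seed_b = r if same_stream else 10_000 + r
        diffs.append(failure_rate(1.0, seed_a) - failure_rate(1.1, seed_b))
    return diffs

sd_matched = statistics.stdev(rate_differences(same_stream=True))
sd_independent = statistics.stdev(rate_differences(same_stream=False))
print(f"SD of difference, same stream:         {sd_matched:.4f}")
print(f"SD of difference, independent streams: {sd_independent:.4f}")
```

Under these assumptions the same-stream differences should vary noticeably less, which is the variance reduction described above. In Stata the analogous step would be issuing set seed with the same value before each simulation.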

But if for what you are doing there is no desire to create parallelism, you may as well just set the seed once before the first simulation and then let things just keep on going. For example, if I want to contrast the rate of failures of some substantially different systems that have different failure-generating processes with different underlying parameters, etc., there is nothing to be gained by using the same random number stream to simulate each of them--and, in fact, if by chance I get some fluky result from one random number stream, it is actually better if I do not propagate that into the others.

With regard to 2, see the answer to 1. Except that in this case, I would think that you would specifically want to avoid any parallelism across these analyses. This does not sound to me like the kind of application where you would be making paired comparisons of matched results, and I think the safety of not reproducing particular random over-fittings across models would be desirable. So I think the answer here is to set it once and not reset thereafter.
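A small sketch of why reusing the seed couples the analyses, written in Python with only the standard library. It assumes, for illustration, that the random element in each lasso is the assignment of observations to cross-validation folds; the helper fold_assignment and the sizes used are hypothetical and do not come from Stata.

```python
import random

def fold_assignment(n_obs, k, rng):
    """Randomly assign n_obs observations to k roughly equal folds (hypothetical helper)."""
    folds = [i % k for i in range(n_obs)]
    rng.shuffle(folds)
    return folds

n_obs, k, n_outcomes = 40, 5, 3

# Same seed supplied to every analysis: every outcome gets the identical split.
same_seed = [fold_assignment(n_obs, k, random.Random(2021)) for _ in range(n_outcomes)]

# Seed set once, stream left to run: each outcome gets its own split.
rng = random.Random(2021)
set_once = [fold_assignment(n_obs, k, rng) for _ in range(n_outcomes)]

print("identical splits with repeated seed:", same_seed[0] == same_seed[1] == same_seed[2])
print("identical splits with seed set once:", set_once[0] == set_once[1])
```

With the repeated seed, any quirk of one particular split is shared by all N models; with the seed set once, the splits differ across models while the whole run remains reproducible.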
Joro Kolev

Join Date: Aug 2018

Posts: 3047
#3

11 Aug 2021, 03:27

Apart from the useful advice that Clyde gives, much simpler advice would be: if you do not know what you are doing, set the seed once, and let the system work itself out. The seed does not really need to be random; any number would do.

1. Clyde is right to say that you can eliminate a lot of unnecessary randomness on particular occasions by resetting the seed to the same number. But even if you do not reset the seed to the same number, you will not get wrong results. You would just have some simulation randomness that could have been avoided.

2. If you reset the seed on occasions where you need a random sequence (the situation opposite to 1. above), you will do humongous damage, because resetting the seed to the same initial point generates the same deterministic sequence every time. In this case the results will be just wrong.
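A minimal sketch of the damage in point 2, in Python with only the standard library. The function name, the seed 12345, and the sample sizes are arbitrary assumptions for illustration. It resets the seed at the top of every replication of a loop and then counts how many distinct replication results come out.

```python
import random

def replicate_means(reps, reset_each_time, seed=12345):
    """Mean of 50 uniform draws per replication, with or without resetting the seed."""
    rng = random.Random(seed)  # seed set once, before the loop
    means = []
    for _ in range(reps):
        if reset_each_time:
            rng.seed(seed)  # the mistake: restart the same stream every replication
        draws = [rng.random() for _ in range(50)]
        means.append(sum(draws) / len(draws))
    return means

reset_means = replicate_means(100, reset_each_time=True)
once_means = replicate_means(100, reset_each_time=False)

print("distinct results when resetting every replication:", len(set(reset_means)))
print("distinct results when setting the seed once:      ", len(set(once_means)))
```

Resetting inside the loop yields a single distinct value: the 100 "replications" are one replication repeated 100 times, so any standard error or coverage computed from them is meaningless.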

In short, if you do not know what you are doing, set the seed once, and let the system take care of itself.
Jack Peters

Join Date: Mar 2020

Posts: 8
#4

11 Aug 2021, 13:55

Clyde Schechter Joro Kolev Thanks for your advice - it was really helpful!