Rationale behind set seed/seed

Lars Pete

Join Date: Nov 2020

Posts: 118
#1

02 Dec 2020, 22:32

I have been coming across the "set seed" command while generating random numbers and while reporting bootstrap standard errors (example: set seed 1073741823), and the "seed" option while doing non-parametric regression (example: npregress kernel citations fines, reps(200) seed(12)).
Can someone please explain:
1. What exactly do "seed" and "setting the seed" mean here? What is their significance?
2. Why do we have to set it?
3. What happens when we don't set the seed and don't use "seed(12)" in the npregress command?

Thanks in advance!

Last edited by Lars Pete; 02 Dec 2020, 23:21.
Felix Bittmann

Join Date: Aug 2018

Posts: 663
#2

03 Dec 2020, 00:33

The commands you mention rely on randomness. Bootstrapping, for example, randomly resamples the actual data with replacement. I will not open up the question of what exactly randomness is, and I assume you have a basic understanding. Computers, however, cannot create true randomness, since they are deterministic machines. That is why pseudo-random number generators (PRNGs) were invented: they let us get something that behaves like randomness out of these machines. What we know after decades of research is that good PRNGs are fine for our analyses.

A PRNG needs a starting point, the seed, from which it begins its cycle and generates random numbers. The same seed will produce exactly the same numbers (as long as the PRNG algorithm is identical). Basically it boils down to reproducibility, which is important in science. Setting the seed before you start a computation means that another user on another machine (using the same Stata version and the same data) will get exactly the same results as you did.

If you don't set the seed, Stata does not pick one from the system clock. It starts every session from the same default seed and then simply continues through the sequence as commands draw random numbers. So results change each time you repeat the computation within a session (hopefully only slightly, but still), and they depend on whatever was run before. For much more detailed information see: https://www.stata.com/manuals13/rsetseed.pdf
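To make the reproducibility point concrete, here is a minimal sketch in Stata. The seed value 12345 and the variable names u1, u2 and u3 are arbitrary choices for illustration only.

```stata
* Minimal sketch: the same seed reproduces the same pseudo-random draws.
clear
set obs 5

* Arbitrary seed, chosen only for illustration
set seed 12345
generate u1 = runiform()

* Reset the generator to the same starting point
set seed 12345
generate u2 = runiform()

* No reset here: the generator simply continues its sequence
generate u3 = runiform()

* Identical seeds give identical draws, so this assertion passes
assert u1 == u2
list u1 u2 u3
```

u1 and u2 match because the generator was reset to the same seed before each was created; u3 differs because the generator just continued from where it was. As far as I know, a seed() option on a command, as in the npregress example from post #1, does the same thing as typing set seed before that command.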

Best wishes

(Stata 16.1 MP)
William Lisowski

Join Date: Dec 2014

Posts: 10150
#3

03 Dec 2020, 06:57

The linked PDF in post #2 is for a Stata 13 manual. Stata includes full PDF documentation as part of its installation, and this documentation is accessible from Stata's Help menu and is almost always linked to from the abbreviated output of the help command.

Within your copy of Stata, running

Code:

help set seed

should get you an extensive discussion and, at the top, a clickable link that will open the full PDF documentation for the command that was installed with your version of Stata.
Lars Pete

Join Date: Nov 2020

Posts: 118
#4

06 Dec 2020, 17:08

Thank you for the answer. That makes sense. I see that setting the seed is extremely important, but only as far as replication of your results is concerned: setting the same seed generates the same set of random numbers. And when reporting bootstrap standard errors, I believe setting the seed will generate the same resampled data sets? Also, without a seed, npregress will give different results each time. But how much will the results differ?

Last edited by Lars Pete; 06 Dec 2020, 17:14.
Lars Pete

Join Date: Nov 2020

Posts: 118
#5

06 Dec 2020, 17:11

Thank you for the link. I had already gone through the Stata manual for this, but I didn't find it helpful.
Clyde Schechter

Join Date: Apr 2014

Posts: 29962
#6

06 Dec 2020, 18:05

But how much will the results differ?

So, think about it. When you use a different seed (either actively by setting it, or passively by leaving it wherever Stata happens to be in its sequence of (pseudo-)random numbers), you are, in effect, drawing a different random sample. So the variation in results should be precisely the sampling variance you would expect. In other words, if you were to do that many times, the standard deviation of the distribution of each coefficient would be approximately the standard error you get from a single run of the same command. That is, in fact, the definition of standard error.
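A rough way to check this yourself is a small simulation. The sketch below assumes a made-up data-generating process (y = 1 + 2x + noise, 200 observations) and a hypothetical program name, onesample; none of this comes from the posts above. Each replication draws a fresh sample from a different point in the random-number sequence, which is what using a different seed amounts to. It covers only the simulated-data case, not bootstrap resampling.

```stata
* Sketch: compare the spread of a coefficient across random samples
* with the standard error reported by a single regression.
clear all

* Hypothetical helper: draw one sample and return the slope and its SE
program define onesample, rclass
    drop _all
    set obs 200
    generate x = rnormal()
    generate y = 1 + 2*x + rnormal()
    regress y x
    return scalar b  = _b[x]
    return scalar se = _se[x]
end

* Arbitrary seed so that the whole experiment is itself reproducible
set seed 2020
simulate b=r(b) se=r(se), reps(500) nodots: onesample

* Std. dev. of b across replications vs. the mean of the reported SEs
summarize b se
```

In the summarize output, the standard deviation of b should come out close to the mean of se (about 0.07 with these settings).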
Lars Pete

Join Date: Nov 2020

Posts: 118
#7

07 Dec 2020, 17:22

Yes, that makes perfect sense. Thank you.