  • Rationale behind set seed/seed

    I have been coming across the "set seed" command while generating random numbers and while reporting bootstrap standard errors (example: set seed 1073741823), and the "seed" option while doing non-parametric regression (example: npregress kernel citations fines, reps(200) seed(12)).
    Can someone please explain:
    1. What exactly do "seed" and "setting the seed" mean here? What is its significance?
    2. Why do we have to set it?
    3. What happens when we don't set the seed and don't use "seed(12)" in the npregress command?

    Thanks in advance!
    Last edited by Lars Pete; 02 Dec 2020, 23:21.

  • #2
    The commands you mention rely on randomness: for bootstrapping, for example, you randomly resample from the actual data with replacement. I will not open up the question of what exactly randomness is; I guess you have a basic understanding. Computers, however, are deterministic machines and cannot create true randomness on their own. That is why pseudo-random number generators (PRNGs) were invented, so that we can get something that behaves like randomness out of these machines. Decades of research suggest that good PRNGs are fine for our analyses.
    A PRNG needs a starting point, the seed, from which it begins its cycle and generates random numbers. The same seed will produce exactly the same numbers (as long as the PRNG algorithm is identical). Basically it boils down to reproducibility, which is important in science: setting the seed before you start a computation means that another user on another machine (using the same Stata version) should get exactly the same results as you did.
    If you don't set the seed, Stata does not pick a fresh seed from the system date or clock. It starts every session from the same default seed (123456789, according to the set seed documentation) and then simply continues along its sequence. So results change each time you repeat the computation within a session (hopefully only slightly, but still), and they depend on whatever random-number commands were run before. For much more detailed information see: https://www.stata.com/manuals13/rsetseed.pdf
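    A minimal sketch of the reproducibility point, assuming a reasonably recent Stata. The variable names u1, u2, u3 and the seed 12345 are arbitrary choices for illustration. Resetting the same seed reproduces the same draws, while drawing again without resetting simply continues the sequence:

```stata
* Same seed, same pseudo-random numbers (illustrative names and seed)
clear
set obs 5

set seed 12345
generate double u1 = runiform()

* Resetting to the same seed restarts the sequence: u2 repeats u1
set seed 12345
generate double u2 = runiform()

* No reset here, so u3 continues from where u2 stopped
generate double u3 = runiform()

list u1 u2 u3
```

    In the listing, u1 and u2 should match row by row, while u3 holds the next five numbers of the same sequence.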
    Best wishes

    (Stata 16.1 MP)


    • #3
      The linked PDF in post #2 is for a Stata 13 manual. Stata includes full PDF documentation as part of its installation, and this documentation is accessible from Stata's Help menu and is almost always linked to from the abbreviated output of the help command.

      Within your copy of Stata, running
      Code:
      help set seed
      should get you an extensive discussion and, at the top, a clickable link that will open the full PDF documentation for the command that was installed with your version of Stata.


      • #4
        Thank you for the answer. That makes sense. I see that setting the seed is extremely important, but only as far as replication of your results is concerned: setting the same seed generates the same set of random numbers. And when reporting bootstrap standard errors, I believe setting the seed will reproduce the same resamples of the data? Also, npregress will give different results if the seed is not set. But how much will the results differ?
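        To make this concrete, here is a small sketch using the auto dataset shipped with Stata rather than the data from the original question. The regression, the seeds 12 and 34, and the scalar names are illustrative assumptions. With bootstrap, the seed affects only the resampling, so the coefficient is computed from the original data and stays the same, while the bootstrap standard error is reproduced exactly with the same seed and typically shifts a little with a different one:

```stata
* Bootstrap standard errors under the same and a different seed
sysuse auto, clear

bootstrap, reps(200) seed(12) nodots: regress mpg weight
scalar b_a  = _b[weight]
scalar se_a = _se[weight]

* Same seed: the same resamples, hence the same standard error
bootstrap, reps(200) seed(12) nodots: regress mpg weight
scalar se_b = _se[weight]

* Different seed: different resamples, slightly different standard error
bootstrap, reps(200) seed(34) nodots: regress mpg weight
scalar b_c  = _b[weight]
scalar se_c = _se[weight]

display "coefficients: " b_a "  " b_c
display "std. errors:  " se_a "  " se_b "  " se_c
```

        Increasing reps() should shrink the gap between standard errors obtained from different seeds.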
        Last edited by Lars Pete; 06 Dec 2020, 17:14.


        • #5
          Originally posted by William Lisowski View Post
          The linked PDF in post #2 is for a Stata 13 manual. Stata includes full PDF documentation as part of its installation, and this documentation is accessible from Stata's Help menu and is almost always linked to from the abbreviated output of the help command.

          Within your copy of Stata, running
          Code:
          help set seed
          should get you an extensive discussion and, at the top, a clickable link that will open the full PDF documentation for the command that was installed with your version of Stata.
          Thank you for the link. I had already gone through the Stata manual for this, but I didn't find it helpful.


          • #6
            But how much will the results differ?
            So, think about it. When you use a different seed (either actively by setting it, or passively by leaving it wherever Stata happens to be in its sequence of pseudo-random numbers), you are, in effect, drawing a different random sample. So the variation in results should be precisely the sampling variance you would expect. In other words, if you were to do that many times, the standard deviations of the distributions of the coefficients would be approximately the standard errors you get from a single run of the same command. That is, in fact, the definition of standard error.
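            A rough simulation sketch of that definition, using freshly simulated data rather than npregress itself. The program name onesample, the sample size of 100, the 1,000 replications, and the seed 2020 are all assumptions chosen for illustration. Each replication plays the role of a run with a different seed, and the standard deviation of the estimates across replications can be compared with the standard error reported inside a single replication:

```stata
* Each call draws a new random sample and returns its mean and its SE
capture program drop onesample
program define onesample, rclass
    drop _all
    set obs 100
    generate double x = rnormal(0, 1)
    quietly summarize x
    return scalar mean = r(mean)
    return scalar se   = r(sd) / sqrt(r(N))
end

* 1,000 replications; seed() makes the whole experiment reproducible
simulate mean = r(mean) se = r(se), reps(1000) seed(2020) nodots: onesample

* Compare the Std. dev. of mean with the Mean of se
summarize mean se
```

            Under these assumptions both numbers should come out close to 0.1, that is, 1/sqrt(100).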


            • #7
              Originally posted by Clyde Schechter View Post
              So, think about it. When you use a different seed (either actively by setting it, or passively by leaving it wherever Stata happens to be in its sequence of pseudo-random numbers), you are, in effect, drawing a different random sample. So the variation in results should be precisely the sampling variance you would expect. In other words, if you were to do that many times, the standard deviations of the distributions of the coefficients would be approximately the standard errors you get from a single run of the same command. That is, in fact, the definition of standard error.
              Yes, that makes perfect sense. Thank you.
