  • Why can't I reproduce results in my fixed effects regression using runiform() when setting the seed?

    Hi all,

    I have panel data on individuals, their binary unemployment status and their weight across 3 waves. I want to determine whether the relationship between unemployment and weight could simply reflect a secular increase in weight. To test this, I create a placebo binary treatment from pure noise, reasoning that if this variable shows an effect on weight, the original unemployment effect on weight is probably not trustworthy.


    I use Stata's runiform() function to draw a random number for each person in each year for which they have employment data, because I want to compare these results with those for the people whose employment change I studied earlier. If the random number is below 0.5, the new binary variable is set to 1; otherwise it is set to 0. This gives a fake employed/unemployed indicator.
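    For reference, a minimal sketch of this draw on simulated data. The variables empstat, placebo_cond and placebo are hypothetical stand-ins for X_ADDFAunempusualsitpes_y and draw_fathers. It also shows that the cond() construction can be shortened, because a logical expression in Stata already evaluates to 1 (true) or 0 (false).

```stata
* Minimal sketch on simulated data; empstat, placebo_cond and placebo are hypothetical names
clear
set obs 20
* hypothetical employment indicator, missing for every fifth observation
generate empstat = cond(mod(_n, 5) == 0, ., mod(_n, 2))

* form used in the post: 1 if the draw is below .5, else 0
set seed 9000
generate placebo_cond = cond(runiform() < .5, 1, 0) if !missing(empstat)

* shorter equivalent: a true/false expression already evaluates to 1 or 0
set seed 9000
generate byte placebo = runiform() < .5 if !missing(empstat)

* same seed, same sort order: the two versions agree (missing == missing)
assert placebo == placebo_cond
tabulate placebo, missing
```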

    I use runiform() because I understand it to be recursive (each draw is fully determined by the seed and the draws before it), and thus replicable.
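    A small sketch of the sense in which runiform() is replicable: resetting the seed replays exactly the same sequence of draws, which is why both logs in this post show the same first draw after set seed 9000. The scalar names are illustrative. Keep in mind that generate hands these draws out one observation at a time in the current sort order, so the same sequence only lands on the same people if the data are in the same order.

```stata
* Resetting the seed replays the same sequence of pseudo-random draws
set seed 9000
scalar first_a  = runiform()
scalar second_a = runiform()

set seed 9000
scalar first_b  = runiform()
scalar second_b = runiform()

display scalar(first_a) "  " scalar(second_a)
display scalar(first_b) "  " scalar(second_b)
* 1 means both draws were reproduced exactly
display (scalar(first_a) == scalar(first_b)) & (scalar(second_a) == scalar(second_b))
```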

    Code:
    . capture drop draw_fathers
    
    . set seed 9000
    
    . display c(seed)
    Xfed3371cc43f462544a474abacbdd93d00044448
    
    . display runiform()
    .42625766
     
    . gen draw_fathers = cond(runiform() < .5, 1, 0) if X_ADDFAunempusualsitpes_y!=.
    (7,241 missing values generated)
    
    . tab draw_fathers
    
    draw_father |
              s |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |     13,071       49.96       49.96
              1 |     13,090       50.04      100.00
    ------------+-----------------------------------
          Total |     26,161      100.00

    It's important to me that my results are reproducible, which is why I set the seed.

    The first time I ran a regression with this random variable I got the following result:

    Code:
    
    . xtreg ba_nogawho  i.draw_fathers i.C_region_y i.year i.C_Simplemotherage_y i.C_Simplemothereduca_y i.C_mothermar_y   if
    >  atleast2weightmeasures == 1, cluster (id) fe 
    
    Fixed-effects (within) regression               Number of obs      =     24421
    Group variable: id                              Number of groups   =      9159
    
    R-sq:  within  = 0.0494                         Obs per group: min =         1
           between = 0.0000                                        avg =       2.7
           overall = 0.0133                                        max =         3
    
                                                    F(12,9158)         =     90.54
    corr(u_i, Xb)  = -0.0254                        Prob > F           =    0.0000
    
                                                           (Std. Err. adjusted for 9,159 clusters in id)
    ----------------------------------------------------------------------------------------------------
                                       |               Robust
                            ba_nogawho |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -----------------------------------+----------------------------------------------------------------
                        1.draw_fathers |   .0104891   .0120381     0.87   0.384    -.0131081    .0340864
                          1.C_region_y |   .0146547   .0318843     0.46   0.646    -.0478457     .077155
                                       |
                                  year |
                                    1  |   .1867891   .0131795    14.17   0.000     .1609543    .2126239
                                    2  |  -.1578258   .0163961    -9.63   0.000    -.1899658   -.1256857
                                       |
                   C_Simplemotherage_y |
                                30-39  |   .0210079   .0291105     0.72   0.471    -.0360551     .078071
                           40 or more  |   .0035692     .04051     0.09   0.930    -.0758395    .0829779
                                       |
                 C_Simplemothereduca_y |
    Leaving Certificate to Non Degree  |   .1249528   .0631912     1.98   0.048      .001084    .2488215
            Primary Degree or greater  |   .1234926   .0711105     1.74   0.082    -.0158998    .2628851
                                       |
                         C_mothermar_y |
                                    2  |   .0916308   .1511258     0.61   0.544    -.2046096    .3878711
                                    3  |   .0401262   .1206813     0.33   0.740     -.196436    .2766883
                                    4  |  -.0247547   .0388708    -0.64   0.524    -.1009501    .0514408
                                    5  |   .3145843   .3789889     0.83   0.407    -.4283185    1.057487
                                       |
                                 _cons |   .5884448   .0659928     8.92   0.000     .4590842    .7178054
    -----------------------------------+----------------------------------------------------------------
                               sigma_u |  .88704999
                               sigma_e |   .7514361
                                   rho |  .58220466   (fraction of variance due to u_i)
    ----------------------------------------------------------------------------------------------------
    
    .

    If I repeat the above within the same Stata session, my results are the same. However, if I close Stata (i.e. clear and start again), I get different results the next time I run the regression, as shown below, even though I use the same .do file to generate everything, including my random variable. The difference is most visible for draw_fathers, whose coefficient changes sign from .0104891 to -.0165708.

    Can anyone help me to understand why this is, and how to get the same results every time?


    Here is the output from re-running the .do file after closing Stata and opening it again:


    Code:
    
    . capture drop draw_fathers
    
    . set seed 9000
    
    . display c(seed)
    Xfed3371cc43f462544a474abacbdd93d00044448
    
    . display runiform()
    .42625766
    
    . gen draw_fathers = cond(runiform() < .5, 1, 0) if X_ADDFAunempusualsitpes_y!=.
    (7,241 missing values generated)
    
    . tab draw_fathers
    
    draw_father |
              s |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |     13,071       49.96       49.96
              1 |     13,090       50.04      100.00
    ------------+-----------------------------------
          Total |     26,161      100.00
    
    . xtreg ba_nogawho  i.draw_fathers i.C_region_y i.year i.C_Simplemotherage_y i.C_Simplemothereduca_y i.C_mothermar_y   if
    >  atleast2weightmeasures == 1, cluster (id) fe 
    
    Fixed-effects (within) regression               Number of obs      =     24421
    Group variable: id                              Number of groups   =      9159
    
    R-sq:  within  = 0.0495                         Obs per group: min =         1
           between = 0.0000                                        avg =       2.7
           overall = 0.0134                                        max =         3
    
                                                    F(12,9158)         =     90.88
    corr(u_i, Xb)  = -0.0249                        Prob > F           =    0.0000
    
                                                           (Std. Err. adjusted for 9,159 clusters in id)
    ----------------------------------------------------------------------------------------------------
                                       |               Robust
                            ba_nogawho |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -----------------------------------+----------------------------------------------------------------
                        1.draw_fathers |  -.0165708   .0121349    -1.37   0.172     -.040358    .0072164
                          1.C_region_y |   .0142196   .0318937     0.45   0.656    -.0482993    .0767384
                                       |
                                  year |
                                    1  |   .1869799   .0131823    14.18   0.000     .1611396    .2128202
                                    2  |  -.1576787   .0163978    -9.62   0.000    -.1898221   -.1255354
                                       |
                   C_Simplemotherage_y |
                                30-39  |   .0210425   .0291073     0.72   0.470    -.0360143    .0780993
                           40 or more  |   .0035197   .0405133     0.09   0.931    -.0758953    .0829347
                                       |
                 C_Simplemothereduca_y |
    Leaving Certificate to Non Degree  |   .1251361   .0632171     1.98   0.048     .0012165    .2490558
            Primary Degree or greater  |   .1247482   .0711284     1.75   0.079    -.0146793    .2641756
                                       |
                         C_mothermar_y |
                                    2  |   .0911564   .1514408     0.60   0.547    -.2057013    .3880141
                                    3  |   .0367664   .1207045     0.30   0.761    -.1998413    .2733741
                                    4  |  -.0247201    .038866    -0.64   0.525    -.1009061    .0514659
                                    5  |   .3015891   .3819644     0.79   0.430    -.4471464    1.050325
                                       |
                                 _cons |   .6014585   .0662502     9.08   0.000     .4715932    .7313237
    -----------------------------------+----------------------------------------------------------------
                               sigma_u |  .88697445
                               sigma_e |  .75140868
                                   rho |  .58218098   (fraction of variance due to u_i)
    ----------------------------------------------------------------------------------------------------
    
    .
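    A minimal sketch of one way to check whether the same draws are landing on different person-years, using simulated data. Two "sessions" are imitated by putting the data in two orders that are both sorted by a hypothetical id variable but differ within each id; the variable names, the panel size and the tempfile are illustrative assumptions. The two tabulations give identical totals of 0s and 1s, as in both logs here, while the final count of person-years whose draw changed is typically well above zero.

```stata
* Simulated panel: 100 hypothetical individuals, each observed in years 1-3
clear
set obs 300
generate id   = ceil(_n / 3)
generate year = mod(_n - 1, 3) + 1

* "Session 1": sorted by id, with tied rows (same id) in ascending year order
sort id year
set seed 9000
generate byte draw_run1 = runiform() < .5
tempfile run1
save `run1'

* "Session 2": still sorted by id, but tied rows are in descending year order
drop draw_run1
gsort id -year
set seed 9000
generate byte draw_run2 = runiform() < .5

* Match the session-1 draws back on person-year and compare
merge 1:1 id year using `run1', nogenerate
tabulate draw_run1
tabulate draw_run2
count if draw_run1 != draw_run2
display "person-years whose draw changed: " r(N)
```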

    Thanks for any help,

    John

  • #2
    The problem does not seem to be runiform(): as you showed, it creates exactly the same number of successes and failures. Apparently, these successes are assigned to different observations, which is actually quite likely. If the sort order differs, runiform() will still assign the same 0 or 1 to the first observation, but who that first observation is can differ from run to run. This can happen when somewhere in your .do file you sort on a variable with ties, e.g. your person id. If you sort on such a variable, Stata will order the observations with the same value randomly (what else could it do?), and this random order is governed by a different seed; see help sortseed. This is a design choice by StataCorp: it warns you that you are sorting on a variable with ties and that your results depend on it, which is usually something you want to be warned about. But the "warning" is rather indirect, and tends to confuse people who have never heard of it.
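    A minimal sketch of the fix this suggests, on simulated data: before setting the seed, sort on a combination of variables that uniquely identifies each observation, and let isid confirm that the key really has no ties. Here id and year are hypothetical; in the original .do file this would presumably mean isid id year and sort id year right before set seed 9000, assuming those two variables together identify each person-year. Starting from two different orders of the tied rows, the draws then end up on the same person-years, so the final count should be zero. Fixing the tie-breaking seed with set sortseed is an alternative, but a unique sort key does not depend on how earlier sorts happened to break ties.

```stata
* Simulated panel: 100 hypothetical individuals, each observed in years 1-3
clear
set obs 300
generate id   = ceil(_n / 3)
generate year = mod(_n - 1, 3) + 1

* Run A: data arrive with tied rows in one order
gsort id -year
isid id year            // stops with an error if id-year has duplicates
sort id year            // unique key, so the resulting order is fully determined
set seed 9000
generate byte draw_a = runiform() < .5

* Run B: data arrive in a different order
gsort -id year
isid id year
sort id year
set seed 9000
generate byte draw_b = runiform() < .5

* The draws land on the same person-years in both runs
count if draw_a != draw_b
display "person-years whose draw changed: " r(N)
```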
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------

    • #3
      I didn't see that there were several threads on this; I added a response here:

      https://www.statalist.org/forums/for...96#post1545896
