Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • #16
    In my study 120 observations represent the number of months and the random numbers adding up to 196 represent the number of mergers.
    This leads me to realize we are going about this from the wrong direction. What we have are 196 mergers, with each merger equally likely to be concluded in each of 120 months.

    Starting from that, here is code that, I believe, accomplishes the desired goal.
    Code:
    clear
    set seed 42
    local reps 10     // 1000
    local mergers 14  // 196
    local months 12   // 120
    
    set obs `reps'
    generate rep = _n
    
    expand `mergers'
    generate month = runiformint(1,`months')
    generate N = 1
    
    collapse (sum) N, by(rep month)
    fillin rep month
    replace N = 0 if _fillin
    drop _fillin
    label variable N .
    
    sort rep month
    list if rep<=2, noobs sepby(rep)
    
    tab month N
    
    // better to leave the data in a long layout
    // but if necessary, here is a wide layout
    
    rename N r
    reshape wide r, i(rep) j(month)
    list if rep<=2, noobs
    Code:
    . clear
    
    . set seed 42
    
    . local reps 10     // 1000
    
    . local mergers 14  // 196 
    
    . local months 12   // 120
    
    . 
    . set obs `reps'
    number of observations (_N) was 0, now 10
    
    . generate rep = _n
    
    . 
    . expand `mergers'
    (130 observations created)
    
    . generate month = runiformint(1,`months')
    
    . generate N = 1
    
    . 
    . collapse (sum) N, by(rep month)
    
    . fillin rep month
    
    . replace N = 0 if _fillin
    (45 real changes made)
    
    . drop _fillin
    
    . label variable N .
    
    . 
    . sort rep month
    
    . list if rep<=2, noobs sepby(rep)
    
      +-----------------+
      | rep   month   N |
      |-----------------|
      |   1       1   1 |
      |   1       2   4 |
      |   1       3   2 |
      |   1       4   0 |
      |   1       5   0 |
      |   1       6   0 |
      |   1       7   1 |
      |   1       8   2 |
      |   1       9   1 |
      |   1      10   0 |
      |   1      11   2 |
      |   1      12   1 |
      |-----------------|
      |   2       1   2 |
      |   2       2   0 |
      |   2       3   1 |
      |   2       4   1 |
      |   2       5   1 |
      |   2       6   1 |
      |   2       7   1 |
      |   2       8   2 |
      |   2       9   1 |
      |   2      10   1 |
      |   2      11   0 |
      |   2      12   3 |
      +-----------------+
    
    . 
    . tab month N
    
               |                                 .
         month |         0          1          2          3          4          5 |     Total
    -----------+------------------------------------------------------------------+----------
             1 |         6          2          2          0          0          0 |        10 
             2 |         4          1          2          2          1          0 |        10 
             3 |         4          5          1          0          0          0 |        10 
             4 |         5          3          2          0          0          0 |        10 
             5 |         4          3          1          2          0          0 |        10 
             6 |         4          1          2          3          0          0 |        10 
             7 |         2          4          2          1          1          0 |        10 
             8 |         2          3          3          0          2          0 |        10 
             9 |         4          5          0          0          0          1 |        10 
            10 |         4          4          0          1          0          1 |        10 
            11 |         3          3          3          1          0          0 |        10 
            12 |         3          3          1          3          0          0 |        10 
    -----------+------------------------------------------------------------------+----------
         Total |        45         37         19         13          4          2 |       120 
    
    
    . 
    . // better to leave the data in a long layout
    . // but if necessary, here is a wide layout
    . 
    . rename N r
    
    . reshape wide r, i(rep) j(month)
    (note: j = 1 2 3 4 5 6 7 8 9 10 11 12)
    
    Data                               long   ->   wide
    -----------------------------------------------------------------------------
    Number of obs.                      120   ->      10
    Number of variables                   3   ->      13
    j variable (12 values)            month   ->   (dropped)
    xij variables:
                                          r   ->   r1 r2 ... r12
    -----------------------------------------------------------------------------
    
    . list if rep<=2, noobs
    
      +--------------------------------------------------------------------+
      | rep   r1   r2   r3   r4   r5   r6   r7   r8   r9   r10   r11   r12 |
      |--------------------------------------------------------------------|
      |   1    1    4    2    0    0    0    1    2    1     0     2     1 |
      |   2    2    0    1    1    1    1    1    2    1     1     0     3 |
      +--------------------------------------------------------------------+
    
    .
    Last edited by William Lisowski; 08 Jun 2018, 06:42. Reason: Initial version neglected to set the seed to make the results reproducible.

    Comment


    • #17
      Viewed from this perspective, the hole thing seems loosely related to the XY Problem, where the initial question would have started with the description quoted by William above. William now suggests to randomly assign one of the 120 months to the 196 mergers. I believe his code does this; so should the following:

      Code:
      // setting parameters
      local mergers     196 // 14
      local months     120 // 12
      
      // all months
      clear
      set obs `months'
      
      generate month         = _n
      generate N_mergers     = 0
      
      tempfile Months
      save `Months'
      
      // draw month of merger randomly
      clear
      
      set obs `mergers'
      generate month = runiformint(1, `months')
      
      // convert to number of mergers
      contract month , freq(N_mergers)
      
      // put together
      merge m:1 month using `Months' , keep(using matched) nogenerate
      
      // look at result
      sort month
      list
      Best
      Daniel

      Comment


      • #18
        Hi,

        From a different perspective, I solved the problem by assigning total number of mergers (196 in the example) one by one to randomly selected months. In this way each merger has an equal chance to be in one of the 120 months.(1/120). Here is the code I wrote:


        set obs 120
        gen month=_n
        gen merger=0
        forval i=1/196 {
        scalar random= runiformint(1,120)
        replace merger=merger+1 if month==random
        }
        egen tot= total(merger)
        di tot

        In this way I can obtain one sequence that contains 120 observations that sum up to 196.

        Thank you very much for your help!

        Comment


        • #19
          Despite still believing in the logic of my code in #10, it seems to me that, at least for the process of my code, runiformint might not as random as expected, irrespective of whatever seed to be selected. Or, of cource, ...my code is not constructed right.

          My below code is expected to list down all available combinations, ignoring any sort order as noted by Daniel (i.e r`i' &amp;lt;= r`i+1' with any `i'). This code could be tested with any number of fixed sum (S) and number of components (m). The explanation of my code should be provided if it might attract any interesting. Actually, it is going along with the same logic introduced by Daniel at #13. It seems working properly with some of my test, but with S =196, m=120 like the example of Merve, the memory is not enough, since the total number of combinations would be around 100.000.000 or more

          Code:
          clear *
          set more 1
          local S = 25
          local m = 16
          
          set obs 1
          gen ex=.
          gen r0=0
          gen r`m'=`S'
          local sofar r0
          
          forval i = 1/`=`m'-1' {
          replace ex = 1 + floor(r`m'/(`m'+1-`i')) - r`=`i'-1'
          expand ex
          bys `sofar': gen r`i' = r`=`i'-1' +_n-1
          local sofar = "`sofar'" + " r`i'"
          replace r`m' = r`m'- r`i'
          }
          drop ex r0
          order r`m', last
          The right solution for this interesting issue might exist somewhere, waiting for your contribution.

          Comment


          • #20
            Originally posted by Romalpa Akzo View Post
            Despite still believing in the logic of my code in #10, it seems to me that, at least for the process of my code, runiformint might not as random as expected, irrespective of whatever seed to be selected. Or, of cource, ...my code is not constructed right.
            I still feel there are multiple ideas of what exactly is supposed to be drawn at random and what the probability for each element to be drawn should be. The code in #10 (actually first suggested in #6; the modification in #10 makes it potentially even harder to study its properties) produces all possible combinations/partitions but not with equal probability. I believe the reason for this is that runiformint() is, in fact, exactly as random as expected. runiformint() draws from the set 0 to r1 with equal probability.

            After thinking about this for quite some time and writing down many examples, I believe the core of the problem is related to the following (I wish I could derive this more mathematically sound):

            Assume that there is a total of N possible combinations/partitions (sort order irrelevant). The probability for each of these combinations/partitions to be drawn with equal probability is, thus, 1/N. Why does the code in #6 not produce these equal probabilities?

            Start, for example, with the probability for the combination/partition 196, 0, 0, …, 0. The probability for this combination/partition (sort order relevant) is (1/197)^119; We first set r1 = 196. We then draw 119 times from the uniform distribution [0; 196], where each number has equal probability of 1/197 to be drawn. Note that this is only true as long the draws yield 0; for any other number, the set of numbers to be drawn from decreases from draw to draw and so the probability for each number in the set increases. Anyway, the only way to obtain the above combination/partition (sort order relevant) is with all 119 draws yielding 0. Further, there are 120 possible permutations of this combination/partition. The probability to obtain 0, 196, 0, …, 0 is (1/197); the first draw must be 196; the probability for 0, 0, 196, 0, …, 0 is (1/197)^2; the first draw must be 0, the second must be 196; and so on. Summing these probabilities yields 0,005. Thus, the probability of drawing the combination/partition 0, 0, …, 196 (sort order irrelevant) is about 0,5 per cent or 1/200.

            From the above example, it follows that the code in #6 will produce the the combination/partition 196, 0, 0, …, 0 (sort order irrelevant) more often than expected under a uniform distribution of combinations/partitions when N > 200. As Romalpa suggests, N is probably larger than 100,000,000.

            Though this problem is interesting, I lack the time and motivation to dive into it any further. Mike has posted the link to Wikipedia in #7 that provides relevant search terms. There are lots of posts and papers discussing variations of the problem here, and algorithms are provided to solve it.

            Best
            Daniel
            Last edited by daniel klein; 09 Jun 2018, 06:00.

            Comment

            Working...
            X