Randomly assigning numbers to observations- with a fixed sum

William Lisowski

Join Date: Dec 2014
Posts: 10150

#16

08 Jun 2018, 06:36

In my study 120 observations represent the number of months and the random numbers adding up to 196 represent the number of mergers.

This leads me to realize we are going about this from the wrong direction. What we have are 196 mergers, with each merger equally likely to be concluded in each of 120 months.

Starting from that, here is code that, I believe, accomplishes the desired goal.

Code:

clear
set seed 42
local reps 10     // 1000
local mergers 14  // 196
local months 12   // 120

set obs `reps'
generate rep = _n

expand `mergers'
generate month = runiformint(1,`months')
generate N = 1

collapse (sum) N, by(rep month)
fillin rep month
replace N = 0 if _fillin
drop _fillin
label variable N .

sort rep month
list if rep<=2, noobs sepby(rep)

tab month N

// better to leave the data in a long layout
// but if necessary, here is a wide layout

rename N r
reshape wide r, i(rep) j(month)
list if rep<=2, noobs

Code:

. clear

. set seed 42

. local reps 10     // 1000

. local mergers 14  // 196 

. local months 12   // 120

. 
. set obs `reps'
number of observations (_N) was 0, now 10

. generate rep = _n

. 
. expand `mergers'
(130 observations created)

. generate month = runiformint(1,`months')

. generate N = 1

. 
. collapse (sum) N, by(rep month)

. fillin rep month

. replace N = 0 if _fillin
(45 real changes made)

. drop _fillin

. label variable N .

. 
. sort rep month

. list if rep<=2, noobs sepby(rep)

  +-----------------+
  | rep   month   N |
  |-----------------|
  |   1       1   1 |
  |   1       2   4 |
  |   1       3   2 |
  |   1       4   0 |
  |   1       5   0 |
  |   1       6   0 |
  |   1       7   1 |
  |   1       8   2 |
  |   1       9   1 |
  |   1      10   0 |
  |   1      11   2 |
  |   1      12   1 |
  |-----------------|
  |   2       1   2 |
  |   2       2   0 |
  |   2       3   1 |
  |   2       4   1 |
  |   2       5   1 |
  |   2       6   1 |
  |   2       7   1 |
  |   2       8   2 |
  |   2       9   1 |
  |   2      10   1 |
  |   2      11   0 |
  |   2      12   3 |
  +-----------------+

. 
. tab month N

           |                                 .
     month |         0          1          2          3          4          5 |     Total
-----------+------------------------------------------------------------------+----------
         1 |         6          2          2          0          0          0 |        10 
         2 |         4          1          2          2          1          0 |        10 
         3 |         4          5          1          0          0          0 |        10 
         4 |         5          3          2          0          0          0 |        10 
         5 |         4          3          1          2          0          0 |        10 
         6 |         4          1          2          3          0          0 |        10 
         7 |         2          4          2          1          1          0 |        10 
         8 |         2          3          3          0          2          0 |        10 
         9 |         4          5          0          0          0          1 |        10 
        10 |         4          4          0          1          0          1 |        10 
        11 |         3          3          3          1          0          0 |        10 
        12 |         3          3          1          3          0          0 |        10 
-----------+------------------------------------------------------------------+----------
     Total |        45         37         19         13          4          2 |       120 


. 
. // better to leave the data in a long layout
. // but if necessary, here is a wide layout
. 
. rename N r

. reshape wide r, i(rep) j(month)
(note: j = 1 2 3 4 5 6 7 8 9 10 11 12)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                      120   ->      10
Number of variables                   3   ->      13
j variable (12 values)            month   ->   (dropped)
xij variables:
                                      r   ->   r1 r2 ... r12
-----------------------------------------------------------------------------

. list if rep<=2, noobs

  +--------------------------------------------------------------------+
  | rep   r1   r2   r3   r4   r5   r6   r7   r8   r9   r10   r11   r12 |
  |--------------------------------------------------------------------|
  |   1    1    4    2    0    0    0    1    2    1     0     2     1 |
  |   2    2    0    1    1    1    1    1    2    1     1     0     3 |
  +--------------------------------------------------------------------+

.

Last edited by William Lisowski; 08 Jun 2018, 06:42. Reason: Initial version neglected to set the seed to make the results reproducible.

Comment

daniel klein

Join Date: Mar 2014
Posts: 3820

#17

08 Jun 2018, 08:32

Viewed from this perspective, the hole thing seems loosely related to the XY Problem, where the initial question would have started with the description quoted by William above. William now suggests to randomly assign one of the 120 months to the 196 mergers. I believe his code does this; so should the following:

Code:

// setting parameters
local mergers     196 // 14
local months     120 // 12

// all months
clear
set obs `months'

generate month         = _n
generate N_mergers     = 0

tempfile Months
save `Months'

// draw month of merger randomly
clear

set obs `mergers'
generate month = runiformint(1, `months')

// convert to number of mergers
contract month , freq(N_mergers)

// put together
merge m:1 month using `Months' , keep(using matched) nogenerate

// look at result
sort month
list

Best
Daniel

Comment

Merve Meric

Join Date: Jun 2018

Posts: 4
#18

08 Jun 2018, 09:24

Hi,

From a different perspective, I solved the problem by assigning total number of mergers (196 in the example) one by one to randomly selected months. In this way each merger has an equal chance to be in one of the 120 months.(1/120). Here is the code I wrote:

set obs 120
gen month=_n
gen merger=0
forval i=1/196 {
scalar random= runiformint(1,120)
replace merger=merger+1 if month==random
}
egen tot= total(merger)
di tot

In this way I can obtain one sequence that contains 120 observations that sum up to 196.

Thank you very much for your help!
1 like
Comment
Romalpa Akzo

Join Date: Oct 2017

Posts: 369
#19

08 Jun 2018, 18:28

Despite still believing in the logic of my code in #10, it seems to me that, at least for the process of my code, runiformint might not as random as expected, irrespective of whatever seed to be selected. Or, of cource, ...my code is not constructed right.

My below code is expected to list down all available combinations, ignoring any sort order as noted by Daniel (i.e r`i' &lt;= r`i+1' with any `i'). This code could be tested with any number of fixed sum (S) and number of components (m). The explanation of my code should be provided if it might attract any interesting. Actually, it is going along with the same logic introduced by Daniel at #13. It seems working properly with some of my test, but with S =196, m=120 like the example of Merve, the memory is not enough, since the total number of combinations would be around 100.000.000 or more

Code:

clear * set more 1 local S = 25 local m = 16 set obs 1 gen ex=. gen r0=0 gen r`m'=`S' local sofar r0 forval i = 1/`=`m'-1' { replace ex = 1 + floor(r`m'/(`m'+1-`i')) - r`=`i'-1' expand ex bys `sofar': gen r`i' = r`=`i'-1' +_n-1 local sofar = "`sofar'" + " r`i'" replace r`m' = r`m'- r`i' } drop ex r0 order r`m', last

The right solution for this interesting issue might exist somewhere, waiting for your contribution.
Comment
daniel klein

Join Date: Mar 2014

Posts: 3820
#20

09 Jun 2018, 05:47

Originally posted by Romalpa Akzo View Post

Despite still believing in the logic of my code in #10, it seems to me that, at least for the process of my code, runiformint might not as random as expected, irrespective of whatever seed to be selected. Or, of cource, ...my code is not constructed right.

I still feel there are multiple ideas of what exactly is supposed to be drawn at random and what the probability for each element to be drawn should be. The code in #10 (actually first suggested in #6; the modification in #10 makes it potentially even harder to study its properties) produces all possible combinations/partitions but not with equal probability. I believe the reason for this is that runiformint() is, in fact, exactly as random as expected. runiformint() draws from the set 0 to r1 with equal probability.

After thinking about this for quite some time and writing down many examples, I believe the core of the problem is related to the following (I wish I could derive this more mathematically sound):

Assume that there is a total of N possible combinations/partitions (sort order irrelevant). The probability for each of these combinations/partitions to be drawn with equal probability is, thus, 1/N. Why does the code in #6 not produce these equal probabilities?

Start, for example, with the probability for the combination/partition 196, 0, 0, …, 0. The probability for this combination/partition (sort order relevant) is (1/197)^119; We first set r1 = 196. We then draw 119 times from the uniform distribution [0; 196], where each number has equal probability of 1/197 to be drawn. Note that this is only true as long the draws yield 0; for any other number, the set of numbers to be drawn from decreases from draw to draw and so the probability for each number in the set increases. Anyway, the only way to obtain the above combination/partition (sort order relevant) is with all 119 draws yielding 0. Further, there are 120 possible permutations of this combination/partition. The probability to obtain 0, 196, 0, …, 0 is (1/197); the first draw must be 196; the probability for 0, 0, 196, 0, …, 0 is (1/197)^2; the first draw must be 0, the second must be 196; and so on. Summing these probabilities yields 0,005. Thus, the probability of drawing the combination/partition 0, 0, …, 196 (sort order irrelevant) is about 0,5 per cent or 1/200.

From the above example, it follows that the code in #6 will produce the the combination/partition 196, 0, 0, …, 0 (sort order irrelevant) more often than expected under a uniform distribution of combinations/partitions when N > 200. As Romalpa suggests, N is probably larger than 100,000,000.

Though this problem is interesting, I lack the time and motivation to dive into it any further. Mike has posted the link to Wikipedia in #7 that provides relevant search terms. There are lots of posts and papers discussing variations of the problem here, and algorithms are provided to solve it.

Best
Daniel

Last edited by daniel klein; 09 Jun 2018, 06:00.
1 like
Comment

Announcement

Comment

Comment

Comment

Comment

Comment