
  • Randomly assigning numbers to observations- with a fixed sum

    Hi,

    I want to assign random numbers to a set of observations such that the numbers have a constant sum. Specifically, I want to create 120 observations and assign each an integer in the range [0, total] so that the integers sum to total. I started as shown below, but I cannot fix the sum of r at 196.

    program define simulation, rclass
    drop _all
    set obs 120
    gen r= runiformint(0,196)


    end

    I appreciate your help in advance.

  • #2
    Think about it. You want to randomly pick 120 integers between 0 and 196 and have them sum to 196. If we draw all ones, the sum is already 120. We are constrained to very few large numbers, and certainly no two can be above 196/2 = 98. Yet if we stay true to the randomization, approximately half of the draws should fall above the midpoint and half below it.

    Code:
    . set obs 120
    number of observations (_N) was 0, now 120
    
    . set seed 1234
    
    . gen rand= runiformint(0, 196)
    
    . count if rand < 98
      62
    
    . count if rand > 98
      58
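
    To put a number on the constraint: the mean of runiformint(0, 196) is 98, so 120 independent draws have an expected sum of 120 * 98 = 11,760, far above 196. A minimal sketch of that check, assuming Stata 14 or later for runiformint():

    ```stata
    clear
    set seed 1234
    set obs 120
    gen rand = runiformint(0, 196)

    * realized sum of the 120 unconstrained draws
    quietly summarize rand
    display "sum of the 120 draws: " r(sum)

    * expected sum under independent uniform draws on 0..196
    display "expected sum:         " 120 * (0 + 196) / 2
    ```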
    Last edited by Andrew Musau; 07 Jun 2018, 08:49.



    • #3
      This is an interesting problem, but I agree with Andrew that the term "random" here is an odd description. I think there is another way to describe what Merve wants, which is to create a list of 120 integers that sum to 196, and then to assign the elements of this list randomly, without replacement, to the _N = 120 observations. The relevant term here, as I understand it, is "integer partition." (About which see https://en.wikipedia.org/wiki/Partit...ating_function)
      So, the problem is to generate one of the partitions of 196 that is of length 120, allowing 0 as a part of a given partition, and assign those elements at random to 120 observations.

      I gather from a bit of searching that algorithms exist to create integer partitions. My suggestion would be to translate one of these to Mata, and assign from a vector in Mata to Stata. If Merve or someone else can track down (e.g.) such an algorithm, such a translation should be reasonable.
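
      One standard construction, offered here only as a sketch, is "stars and bars". Note that it targets a slightly different object than an integer partition: it draws uniformly from all ordered assignments of non-negative integers to the 120 observations that sum to 196 (weak compositions), so the random assignment to observations is built in. It is not uniform over unordered partitions. The variable names pos, u, bar, and r are illustrative.

      ```stata
      clear
      set seed 2018
      set obs 315                     // 196 stars + 119 bars = 315 slots
      gen long pos = _n               // slot position 1..315
      gen double u = runiform()
      sort u                          // put the slots in random order
      gen byte bar = _n <= 119        // the first 119 slots in that order become bars
      keep if bar
      sort pos                        // bars in increasing slot position

      * add a sentinel bar just past the last slot
      set obs 120
      replace pos = 316 in 120

      * each value is the number of stars between consecutive bars
      gen int r = pos - cond(_n == 1, 0, pos[_n-1]) - 1

      quietly summarize r
      display "total = " r(sum)
      ```

      The gaps telescope to 316 - 120 = 196, so the total is fixed by construction rather than by trial and error.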



      • #4
        Andrew Musau, I don't understand your point; I may have expressed myself badly. I want to randomly assign 120 integers that sum to 196. For example, one possible outcome is 119 zeros and a single 196.

        1. a
        2. b
        3. c
        ..
        120. x

        those a,b,c, up to x should sum up to 196.

        If I only write gen rand= runiformint(0, 196) then the sum of the values of variable rand may exceed 196.

        thank you very much for your interest. 🙏🏻



        • #5
          Guess the point is that once you draw (at random) the integer 196 for the first observation, then the remaining 119 observations' integers can no longer be random draws; they are fixed at 0. The question is what "random" here means. That question can probably only be answered in light of the research question at hand and what kind of randomness is required here.

          Edit: I am almost one hundred percent sure that we have discussed a very similar (if not the same) problem on the list; I cannot find the reference, though. The solution involved keeping track of the (running) sum and, for each observation, drawing from the set of possible integers that, when added to the current sum, would not exceed the limit (196).
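
          The running-sum idea can be sketched for a single set of 120 observations as follows. This is only an illustration of the approach as described, not the code from the earlier discussion; the local macro names left and draw are made up for the example, and runiformint() assumes Stata 14 or later.

          ```stata
          clear
          set seed 2018
          set obs 120
          gen int r = .

          local left = 196                        // amount still to be distributed
          forvalues i = 1/119 {
              local draw = runiformint(0, `left') // never exceeds what is left
              quietly replace r = `draw' in `i'
              local left = `left' - `draw'
          }
          * the last observation takes whatever remains, which fixes the sum
          quietly replace r = `left' in 120

          quietly summarize r
          display "total = " r(sum)
          ```

          Because each draw can use up a large share of what is left, the early observations tend to receive most of the total under this scheme.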

          Edit2: Ok, only slightly related; see here. Again, the point is, that there are only so many combinations of 120 (integer) values that sum up to 196.

          Best
          Daniel
          Last edited by daniel klein; 07 Jun 2018, 09:55.



          • #6
            The code below should give 100 combinations for your needs.
            Code:
            clear
            set obs 100
            gen r1=196
            forval i=2/120{
            gen r`i'=runiformint(0,r1)
            replace r1=r1-r`i'
            }
            xpose, clear



            • #7
              Nice simple solution from Romalpa. I got diverted by the more common posing of the problem as "generate all the partitions that ...."



              • #8
                Romalpa's code in #6 is clever (as usual) and I think it illustrates the problem. Run her code and then run

                Code:
                forvalues j = 1/100 {
                    count if v`j'
                }
                Notice that most of the time the number of observations with values larger than 0 is below 10 (and those observations are always the first 10 or so). Whether this qualifies as "random draws" I cannot tell. If it satisfies Merve's need, that is fine. But it probably does not represent random draws (with equal probability) from the universe of "all the partitions that ....", which Mike is referring to in #7. One would probably at least randomize the sort order of the observations so that the last 100 or so observations are not always assigned 0.


                Best
                Daniel
                Last edited by daniel klein; 07 Jun 2018, 11:14. Reason: Added links; added last sentence; spelling ... done



                • #9
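                  I agree with Daniel. If I can predict that any observations beyond the first 20 are 0s, then there is no randomness. If the constraint is that the sum of 120 random integers has to be 196, then the average draw must be 196/120, or about 1.63, so if every observation is drawn from the same uniform distribution, the draws can realistically only be the integers 0, 1, 2 and 3 (mean 1.5, expected sum 180). I would use a brute-force approach to find a seed that gives a total of exactly 196.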

                  Code:
                  set maxvar 7000
                  set obs 120
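                  forvalues i = 1/2000 {
                      * store the RNG state before drawing, so that restoring it reproduces rand`i'
                      gen seed`i' = "`c(seed)'"
                      gen rand`i' = runiformint(0, 3)
                      * column total, repeated in every observation
                      egen total`i' = total(rand`i')
                  }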
                  sum total*
                  There will be a few totals equal to 196. So get the seed and you can use it (in my case, one occurrence was i=1924, and there were several):

                  Code:
                  local y = seed1924[1]
                  clear
                  set obs 120
                  set seed `y'
                  gen mysum= runiformint(0,3)
                  egen total= total(mysum)
                  sum total
                  Code:
                  . sum total
                  
                      Variable |        Obs        Mean    Std. Dev.       Min        Max
                  -------------+---------------------------------------------------------
                         total |        120         196           0        196        196
                  
                  .
                  So here I can claim that I have 120 random integers that sum to 196.
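
                  The same idea can be written as a redraw-until-accepted loop (rejection sampling) instead of storing 2000 candidate columns and seeds. This is a sketch under the same assumption as above, namely that each value is uniform on 0..3; the cap of 100000 tries and the local macro names are arbitrary choices for the example.

                  ```stata
                  clear
                  set seed 1234
                  set obs 120
                  gen byte r = .

                  local total = .
                  local tries = 0
                  * redraw all 120 values until they happen to sum to 196
                  while `total' != 196 & `tries' < 100000 {
                      quietly replace r = runiformint(0, 3)
                      quietly summarize r, meanonly
                      local total = r(sum)
                      local ++tries
                  }
                  display "accepted after `tries' tries; total = `total'"
                  ```

                  Every accepted draw is equally likely among all sequences of 120 values in 0..3 that sum to 196, but values above 3 can never occur.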



                  • #10
                    Thanks, Daniel and Andrew, for your comments. However, I believe that the logic behind my suggested code illustrates, step by step, the random draws in this story. With a large enough number of simulated sets, runiformint() will work as intended and yield the equal probabilities.

                    Code:
                    clear
                    set obs 1000000 // or larger
                    gen r1=196
                    forval i=2/120{
                    gen r`i'=runiformint(0,r1)
                    replace r1=r1-r`i'
                    }
                    egen count0=anycount(r*), v(0)
                    sum count0
                        Variable |        Obs        Mean    Std. Dev.       Min        Max
                    -------------+---------------------------------------------------------
                          count0 |  1,000,000    114.1442    2.052455        101        119
                    If you prefer a smaller number but more "beautiful" combination:

                    Code:
                    clear
                    set obs 10000
                    gen r1=196
                    forval i=2/120{
                    gen r`i'=runiformint(0,min(r1,11)) // capping at 11 (or a nearby number) tends to give fewer 0s
                    replace r1=r1-r`i'
                    }
                    egen count0=anycount(r*), v(0)
                    sum count0
                    Last edited by Romalpa Akzo; 07 Jun 2018, 12:10.



                    • #11
                      Here is a reproducible example of the problem.
                      Code:
                      clear
                      set obs 1000000 // or larger
                      set seed 42
                      gen r1=196
                      forval i=2/120{
                      gen r`i'=runiformint(0,r1)
                      replace r1=r1-r`i'
                      }
                      egen count0=anycount(r40-r120), v(0)
                      sum count0
                      Code:
                      . sum count0
                      
                          Variable |        Obs        Mean    Std. Dev.       Min        Max
                      -------------+---------------------------------------------------------
                            count0 |  1,000,000          81           0         81         81
                      Thus, in 1,000,000 simulations, the 81 integers r40-r120 were always assigned a value of zero. As Daniel suggests in post #8, this does not agree with the usual definitions of randomness.



                      • #12
                        What about just shuffling the values of the observations of the resulting variables v1, v2, ..., v_nrep that exist after the -xpose-? That would seem to give each observation an equal chance to get a "0." I can't think of any good way to shuffle at the stage of r1, r2, ..., r120.
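
                        For what it is worth, shuffling at the r1, r2, ..., r120 stage seems doable in Mata by permuting the columns within each row. The sketch below uses a hypothetical helper, rowshuffle(), built on Mata's jumble(); it assumes the dataset holds only the r* variables when the function is called. It equalizes which position gets the zeros, but it does not change which sets of values are likely to be generated.

                        ```stata
                        clear
                        set seed 42
                        set obs 1000
                        gen r1 = 196
                        forvalues i = 2/120 {
                            gen r`i' = runiformint(0, r1)
                            quietly replace r1 = r1 - r`i'
                        }

                        capture mata: mata drop rowshuffle()
                        mata:
                        void rowshuffle()
                        {
                            real matrix x
                            real scalar i

                            x = st_data(., .)
                            for (i = 1; i <= rows(x); i++) {
                                // jumble() permutes rows, so transpose the row, shuffle, transpose back
                                x[i, .] = jumble(x[i, .]')'
                            }
                            st_store(., ., x)
                        }
                        end

                        mata: rowshuffle()

                        * each row still sums to 196, and r120 is no longer always zero
                        egen total = rowtotal(r*)
                        summarize total
                        count if r120 > 0
                        ```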



                        • #13
                          Just shuffling (or sorting, as I suggested) will not do. Let me break this down to a smaller scale so we can follow more easily. Suppose we have 3 observations and want the sum to be fixed at 10. Ignoring any sort order, so that 0 + 0 + 10 is the same as 10 + 0 + 0, there are 14 partitions to achieve this:

                          Code:
                          0 + 0 + 10
                          0 + 1 + 9
                          ...
                          1 + 1 + 8
                          ...
                          3 + 3 + 4
                          When we say that we want to randomly assign these partitions, we arguably mean that each should have equal probability to be drawn. Thus, we would expect each of these partitions to appear with the same frequency.
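
                          The count of 14 can be confirmed by brute-force enumeration. The sketch below lists every triple with entries 0 to 10, keeps those that sum to 10, and then keeps one sorted representative per partition; the variable names a, b, c and orderings are illustrative.

                          ```stata
                          clear
                          set obs 1331                        // 11^3 candidate triples with entries 0..10
                          gen byte a = mod(_n - 1, 11)
                          gen byte b = mod(floor((_n - 1) / 11), 11)
                          gen byte c = floor((_n - 1) / 121)

                          keep if a + b + c == 10
                          count                               // 66 ordered triples

                          keep if a <= b & b <= c             // one sorted representative per partition
                          count                               // 14 partitions

                          * number of distinct orderings behind each partition
                          gen byte orderings = cond(a == c, 1, cond(a == b | b == c, 3, 6))
                          list, noobs
                          ```

                          The orderings column shows that partitions are not all backed by the same number of ordered triples (3 for 0 + 0 + 10, 6 for 0 + 1 + 9), so "uniform over partitions" and "uniform over ordered assignments" are different targets.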

                          I will write a short Mata function to sort rows of a matrix (code is not good but will suffice for our purposes). This will help with ignoring sort order.

                          Code:
                          version 14.1
                          
                          clear mata
                          set matastrict on
                          
                          mata :
                          
                          void rowsort()
                          {
                              real matrix x, sx
                              real scalar i
                          
                              sx = x = st_data(., .)
                              for (i = 1; i <= rows(x); ++i) {
                                  sx[i, ] = sort(x[i, ]', 1)'
                              }
                              st_store(., ., sx)
                          }
                          
                          end
                          Now let's run the code in question. Instead of 1,000,000 runs, I think 1000 will suffice for demonstration.

                          Code:
                          clear
                          set obs 1000 // or larger
                          set seed 42
                          gen r1=10
                          forval i=2/3{
                          gen r`i'=runiformint(0,r1)
                          replace r1=r1-r`i'
                          }
                          
                          // look at first five obs
                          list in 1/5
                          
                          // sort rows
                          mata : rowsort()
                          
                          // confirm rows are sorted; sort order of partitions is now irrelevant, since 10, 0, 0 is now sorted to be 0, 0, 10.
                          list in 1/5
                          
                          // get the frequencies
                          contract r*
                          
                          list
                          Here is the result

                          Code:
                          . list
                          
                               +----------------------+
                               | r1   r2   r3   _freq |
                               |----------------------|
                            1. |  0    0   10     101 |
                            2. |  0    1    9     127 |
                            3. |  0    2    8     106 |
                            4. |  0    3    7      83 |
                            5. |  0    4    6      81 |
                               |----------------------|
                            6. |  0    5    5      40 |
                            7. |  1    1    8      45 |
                            8. |  1    2    7      93 |
                            9. |  1    3    6      69 |
                           10. |  1    4    5      63 |
                               |----------------------|
                           11. |  2    2    6      41 |
                           12. |  2    3    5      79 |
                           13. |  2    4    4      34 |
                           14. |  3    3    4      38 |
                               +----------------------+
                          Clearly, this distribution is not uniform. Increasing the number of runs does not help; here are the results for 1,000,000 runs:

                          Code:
                               +-----------------------+
                               | r1   r2   r3    _freq |
                               |-----------------------|
                            1. |  0    0   10   107392 |
                            2. |  0    1    9   125596 |
                            3. |  0    2    8    97411 |
                            4. |  0    3    7    84622 |
                            5. |  0    4    6    78885 |
                               |-----------------------|
                            6. |  0    5    5    38806 |
                            7. |  1    1    8    48216 |
                            8. |  1    2    7    83998 |
                            9. |  1    3    6    77501 |
                           10. |  1    4    5    74168 |
                               |-----------------------|
                           11. |  2    2    6    38526 |
                           12. |  2    3    5    73180 |
                           13. |  2    4    4    35793 |
                           14. |  3    3    4    35906 |
                               +-----------------------+
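
                          The simulated frequencies can be checked against exact probabilities. Under the sequential scheme with a total of 10 and 3 values, r2 is uniform on 0..10 and r3 is uniform on 0..(10 - r2), so each ordered outcome has probability (1/11) * 1/(11 - r2). The following sketch enumerates all outcomes and adds up the probabilities per sorted triple; the variable names s1, s2, s3 and p are illustrative.

                          ```stata
                          clear
                          set obs 121                         // all (r2, r3) pairs with entries 0..10
                          gen byte r2 = mod(_n - 1, 11)
                          gen byte r3 = floor((_n - 1) / 11)
                          keep if r2 + r3 <= 10               // r3 can be at most what is left after r2
                          gen byte r1 = 10 - r2 - r3          // r1 ends up holding the remainder

                          * probability of this ordered outcome under the sequential scheme
                          gen double p = (1/11) * (1/(11 - r2))

                          * sort within each row so that the order of the values is ignored
                          gen byte s1 = min(r1, r2, r3)
                          gen byte s3 = max(r1, r2, r3)
                          gen byte s2 = 10 - s1 - s3

                          collapse (sum) p, by(s1 s2 s3)
                          list, noobs
                          ```

                          For example, 0 + 0 + 10 comes out as 13/121, about 0.107, and 0 + 1 + 9 as about 0.126, in line with the simulated frequencies above, whereas a uniform draw over the 14 partitions would give 1/14, about 0.071, each.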
                          The bottom line seems to be that the suggested code does not produce random draws from the universe of possible partitions. But then again, Merve still has not commented on whether this is actually the randomness he has in mind.

                          Best
                          Daniel
                          Last edited by daniel klein; 07 Jun 2018, 14:17. Reason: spelling, once again



                          • #14
                            Originally posted by Romalpa Akzo
                            Below code should give out 100 combinations for your need.
                            Code:
                            clear
                            set obs 100
                            gen r1=196
                            forval i=2/120{
                            gen r`i'=runiformint(0,r1)
                            replace r1=r1-r`i'
                            }
                            xpose, clear
                            This is a great solution, but as daniel klein mentioned above, the observations with values greater than 0 are all among roughly the first 10. So I guess something is missing in the randomness. I need to run this randomization 1000 times, but when I set the number of observations to 1000, I again get 0s after r15.

                            In my study 120 observations represent the number of months and the random numbers adding up to 196 represent the number of mergers. So in this way I will not have any merger after the 15th month.

                            I want all r`i' s have equal chance to get any number in between (0,196). In this solution r1 will also always be 0, which breaks the randomness I want to get.
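
                            One possible reading of this requirement, stated here as an assumption rather than as the definition of randomness that is needed, is that each of the 196 mergers is independently equally likely to fall in any of the 120 months. That is a multinomial allocation: every month has the same chance of receiving mergers, any single month can in principle take any value from 0 to 196, and the total is 196 by construction. A minimal sketch, with month, r, and the local m as illustrative names:

                            ```stata
                            clear
                            set seed 2018
                            set obs 120
                            gen int month = _n
                            gen int r = 0

                            * drop each of the 196 mergers into a month chosen uniformly at random
                            forvalues k = 1/196 {
                                local m = runiformint(1, 120)
                                quietly replace r = r + 1 in `m'
                            }

                            quietly summarize r
                            display "total = " r(sum)
                            ```

                            To repeat the randomization 1000 times, the block can be wrapped in a loop or a program. Note that this is not uniform over partitions either; it makes evenly spread allocations more likely than extreme ones such as 119 zeros and one 196.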

                            I appreciate all your time and effort.



                            • #15
                              Originally posted by Merve Meric
                              I want all r`i' s have equal chance to get any number in between (0,196)
                              I am not sure this is really what you want.

                              In this definition, the random process is defined only in terms of the observations (r`i'), not the possible partitions. One way to get this is to start with (any) one possible partition (perhaps from Romalpa's code) and randomly sort the observations. Each observation will then have the same chance of getting any of the fixed initial values. Put differently, you ask here for a random permutation of any given partition without defining the random process that is to create the partition(s).

                              If you, instead, mean that you want each observation (r`i') to have the same chance of getting any numbers between 0 and 196 assigned, then these values will not necessarily add to 196; in fact they are very unlikely to. This is, I think, what Andrew Musau criticizes.

                              Best
                              Daniel
                              Last edited by daniel klein; 08 Jun 2018, 05:39. Reason: links keep disappearing
