I have a small program to generate multinomial random variables in Stata, given an existing Stata variable name and a probability vector. It's a wrapper for a Mata function that uses Mata's -rdiscrete()-. The problem I'm having is that -rdiscrete()- occasionally will not execute with certain probability vectors.
-rdiscrete()- understandably requires a probability vector that sums *very* closely to 1.0, a problem my function adapts to by checking for this problem and always making a tiny adjustment to the first element of the probability vector. After many years of using this function without incident, I encountered a problem while trying to generate a uniform multinomial variable with 7 categories.
The surprising thing is that -rdiscrete()- worked with a 7-element probability vector that was a *little* bit off from summing to 1.0, but *failed* when I gave it an input vector for 7 categories that was *closer* to summing to 1.0! This stumps me, though my guess goes to some kind of precision issue. Do any of you have an idea of what the problem might be?
Code for the Mata function, including some debugging echos and some relevant examples appear below.
-rdiscrete()- understandably requires a probability vector that sums *very* closely to 1.0, a problem my function adapts to by checking for this problem and always making a tiny adjustment to the first element of the probability vector. After many years of using this function without incident, I encountered a problem while trying to generate a uniform multinomial variable with 7 categories.
The surprising thing is that -rdiscrete()- worked with a 7-element probability vector that was a *little* bit off from summing to 1.0, but *failed* when I gave it an input vector for 7 categories that was *closer* to summing to 1.0! This stumps me, though my guess goes to some kind of precision issue. Do any of you have an idea of what the problem might be?
Code for the Mata function, including some debugging echos and some relevant examples appear below.
Code:
cap mata mata drop multi() mata: void multi(string varname, string okname, real scalar N, real colvector prob) // Generate multinomial random variable in Stata. { "Input prob vector:" prob // If sum of prob. vector differs from 1.0 by too much, it's an error. diff = sum(prob) - 1.0 if (abs(diff) > 1e-6) { st_local(okname, "0") exit } // User-given prob. vectors often differ a little from 1.0 due to precision issues. // Avoid this by *always* trivially adjusting the first element in the vector. "diff before adjustment: " + strofreal(diff, "%20.5g") prob[1] = prob[1] - diff diff = sum(prob) - 1.0 "diff after adjustment: " + strofreal(diff, "%20.5g") x = rdiscrete(N, 1, prob) st_store(., varname, x) st_local(okname, "1") } end // // Examples, good and bad. clear local N = 100 set obs `N' // gen byte y3 = . // Works fine. mata: multi("y3", "OKflag", `N', /// (.33333333\.33333333\.33333333) ) tab y3 // // Works fine: prob. vector with values reasonably close to 1/7 gen byte y7 = . mata: multi("y7", "OKflag", `N', /// (.14285714\.14285714\.14285714\.14285714\ /// .14285714\.14285714\.14285714) ) tab y7 // // Values *closer* to 1/7 fail. gen byte y7closer = . mata: multi("y7closer", "OKflag", `N', /// (.1428571428571\.1428571428571\.1428571428571\.1428571428571\ /// .1428571428571\.1428571428571\.1428571428571) )
Comment