Folks,
I'm new to Mata programming. I'm trying to create a tool to randomly generate a number from a discrete distribution. In this observation, each respondent may belong to one of 3 predicted classes, with posterior probabilities shown in the variables c1-c3. I'm trying to generate a random draw for the class that the person belongs to.
Here's my Mata code:
I'm aware that the code doesn't generalize, or that it doesn't even output the random draws to Stata, but I want to work on that later. This code produces a conformability error (3200) when I try to run it. However, when I enter mata and try to generate the random draw for an observation whose predicted probabilities are not missing, the code works.
Note: some observations do have missing values for predicted probability (e.g. the first one in this sample). I'm trying to set those observations' random draw to missing. I suspect this is where the code is breaking. Also note, I'm aware that the vector of probabilities supplied to -rdiscrete- must sum to 1, and I've already ensured that by normalizing the probabilities (see the 2nd line of my function).
If you are interested: the context is trying to develop some validation tools for latent class models. The 3 probabilities are posterior probabilities of class membership from a 3-class model. I would like to randomly generate one random class assignment for repeated use as a start value (option startvalues(randomid guessid)) to verify that I've hit a global maximum likelihood. You can feed in the 3 posterior probabilities as start values, but you will simply replicate your solution many times; I want to do the equivalent of jittering the start values slightly.
I'm new to Mata programming. I'm trying to create a tool to randomly generate a number from a discrete distribution. In this observation, each respondent may belong to one of 3 predicted classes, with posterior probabilities shown in the variables c1-c3. I'm trying to generate a random draw for the class that the person belongs to.
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input float(id c1 c2 c3) 1 . . . 2 .08422642 .9157736 8.121268e-24 3 .5383863 0 .4616137 4 .9936078 0 .006392221 5 1 6.864455e-36 7.544733e-12 6 .07823695 0 .9217631 7 .9886059 0 .011394108 8 .7088358 0 .2911642 9 .9999642 0 .000035737274 10 .02026204 0 .9797379 11 .0742506 0 .9257494 12 .43651125 0 .5634887 13 1 2.952504e-08 4.8909324e-16 14 .06493116 0 .9350688 15 .02226799 .977732 2.2346245e-27 16 .9999998 0 2.4839485e-07 17 .021456486 0 .9785435 18 .9999994 0 5.940184e-07 19 .9999972 0 2.830447e-06 20 .08522644 0 .9147736 21 .0318002 0 .9681998 22 .3983888 0 .6016112 23 1 9.11082e-37 5.163497e-10 24 .9969217 0 .003078261 25 .06136134 0 .9386387 26 .06014966 0 .9398503 27 1 2.4505809e-35 2.0005193e-12 28 .27831626 .7216837 1.99915e-22 29 .06772947 0 .9322705 30 .910338 0 .089662 31 .10535698 0 .894643 32 1 0 4.133972e-10 33 .9999999 0 1.0021545e-07 34 .9999825 0 .000017450597 35 .9999714 0 .000028616907 36 .9999644 0 .000035663 37 .08730433 .9126956 1.004739e-23 38 .9999963 0 3.703624e-06 39 .08522644 0 .9147736 40 .0318002 0 .9681998 41 .21588437 0 .7841156 42 .02749338 0 .9725066 43 .9355406 0 .06445945 44 .11154781 .8884522 3.181763e-23 45 1 0 8.764027e-09 46 1 0 1.9692013e-10 47 .9998856 0 .00011444603 48 .9945732 0 .005426852 49 1 1.401862e-24 1.7184303e-12 50 .9063449 0 .09365512 end
Code:
mata: mata clear void function draw() { gammaM = st_data(., ("c1","c2","c3")) gammaM = gammaM :/ rowsum(gammaM) n = rows(gammaM) guessclass = J(n,1,.) for(i = 1; i<= n; i++) { guesspr = gammaM[`i',] if (guesspr[1,1] = .) { guessclass[`i',1] = . } else { guessclass[`i',1] = rdiscrete(1,1,guesspr) } } } end
Note: some observations do have missing values for predicted probability (e.g. the first one in this sample). I'm trying to set those observations' random draw to missing. I suspect this is where the code is breaking. Also note, I'm aware that the vector of probabilities supplied to -rdiscrete- must sum to 1, and I've already ensured that by normalizing the probabilities (see the 2nd line of my function).
If you are interested: the context is trying to develop some validation tools for latent class models. The 3 probabilities are posterior probabilities of class membership from a 3-class model. I would like to randomly generate one random class assignment for repeated use as a start value (option startvalues(randomid guessid)) to verify that I've hit a global maximum likelihood. You can feed in the 3 posterior probabilities as start values, but you will simply replicate your solution many times; I want to do the equivalent of jittering the start values slightly.
Comment