Dear Stata listers,
I am trying to compare the results of finite mixture models in Stata (command fmm) and R (package flexmix, see Leisch F. (2004): "FlexMix: A General Framework for Finite Mixture Models and Latent Class Regression in R", Journal of Statistical Software, 11(8)).
It turns out the results obtained in Stata look very strange, while those obtained in R make much more sense.
I illustrate my point using a small simulated dataset. Here is the code:
Code:
*Simulate some data similar to that used in Leisch F. (2004)
clear all
set obs 200
gen class = inrange(_n,1,100)*1 + inrange(_n,101,200)*2
set seed 12345
gen x = runiform(0,10)
gen y = 5*x + rnormal(0,3) if class==1
replace y = 15 + 10*x - x^2 + rnormal(0,3) if class==2

*Run FMM and predict the classes
fmm 2, emopts(iter(100)): regress y c.x##c.x
predict classpr*, classposterior
gen classpr = .
forv i = 1/2 {
    replace classpr = `i' if classpr`i'==max(classpr1, classpr2)
}

*Compare true classes and predicted classes
tab class classpr
The cross-tabulation shows a wide discrepancy between the true classes and the predicted classes. Graphing both shows that the misclassified observations are those that lie beyond the intersection of the two data-generating processes.
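For readers without the attached do-file, a comparison plot of this kind could be drawn roughly as follows. This is only a minimal sketch, not necessarily what Test_FMM_simul.do does: it assumes the variables y, x, class and classpr created by the code above, and the graph names g_true and g_pred are made up for illustration. The /// line continuations require running it from a do-file.

```stata
* Assumes y, x, class and classpr exist from the simulation code above.
* g_true and g_pred are hypothetical graph names used only in this sketch.

* Scatter of the data colored by the true class
twoway (scatter y x if class == 1) (scatter y x if class == 2), ///
    legend(order(1 "Class 1" 2 "Class 2")) ///
    title("True classes") name(g_true, replace)

* Same scatter colored by the class with the highest posterior probability
twoway (scatter y x if classpr == 1) (scatter y x if classpr == 2), ///
    legend(order(1 "Class 1" 2 "Class 2")) ///
    title("Predicted classes") name(g_pred, replace)

* Put the two panels side by side
graph combine g_true g_pred
```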
As I understand it, the FMM above should be able to allocate most of the observations to the correct class. Here, this is clearly not the case. Am I mis-specifying the model? Am I missing something?
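One caveat when reading the cross-tabulation: the class labels in a mixture model are arbitrary, so predicted class 1 may correspond to true class 2. A rough way to put a single number on the mismatch, allowing for such label switching, might look like the sketch below. It assumes class and classpr from the code above; match_asis and match_swapped are hypothetical helper variables.

```stata
* Assumes class and classpr exist from the code above.
* match_asis and match_swapped are hypothetical helper variables.

* Agreement if the predicted labels line up with the true labels as they are
gen byte match_asis = (classpr == class)

* Agreement if the two predicted labels are swapped (1 <-> 2)
gen byte match_swapped = (classpr == 3 - class)

quietly summarize match_asis
local acc_asis = r(mean)
quietly summarize match_swapped
local acc_swapped = r(mean)

* Report the better of the two labelings
display "Share correctly classified: " %5.3f max(`acc_asis', `acc_swapped')
```

With two classes the two shares sum to one, so the reported value is at least 0.5; a value close to 0.5 means the classification is no better than chance under either labeling.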
Any help would be much appreciated.
Sylvain
PS: for the sake of completeness, I attach the do-file that runs the entire analysis above and draws the two figures: Test_FMM_simul.do