  • Finite mixture modelling class posterior probability


    Hi Statalist,


    I am working to identify classes. Having identified the classes, how do I create a new variable that captures the three classes together, given that each class has its own percentage?

    Please see the commands I have used so far.

    Code:
     fmm 3: regress restrictions lngarch gdp_pc_growth inflation_defl pcrdbofgdp p_durable  exch_rate_flexibility publ_debt ethnic_tens_recod interaction
    
    estat lcprob

    Thank you.


  • #2
    David, it's not clear what you mean. Do you want each observation to have the predicted probability of being in each class, or do you want to assign each observation to the class they are most likely to belong to (modal class assignment)? It may not be immediately obvious, but the SEM examples 50 to 54, which deal with both latent class analysis and FMM (I believe LCA is a subset of FMM) all have examples about predicting the posterior probabilities.

    Code:
    predict class*, classposteriorpr
    Note that you should use a wildcard, because you get one variable for each latent class in the model (3 in your case).

    Example 50 shows how to do modal class assignment with 2 classes. A more general formulation is below.

    Code:
    predict class*, classposteriorpr
    egen classpmax = rowmax(class*)
    gen classmodal = .
    forvalues i = 1/3 {
    replace classmodal = i if classmax == class`i' & class`i' != .
    }
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



    • #3
      Thanks Weiwen for your response.

      I want to assign each observation to the class they are most likely to belong to. Please see below my dataset and what I have done thus far.


      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(restrictions lngarch gdp_pc_growth inflation_defl pcrdbofgdp) int p_durable byte exch_rate_flexibility float(publ_debt ethnic_tens_recod interaction)
      30.6925 -2.541947  2.27615  8.43351  62.42  22 1 47.3159 3.08333 -7.837661
      30.1364 -2.626751  .536661  4.97253  64.14  23 1 47.0484       4 -10.50701
      30.1364 -2.687057 -2.55128  2.40534  68.34  24 1 60.5368       4 -10.74823
      29.8584 -1.914592 -3.51337  8.84202  67.78  25 1 69.3295     3.5 -6.701071
      29.8584 -1.989074 -3.69984  9.06096  65.23  26 1 83.4305       3 -5.967223
      29.5804 -1.971082  1.65836  16.0114  59.14   0 1 78.4124       3 -5.913247
      29.5804 -1.917711 -1.74735  30.2596  51.14   1 1 69.0354       3 -5.753132
      34.8036 -1.926768 -3.60245  53.7886  39.97   2 1 71.6503     3.5  -6.74369
      35.0307 -2.113023 -.571638  21.9261   24.3   0 3 70.1078       4 -8.452092
      35.5358 -2.180589 -4.26403  13.6244   6.53   1 3  97.513  3.6666 -7.995347
      35.7629 -2.445567 -2.95139  29.0776   5.99   2 3 97.9692       3   -7.3367
        35.99 -2.533547  1.80754   28.577   5.14   0 3 94.3167       3  -7.60064
        35.99 -2.489469  2.25225  24.0219    4.8   1 3 79.1888       3 -7.468407
        35.99 -2.592309 -.566677  7.00196   4.41   2 3 75.8162       3 -7.776928
      35.7119 -2.737345  3.48032 -3.13109   4.15   3 3  79.778 3.83333 -10.49315
      35.4848 -2.443757  1.69584  10.8564   4.62   4 3 72.1982       5 -12.21879
      35.4848 -1.881069  .772159  24.5981   5.01   5 3 61.2543       5 -9.405347
      36.0409 -2.152761  3.21092   .71121    6.8   6 3 59.8002    4.85 -10.44089
      40.7356  -1.63683  4.22937  1.90633   9.65   7 3 59.4001       5 -8.184149
      42.4713 -2.051235  5.80179   8.3238  10.83   8 3 50.6477       5 -10.25618
      45.7743   -2.0855   2.8727  10.6293  10.34   0 3 39.3408       5  -10.4275
       48.342 -2.395907  4.34223  16.4593  10.39   1 3 28.5524 3.58333 -8.585326
      51.5067   -2.3164  .089907  11.2828  11.31   2 3  24.303     3.5   -8.1074
      46.8922 -2.160582  1.66229  7.33106  12.05   3 3 13.6137     3.5 -7.562037
      45.1373 -2.422822  .206035  14.6022  12.17   4 . 8.05828     3.5 -8.479878
      48.1785 -2.411909 -.139057 -11.2666  15.47   5 . 10.4196     3.5 -8.441682
      40.2261 -2.140934  1.70057  15.7647   14.6   6 . 10.9348     3.5 -7.493269
      32.3336 -2.130289  .697751  17.5137   14.2   7 . 9.53344     3.5  -7.45601
       32.632 -2.956185   .66881  606.736  12.51   1 3 40.4905       1 -2.956185
       32.632 -3.270459 -8.96907  625.802  12.68   2 1 60.4651       1 -3.270459
        32.91 -3.064054  6.27826  74.4615  12.59   3 3 55.7611       1 -3.064054
       33.188 -3.158746  1.40166  127.086  12.95   4 3 72.5569       1 -3.158746
      33.4661 -3.390851 -3.96593  388.491  12.04   5 3 59.7437       1 -3.390851
      33.7441 -2.989452 -8.81101  3057.63   13.5   6 4       .       1 -2.989452
      34.0221 -2.819479 -3.75855  2076.79  12.87   7 4 55.8164       1 -2.819479
      41.2658 -3.145904  11.1357  132.953   10.5   8 1 45.6228       1 -3.145904
      48.7875 -3.187111  10.4527  11.9208  12.91   9 1 38.4511       1 -3.187111
      54.4415  -3.39448  4.53126 -1.46666   16.7  10 1 35.9441       1  -3.39448
      59.5671 -3.498854  4.49047  2.84934  18.49  11 1 28.8917       1 -3.498854
      64.3912 -3.124066 -4.05521  3.16512   20.2  12 1 33.6909       1 -3.124066
      62.9003 -3.378781  4.23723 -.052375  19.52  13 1 35.5732       1 -3.378781
      61.6962 -3.053722   6.8185 -.463913  20.38  14 1 35.0798       1 -3.053722
      59.5328  -3.08765  2.64639 -1.70528  22.95  15 1 38.1807       1  -3.08765
      56.6284 -3.317261 -4.45763 -1.83656  25.17  16 1 43.4916       1 -3.317261
      53.9843 -3.197383 -1.83347  1.03729  24.45  17 1 45.6208       1 -3.197383
      48.3829 -3.412157 -5.35849 -1.09577  23.14  18 1 53.6213       1 -3.412157
      43.1524 -3.320659 -11.7332  30.5552  18.34  19 3 164.992       1 -3.320659
      39.2448 -3.432564  7.85426  10.4957  11.86  20 3 139.447       1 -3.432564
      40.9762 -3.379456  8.06655  9.22063   9.77  21 3 127.033       1 -3.379456
      34.8276 -3.332549  8.22107  8.84048  10.25  22 3 87.1197       1 -3.332549
      36.4042  -3.11781  7.51722  13.4262  11.25  23 3 76.4438       1  -3.11781
      37.3988 -3.176176        .        .  12.43  24 3 67.0905       1 -3.176176
       37.349 -2.750812        .        .  12.56  25 . 58.5115       1 -2.750812
      31.3042 -3.072549        .        .  12.92  26 . 58.7011       1 -3.072549
      30.3958 -3.290264        .        .   12.7  27 . 49.1778       1 -3.290264
      28.8207 -3.069982        .        .  13.96  28 . 44.9389       1 -3.069982
      60.8489 -3.058395  3.56686  7.83786  26.36  83 4 20.3591  1.4166 -4.332522
      62.2624 -3.082009  3.63823  4.84354  30.54  84 4 21.9957       2 -6.164019
      63.8011 -3.031173  2.88661  5.64588  35.75  85 4 23.4131       2 -6.062346
      64.7095 -2.384251  1.08244  6.93272  37.62  86 4 22.0637       2 -4.768502
      65.3398 -2.002103  3.93409  7.59652  39.79  87 4 24.8187       2 -4.004206
      66.7024 -1.959066  2.17567  9.13819   47.7  88 4   22.77       2 -3.918133
      67.1057 -2.008104  2.06417    6.132  56.18  89 4 21.5777       2 -4.016207
       68.672 -2.388503 -1.61296  3.04843  59.33  90 4 22.9141       2 -4.777005
      70.6182 -2.447114 -.782985  1.44476  59.15  91 4 27.1514  1.6666  -4.07836
      72.2354 -2.720892  3.09724  .844289  59.96  92 4 30.3218       1 -2.720892
      74.1816 -2.948734  2.95292  1.08352  61.58  93 4 31.4529 1.08333 -3.194452
      75.5717 -2.980844  2.74101  2.11451  64.95  94 4 30.8894       2 -5.961688
      76.1631 -3.091033  2.63623  2.60648  66.87  95 4 29.1139       2 -6.182065
      78.0428 -2.797406  2.74469  1.20201  69.52  96 4 25.7872  2.1666  -6.06086
      78.6851 -2.799787  3.44041  1.23061  73.22  97 4 23.7045       4 -11.19915
      79.7816 -2.886698  3.76209  .481882  77.41  98 4 22.5248 3.33333 -9.622318
      80.6231 -2.321727  2.61525  2.57333  80.78  99 4 19.4585       3 -6.965181
      80.9623 -2.662379  .534732  4.78538  83.04 100 4 17.0408       3 -7.987138
      77.3889 -2.299779  2.65249  2.77602  84.73 101 4 14.9539  3.2916 -7.569952
      78.5067  -2.41056  1.88585  2.85082  90.24 102 4  13.151     3.5  -8.43696
      79.6751 -2.736928    2.947  3.05465  94.96 103 4   11.89     3.5 -9.579248
      79.5407 -2.825804  1.83397  3.82608  98.67 104 4 10.8282     3.5 -9.890313
      80.9069 -2.598366  1.53378  4.84475 104.68 105 4  10.008  3.0416 -7.903192
      81.3876  -2.69329  2.21753  4.90441 115.55 106 4 9.70977       3 -8.079869
      74.3361 -2.878945  1.98076  4.55366  127.9 107 . 11.7852       3 -8.636836
      79.7219 -2.991521 -.192623   4.9888 134.69 108 .  16.855       3 -8.974564
      81.2712 -3.072138  .762052  .924474 130.06 109 . 20.5208       3 -9.216415
      84.2411 -3.077029  1.24662  6.05309 129.24 110 . 24.2129       3 -9.231087
      66.4099 -3.295271  .057574  4.94977  73.28  38 1 42.3497       1 -3.295271
       67.244 -3.314286  2.45057    2.963  75.61  39 1 45.3562       1 -3.314286
      70.9284 -3.472801   2.2364    2.951  77.33  40 1 48.2795  1.6666  -5.78777
      72.1658  -2.59024  1.29305  2.43695  77.08  41 1 54.5726 1.83333 -4.748765
      74.2606 -2.577957  3.14975  1.52685  79.63  42 1 58.8844       1 -2.577957
      76.1283 -2.876564   3.4201  2.96476  82.74  43 1  58.928       1 -2.876564
      77.5418 -2.846244  3.55355  2.99968  85.69  44 1 56.2016       1 -2.846244
      79.6366 -3.152503  2.41399  3.64187  87.51  45 1 56.4039       1 -3.152503
      81.3281 -3.225598  .976092  3.47909  88.74  46 1 56.3707       1 -3.225598
      83.2977 -3.428764 -.298754  2.75708  90.46  47 2 60.8955       1 -3.428764
      85.7723 -3.480923  2.00876  2.52494  89.21  48 2 64.1885    1.25 -4.351153
      87.4639 -3.583043  2.51091  1.81392  90.75  49 2 68.3944       2 -7.166086
      90.9026 -3.393259  2.32839  .811976  93.46  50 2 68.3292       2 -6.786518
      93.0387 -3.425838  2.19304 -.220902  98.61  51 2 64.3538       2 -6.851677
      94.4376 -3.368392  3.67172  .320326      .  52 2  64.776       2 -6.736783
      94.9898 -3.311814  3.33787  .283421      .  53 1  67.207  2.9166 -9.659236
      end
      
      fmm 3: regress restrictions lngarch gdp_pc_growth inflation_defl pcrdbofgdp p_durable  exch_rate_flexibility publ_debt ethnic_tens_recod interaction
      
      estat lcprob

      I want to present the classification of each observation into the classes including the estimated posterior probability membership.

      Thank you very much for your time.













      Last edited by David Osey; 17 Jun 2018, 11:20.



      • #4
        Originally posted by David Osey
        Thanks Weiwen for your response.
        ...
        I want to assign each observation to the class they are most likely to belong to. Please see below my dataset and what I have done thus far.
        I already gave you code for that. It's the second block of code in my initial reply. It has some typos, so if you tried this and it failed, this code is correct:

        Code:
        predict class*, classposteriorpr
        egen classpmax = rowmax(class*)
        gen classmodal = .
        forvalues i = 1/3 {
        replace classmodal = `i' if classpmax == class`i' & class`i' != .
        }
        Explaining it line by line:

        1. Predict posterior probabilities for each observation. Stata takes the stubname class* and will generate variables called class1, class2, class3, etc.

        2. Generate a variable containing the maximum value of class* (i.e. all variables that start with class; here, I assume you only have class1, class2, class3, so this contains the probability of the most likely class)

        3. Generate a blank variable called classmodal.

        4. Loop through all 3 classes. Replace the variable classmodal with the value of the class if, for example (say i = 1), class1 == classpmax and class1 is not missing.
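
        After running the four steps above, a quick sanity check may be useful. This is only a sketch: it assumes the variables class1, class2, class3, classpmax and classmodal exist as created above, and psum_check is a made-up variable name.

        ```stata
        * Every observation with posterior probabilities should have a modal class,
        * so this count is expected to be 0
        count if missing(classmodal) & !missing(classpmax)

        * Posterior probabilities should sum to (approximately) 1 per observation;
        * the missing option keeps the total missing when all three are missing
        egen double psum_check = rowtotal(class1 class2 class3), missing
        summarize psum_check
        ```

        If two classes tie exactly for the maximum, the loop assigns the higher-numbered class, because later iterations overwrite earlier ones.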

        You said you want to

        present the classification of each observation into the classes including the estimated posterior probability membership.
        I'm not quite sure what you mean by this. I assumed you just wanted to calculate the modal class membership for each observation (i.e. what class they are most likely to belong to). -estat lcprob- presents the proportions of each class in the sample. I suspect that will be enough for most journals, without you having to calculate posterior probabilities of class membership.

        I don't mean to confuse the issue, but if you want to use the modal class membership to do further analysis, then this is not quite a correct approach, in that you don't know for sure which class each observation belongs to. If you were to do modal class assignment, then tabulate the modal class, you'd see that the probabilities differ a bit from -estat lcprob-. For example:

        Code:
        estat lcprob
        --------------------------------------------------------------
                     |            Delta-method
                     |     Margin   Std. Err.     [95% Conf. Interval]
        -------------+------------------------------------------------
               Class |
                  1  |   .2840582   .0508619      .1955096    .3931144
                  2  |   .3294013    .053902        .23341    .4421027
                  3  |   .3865405   .0549385      .2857798    .4980536
        --------------------------------------------------------------
        
        tab classmodal
         classmodal |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  1 |         24       28.57       28.57
                  2 |         26       30.95       59.52
                  3 |         34       40.48      100.00
        ------------+-----------------------------------
              Total |         84      100.00
        However, in many applications, it may suffice. If the latent class entropy, a statistic that varies from 0 to 1, is above 0.8, the latent classes are pretty clearly separated. If you want to calculate entropy, see this post (that one has latent class analysis in the title, but the same principle and commands would apply to your case).
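
        For reference, a rough sketch of one common normalized entropy calculation, 1 - [sum over observations and classes of -p*ln(p)] / [N*ln(K)]. It assumes K = 3 and that class1 to class3 hold the posterior probabilities from -predict-. The variable name ent_i is made up for illustration, and this is not necessarily the exact code in the linked post.

        ```stata
        * Per-observation entropy contribution: -sum over k of p_k * ln(p_k)
        gen double ent_i = 0 if !missing(class1)
        forvalues k = 1/3 {
            * p*ln(p) tends to 0 as p -> 0, so exact zeros are skipped to avoid ln(0)
            replace ent_i = ent_i - class`k' * ln(class`k') if class`k' > 0 & !missing(class`k')
        }

        * Normalize by N*ln(K) and subtract from 1, so that values near 1
        * indicate well-separated classes
        quietly summarize ent_i
        display "Normalized entropy = " 1 - r(sum) / (r(N) * ln(3))
        ```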

        If this is not quite what you want to do, can you explain further? If you can't easily describe what you want in English, try giving an example of a table or graph in a paper and we shall see what can be done (best to link to the article in question, and it's also helpful to include a complete citation).
        Last edited by Weiwen Ng; 17 Jun 2018, 13:07.


        • #5
          Thank you again for your response.

          I have run the commands you gave to me.


          Code:
           
          
          . 
          . predict class*, classposteriorpr
          (1119 missing values generated)
          
          . 
          . egen classpmax = rowmax(class*)
          (1119 missing values generated)
          
          . 
          . gen classmodal = .
          (2,716 missing values generated)
          
          . 
          . forvalues i = 1/3 {
            2. 
          . replace classmodal = `i' if classpmax == class`i' & class`i' != .
            3. 
          . }
          (769 real changes made)
          (448 real changes made)
          (380 real changes made)
          
          
          estat lcprob
          
          Latent class marginal probabilities             Number of obs     =      1,597
          
          --------------------------------------------------------------
                       |            Delta-method
                       |     Margin   Std. Err.     [95% Conf. Interval]
          -------------+------------------------------------------------
                 Class |
                    1  |   .5113294   .0282263      .4560938    .5662896
                    2  |   .2785512   .0278676      .2273295     .336291
                    3  |   .2101194   .0172583      .1782847    .2459373
          --------------------------------------------------------------
          
          . di  .5113294*1597
          816.59305
          
          . di  .2785512 *1597
          444.84627
          
          . di   .2101194  *1597
          335.56068
          
          . tab classmodal
          
           classmodal |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                    1 |        769       48.15       48.15
                    2 |        448       28.05       76.21
                    3 |        380       23.79      100.00
          ------------+-----------------------------------
                Total |      1,597      100.00


          I have now run your commands. The problem is that when I tabulate classmodal, the number of observations in each class is either over or short of what the marginal probabilities from the -estat lcprob- command imply. For instance, classmodal puts 769 observations in class 1, which is short by 48 observations, and 380 observations in class 3, which is 45 observations over. Is there any way to correct these discrepancies so that each class has the same number of observations regardless of the method used?


          Secondly, below is the paper I am trying to follow, which might give you a clearer understanding of what I am after in the second stage.

          Please see Table 6 on page 765. Since I am also working with countries, I intend to assign to each country its probability of belonging to one class rather than another.

          Konte, M. (2017). "Do remittances not promote growth? A finite mixture-of-regressions approach." Empirical Economics, 1-36.


          Thank you.






          • #6
            Originally posted by David Osey

            ...

            Code:
            ...
            estat lcprob
            
            Latent class marginal probabilities             Number of obs     =      1,597

            --------------------------------------------------------------
                         |            Delta-method
                         |     Margin   Std. Err.     [95% Conf. Interval]
            -------------+------------------------------------------------
                   Class |
                      1  |   .5113294   .0282263      .4560938    .5662896
                      2  |   .2785512   .0278676      .2273295     .336291
                      3  |   .2101194   .0172583      .1782847    .2459373
            --------------------------------------------------------------

            ...

            . tab classmodal

             classmodal |      Freq.     Percent        Cum.
            ------------+-----------------------------------
                      1 |        769       48.15       48.15
                      2 |        448       28.05       76.21
                      3 |        380       23.79      100.00
            ------------+-----------------------------------
                  Total |      1,597      100.00


            I have now run your commands. The problem is that when I tabulate classmodal, the number of observations in each class is either over or short of what the marginal probabilities from the -estat lcprob- command imply. For instance, classmodal puts 769 observations in class 1, which is short by 48 observations, and 380 observations in class 3, which is 45 observations over. Is there any way to correct these discrepancies so that each class has the same number of observations regardless of the method used?

            ...
            This is probably one of the harder things about latent class/finite mixture modeling to grasp. Let's back up a bit.

            Imagine that we have data on human heights and weight, but for some reason, we failed to record gender. We suspect that human height is a function of weight, and that human height can be described by a mixture of 2 OLS regressions with weight as the independent variable. We suspect that the latent classes will come down along the lines of gender.

            Imagine you know that someone is 5 feet 5 inches, or 165 cm. Chances are very good that person is female. If you were to run an FMM on that data, and you predicted the posterior probabilities, someone like that might have (depending on their weight) a posterior probability of maybe 60 to 90% for being in the female latent class. But you don't know that for certain. In fact, I'm male and I'm 5'5". The same holds true if someone were 6 feet 6 inches / 198cm. There is a very high chance that they're male, but it's not guaranteed.

            Conversely, say someone is exactly at the mean height for humans in general. The model will not be very sure what their latent class is. Or, in the paper you cited, Venezuela is in class 1, but the model is not very certain about that, because the probability is only 0.54. El Salvador has a 0.56 probability of being in class 2.

            Hence, if you do modal class assignment, some people will be wrongly assigned. If you then tabulate the modal class variable, you will inevitably get results that diverge a bit from -estat lcprob-, which is the model-based proportion for each class. I tried to convey this in my last post, but it is one of the trickier concepts to grasp.

            If you were to type something like,

            Code:
            mean class1
            I bet that you would get the same probability of being in class 1 as given by -estat lcprob-, because instead of treating everyone as definitely class 1 or not (which, as I said earlier, is wrong), you are treating everyone as being probabilistically a member of that class. I definitely got the correct probabilities in your example data when I ran that command.
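
            A minimal sketch of that comparison for all three classes. It assumes class1 to class3 and classmodal were created as in the code above, and that the -fmm- results are still the active estimates, so that e(sample) marks the estimation sample.

            ```stata
            * Mean posterior probability per class; these means should be close to
            * the margins reported by -estat lcprob-
            summarize class1 class2 class3 if e(sample)

            * Share of observations per modal class; these shares will usually differ
            * somewhat, because each observation is counted wholly in one class
            tabulate classmodal if e(sample)
            ```

            The first command weights each observation by its probability of membership; the second counts each observation once, in its most likely class.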

            For that paper's table 6, they are listing countries by modal class, and reporting the probability of being in the modal class. Something like this could work:

            Code:
            list country class1 if classmodal == 1
            list country class2 if classmodal == 2
            Then you can copy and paste. There's probably some better way to do it, but I can't think of one this second.


            • #7
              Thank you very much, Weiwen. I will stick with the results I am getting from the classmodal. I do not think slight divergence would really hurt the work since it is not treating everyone as definitely being in a particular class but rather the probability of being in that specific class.

              For the moment I will list the countries by modal class as you have described. As you said, "there's probably some better way to do it, but I can't think of one this second"; please do let me know if you find it.

              Thank you again for all your time.
              Last edited by David Osey; 18 Jun 2018, 04:26.



              • #8
                Originally posted by David Osey
                ... I will stick with the results I am getting from the classmodal. I do not think slight divergence would really hurt the work since it is not treating everyone as definitely being in a particular class but rather the probability of being in that specific class.

                ...
                Just to clarify, it is actually the reverse. When you use modal class assignment, you assume that a country is definitely in the class you assigned it to.

                If you get a similar degree of class separation as the paper you cited, this is arguably OK. But even there, there are some countries where the modal class probability is quite low (e.g. El Salvador, 56% probability of being in class 2).

                But, I assume you will present results from -estat lcprob- elsewhere. Those are the model-based class probabilities.

                A better way to reproduce table 6 would be this:

                Code:
                preserve
                keep country class1 class2 class3 classmodal
                sort classmodal country
                export excel ...
                restore
                You can use the export excel dropdown menu (in the file menu), or you can search for help with the -export excel- command. -preserve- will preserve your data in its current form. No matter what you do to your dataset in the interim, it will be restored when you type -restore-.
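
                As an illustration only, here is one way the elided -export excel- line might be filled in. The file name table6_classes.xlsx and the variable pmodal are hypothetical, and country is assumed to be the name of your country identifier.

                ```stata
                preserve
                * Keep only observations that were assigned a modal class
                keep if !missing(classmodal)
                * Probability of the modal class, the figure reported next to each country
                gen double pmodal = max(class1, class2, class3)
                keep country classmodal pmodal class1 class2 class3
                sort classmodal country
                * firstrow(variables) writes the variable names as the header row
                export excel using "table6_classes.xlsx", firstrow(variables) replace
                restore
                ```

                If the data hold several rows per country (for example, one per year), this writes one row per observation, so you may need to reduce the data to one row per country first.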


                • #9
                  Thank you again for the further clarification. I will come back to the forum should I have any issues.

                  Thanks Weiwen.
