  • Combining categories

    I would like to combine two categories for one of my variables but I am not sure whether there is a test I could run to justify combining the categories. Here is an example of what I am trying to do:

    Code:
    . ta worry
    
     How worried are |
     you about being |
       infected with |
           COVID-19? |      Freq.     Percent        Cum.
    -----------------+-----------------------------------
         Not at all  |        641       37.31       37.31
           A little  |        387       22.53       59.84
             Rather  |        203       11.82       71.65
               Very  |        487       28.35      100.00
    -----------------+-----------------------------------
               Total |      1,718      100.00
    
    . 
    . xtreg WB i.worry [pw= panel_ind_wt_1_2], fe
    
    Fixed-effects (within) regression               Number of obs     =      1,718
    Group variable: Findid                          Number of groups  =        859
    
    R-sq:                                           Obs per group:
         within  = 0.0191                                         min =          2
         between = 0.0388                                         avg =        2.0
         overall = 0.0214                                         max =          2
    
                                                    F(3,858)          =       2.25
    corr(u_i, Xb)  = 0.0312                         Prob > F          =     0.0811
    
                                   (Std. Err. adjusted for 859 clusters in Findid)
    ------------------------------------------------------------------------------
                 |               Robust
              WB |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           worry |
      A little   |   .5341085   .3449488     1.55   0.122    -.1429337    1.211151
        Rather   |   .0941577   .3733732     0.25   0.801    -.6386741    .8269896
          Very   |    .688718   .2973494     2.32   0.021     .1051006    1.272335
                 |
           _cons |  -.4072548   .1753646    -2.32   0.020    -.7514485    -.063061
    -------------+----------------------------------------------------------------
         sigma_u |  1.5148842
         sigma_e |  1.8692139
             rho |  .39643111   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    
    . 
    . recode worry 3=2 4=3
    (worry: 690 changes made)
    
    . 
    . ta worry
    
     How worried are |
     you about being |
       infected with |
           COVID-19? |      Freq.     Percent        Cum.
    -----------------+-----------------------------------
         Not at all  |        641       37.31       37.31
           A little  |        590       34.34       71.65
             Rather  |        487       28.35      100.00
    -----------------+-----------------------------------
               Total |      1,718      100.00
    
    . 
    . xtreg WB i.worry [pw= panel_ind_wt_1_2], fe
    
    Fixed-effects (within) regression               Number of obs     =      1,718
    Group variable: Findid                          Number of groups  =        859
    
    R-sq:                                           Obs per group:
         within  = 0.0146                                         min =          2
         between = 0.0451                                         avg =        2.0
         overall = 0.0238                                         max =          2
    
                                                    F(2,858)          =       2.69
    corr(u_i, Xb)  = 0.0556                         Prob > F          =     0.0684
    
                                   (Std. Err. adjusted for 859 clusters in Findid)
    ------------------------------------------------------------------------------
                 |               Robust
              WB |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           worry |
      A little   |   .3706983    .310278     1.19   0.233    -.2382945    .9796911
        Rather   |   .6793746   .2977687     2.28   0.023     .0949343    1.263815
                 |
           _cons |  -.4024158   .1756929    -2.29   0.022    -.7472539   -.0575777
    -------------+----------------------------------------------------------------
         sigma_u |  1.5122795
         sigma_e |  1.8724047
             rho |  .39479255   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    
    .

    Because categories 2 and 3 ("A little" and "Rather") are not statistically significant, I would like to combine them; the results stay essentially the same after combining. Put differently: is there a test of whether categories 2 and 3 differ? I am not sure I am phrasing my question well.
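
    A minimal sketch of how the question "do categories 2 and 3 differ?" could be tested, assuming -worry- is coded 1-4 as the -recode- above implies. It would need to be run on the original four-category variable, before the -recode-. A non-significant result only means the data cannot distinguish the two coefficients; it does not by itself justify merging the categories.

    Code:
    xtreg WB i.worry [pw= panel_ind_wt_1_2], fe
    * Wald test of equal coefficients for "A little" (2) and "Rather" (3)
    test 2.worry = 3.worry
    * the same comparison as a point estimate with a confidence interval
    lincom 3.worry - 2.worry

    If the categories are combined anyway, generating a new variable with its own value labels avoids the mislabelling visible in the second table above, where the 487 "Very" responses appear under "Rather". The name worry3 and the label text are illustrative assumptions:

    Code:
    * keep the original variable and attach labels that match the new coding
    recode worry (1=1 "Not at all") (2 3=2 "A little or rather") (4=3 "Very"), generate(worry3)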


  • #2
    You want to combine them because they're not statistically significant?



    • #3
      Daria:
      welcome to this forum.
      Jared made a very good point: grouping categories only because they do not reach statistical significance when considered as separate predictors has no methodological justification.
      In addition, I suspect that your data do not provide evidence of a panel-wise effect:
      1) your -corr(u_i, Xb) = 0.0312- is dramatically low (as is the within R-sq);
      2) sigma_e>sigma_u.
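
      For reference, the rho that -xtreg- reports is the share of the total error variance attributed to u_i, sigma_u^2/(sigma_u^2 + sigma_e^2), so sigma_e > sigma_u simply corresponds to rho < 0.5. A quick arithmetic check, using the values copied from the first -xtreg, fe- output in the opening post, should reproduce the reported rho of about .396:

      Code:
      * rho = sigma_u^2 / (sigma_u^2 + sigma_e^2)
      display 1.5148842^2 / (1.5148842^2 + 1.8692139^2)
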
      Kind regards,
      Carlo
      (StataNow 18.5)



      • #4
        Thank you, Carlo and Jared. I apologise, I was not clear: I want to combine them not because they are statistically insignificant but mainly on conceptual grounds, as I don't see much difference between "rather worried" and "a little worried". But I am guessing that does not justify it either, and that it is better to keep the categories as they are.


        Carlo, the regression here excludes most of the other variables I include. However, my corr (u_i, Xb) is still small (-0.0112) and the within R-sq is 0.20 in the full model. I am new to Stata so does that mean I cannot run the fixed effects model on my data, or that the results aren’t very meaningful?



        • #5
          Daria:
          1) I would keep the two levels "rather worried" and "a little worried" of the -worry- categorical variable separate, as each has a non-negligible number of observations;
          2) -corr (u_i, Xb) is still small (-0.0112)- supports the absence of evidence of a panel-wise effect; a within R-sq of 0.20 is not that encouraging, either.
          Are you sure that your model is not misspecified (put differently: is the functional form of the regressand correct)? Are you sure that all the necessary predictors and interactions were included in the right-hand side of your regression equation to give a fair and true view of the data generating process you're investigating?
          I recommend that you share what you typed and what Stata gave you back when you dealt with your full model. Thanks.
          Last edited by Carlo Lazzaro; 31 Jan 2022, 06:31.
          Kind regards,
          Carlo
          (StataNow 18.5)



          • #6
            Thanks Carlo.

            Here are (1) the full model and (2) the same model run on two subsamples (urban/rural areas). If I understand correctly, I can only include time-varying predictors.

            Code:
            xtreg WB i.worry i.worry2 i.security i.employment i.income_change [pw= panel_ind_wt_1_2], fe
            
            Fixed-effects (within) regression               Number of obs     =      1,720
            Group variable: Findid                          Number of groups  =        863
            
            R-sq:                                           Obs per group:
                 within  = 0.1189                                         min =          1
                 between = 0.1263                                         avg =        2.0
                 overall = 0.0983                                         max =          2
            
                                                            F(12,862)         =       3.99
            corr(u_i, Xb)  = -0.0817                        Prob > F          =     0.0000
            
                                                  (Std. Err. adjusted for 863 clusters in Findid)
            -------------------------------------------------------------------------------------
                                |               Robust
                             WB |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            --------------------+----------------------------------------------------------------
                          worry |
                     A little   |   .3973488   .3345473     1.19   0.235    -.2592738    1.053971
                       Rather   |  -.0733212   .3574259    -0.21   0.838    -.7748481    .6282058
                         Very   |    .349525   .2923644     1.20   0.232    -.2243045    .9233545
                                |
                         worry2 |
                      A little  |  -.1780035    .364739    -0.49   0.626    -.8938839    .5378768
                        Rather  |   .0579736   .3533665     0.16   0.870     -.635586    .7515331
                          Very  |   .5641763   .2816948     2.00   0.046     .0112883    1.117064
                                |
                       security |
                      Moderate  |    .901208   .2586017     3.48   0.001     .3936454    1.408771
                           Low  |   .9134565   .3042455     3.00   0.003     .3163078    1.510605
                                |
                     employment |
                    Unemployed  |   .3395662   .3472676     0.98   0.328    -.3420228    1.021155
            Out of labor force  |   .1233692   .4261581     0.29   0.772    -.7130597    .9597981
                                |
                  income_change |
                          Same  |  -.7486318   .2950736    -2.54   0.011    -1.327779    -.169485
                    Increased   |  -.7543419   .4195771    -1.80   0.073    -1.577854    .0691704
                                |
                          _cons |  -.9616271   .3645323    -2.64   0.008    -1.677102   -.2461523
            --------------------+----------------------------------------------------------------
                        sigma_u |  1.4555533
                        sigma_e |  1.7793041
                            rho |  .40091058   (fraction of variance due to u_i)
            -------------------------------------------------------------------------------------
            
            . xtreg WB i.worry i.worry2 i.security i.employment i.income_change [pw= panel_ind_wt_1_2] if urban==1, fe
            
            Fixed-effects (within) regression               Number of obs     =      1,058
            Group variable: Findid                          Number of groups  =        531
            
            R-sq:                                           Obs per group:
                 within  = 0.1335                                         min =          1
                 between = 0.1333                                         avg =        2.0
                 overall = 0.1069                                         max =          2
            
                                                            F(12,530)         =       2.97
            corr(u_i, Xb)  = -0.1364                        Prob > F          =     0.0005
            
                                                  (Std. Err. adjusted for 531 clusters in Findid)
            -------------------------------------------------------------------------------------
                                |               Robust
                             WB |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            --------------------+----------------------------------------------------------------
                          worry |
                     A little   |   -.071595   .4542707    -0.16   0.875    -.9639871     .820797
                       Rather   |  -.0551513   .4471418    -0.12   0.902     -.933539    .8232363
                         Very   |  -.0349684   .3849639    -0.09   0.928    -.7912107    .7212739
                                |
                         worry2 |
                      A little  |   .0180118   .4520092     0.04   0.968    -.8699377    .9059612
                        Rather  |   .1315093   .4307555     0.31   0.760    -.7146883    .9777069
                          Very  |     .73125   .3425631     2.13   0.033     .0583019    1.404198
                                |
                       security |
                      Moderate  |   .9168551   .3300422     2.78   0.006     .2685037    1.565207
                           Low  |   .8619262   .4042321     2.13   0.033     .0678324     1.65602
                                |
                     employment |
                    Unemployed  |   .6122631   .5107989     1.20   0.231    -.3911758    1.615702
            Out of labor force  |   .4533399   .5879297     0.77   0.441    -.7016185    1.608298
                                |
                  income_change |
                          Same  |  -1.042022   .3677464    -2.83   0.005    -1.764442    -.319603
                    Increased   |  -.9039354   .5179174    -1.75   0.082    -1.921358    .1134875
                                |
                          _cons |  -.8637337   .4470105    -1.93   0.054    -1.741863    .0143961
            --------------------+----------------------------------------------------------------
                        sigma_u |  1.4705306
                        sigma_e |  1.8264579
                            rho |  .39328839   (fraction of variance due to u_i)
            -------------------------------------------------------------------------------------
            
            . xtreg WB i.worry i.worry2 i.security i.employment i.income_change [pw= panel_ind_wt_1_2] if urban==2, fe
            
            Fixed-effects (within) regression               Number of obs     =        662
            Group variable: Findid                          Number of groups  =        332
            
            R-sq:                                           Obs per group:
                 within  = 0.2171                                         min =          1
                 between = 0.0339                                         avg =        2.0
                 overall = 0.0549                                         max =          2
            
                                                            F(12,331)         =       4.00
            corr(u_i, Xb)  = -0.2380                        Prob > F          =     0.0000
            
                                                  (Std. Err. adjusted for 332 clusters in Findid)
            -------------------------------------------------------------------------------------
                                |               Robust
                             WB |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            --------------------+----------------------------------------------------------------
                          worry |
                     A little   |   1.412041   .3319679     4.25   0.000     .7590083    2.065074
                       Rather   |  -.0654846   .4504375    -0.15   0.884    -.9515657    .8205966
                         Very   |   1.200007   .3787382     3.17   0.002       .45497    1.945045
                                |
                         worry2 |
                      A little  |  -.2452824   .4828167    -0.51   0.612    -1.195058    .7044937
                        Rather  |   .0413419   .5635478     0.07   0.942    -1.067245    1.149929
                          Very  |   .3929816   .4538273     0.87   0.387    -.4997678    1.285731
                                |
                       security |
                      Moderate  |   .8459582   .3064525     2.76   0.006     .2431181    1.448798
                           Low  |   1.225383    .363332     3.37   0.001     .5106518    1.940114
                                |
                     employment |
                    Unemployed  |   .0732419    .410372     0.18   0.858    -.7340241    .8805079
            Out of labor force  |  -.4721134   .3913379    -1.21   0.229    -1.241936    .2977097
                                |
                  income_change |
                          Same  |   .0561762   .2956212     0.19   0.849    -.5253571    .6377096
                    Increased   |  -.3017318   .5034221    -0.60   0.549    -1.292042    .6885784
                                |
                          _cons |  -1.613676   .5644371    -2.86   0.005    -2.724012   -.5033397
            --------------------+----------------------------------------------------------------
                        sigma_u |  1.5786666
                        sigma_e |  1.5804861
                            rho |  .49942404   (fraction of variance due to u_i)
            -------------------------------------------------------------------------------------
            
            .
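
            A small sketch of why only time-varying predictors can enter the FE model: the within transformation removes anything that is constant within a person. -xtsum- shows this directly, assuming the data are already -xtset- and contain a time-invariant variable such as sex (it appears in the random-effects model later in the thread):

            Code:
            * a within standard deviation of 0 marks a variable that does not change across the two waves
            xtsum WB worry security sex

            Variables with no within variation would be dropped from -xtreg, fe- as collinear with the fixed effects.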



            • #7
              Another thing I find strange is that you have at most two observations per group. I suppose this isn't exactly illegal, but we typically use FE estimators to (ostensibly) adjust for unobserved, time-invariant, unit-specific confounding, usually over multiple periods of time.

              Are you sure xtreg is the way to go here? You could likely get away with simply pooling the data and using ordinary OLS.



              • #8
                Yes, I only have two waves of data, so two observations per person. I've tried running the pooled OLS and I think I have the code right, but the R-squared still seems low even with added covariates, and I am no longer able to use pweights with xtreg.

                Code:
                xtset ID wave
                
                xtreg WB i.worry i.worry2 i.security i.employment i.income_change i.wave i.marital i.agecat i.educ i.urban i.sex, vce (cluster ID)
                
                Random-effects GLS regression                   Number of obs     =      1,720
                Group variable: ID                              Number of groups  =        863
                
                R-sq:                                           Obs per group:
                     within  = 0.0637                                         min =          1
                     between = 0.1661                                         avg =        2.0
                     overall = 0.1226                                         max =          2
                
                                                                Wald chi2(22)     =     234.55
                corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
                
                                                          (Std. Err. adjusted for 863 clusters in ID)
                -------------------------------------------------------------------------------------
                                    |               Robust
                                 WB |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                --------------------+----------------------------------------------------------------
                              worry |
                         A little   |   .2709719   .1244742     2.18   0.029      .027007    .5149369
                           Rather   |   .2405991   .1575753     1.53   0.127    -.0682428    .5494409
                             Very   |   .2619095   .1278226     2.05   0.040     .0113818    .5124372
                                    |
                             worry2 |
                          A little  |   .0472099   .1682596     0.28   0.779    -.2825728    .3769926
                            Rather  |    .001488   .1800994     0.01   0.993    -.3515002    .3544763
                              Very  |   .5868517   .1475392     3.98   0.000     .2976801    .8760233
                                    |
                           security |
                          Moderate  |   .6379472   .1193082     5.35   0.000     .4041074     .871787
                               Low  |   .8828169    .130804     6.75   0.000     .6264458    1.139188
                                    |
                         employment |
                        Unemployed  |   .5433968   .1325648     4.10   0.000     .2835746     .803219
                Out of labor force  |   .1512903   .1270076     1.19   0.234    -.0976401    .4002207
                                    |
                      income_change |
                              Same  |  -.2670846   .1140901    -2.34   0.019     -.490697   -.0434721
                        Increased   |  -.3734626   .1925077    -1.94   0.052    -.7507707    .0038456
                                    |
                               wave |
                            Wave 2  |  -.0639719   .0850457    -0.75   0.452    -.2306584    .1027145
                                    |
                            marital |
                 Currently Married  |   .1672897   .1300435     1.29   0.198    -.0875908    .4221703
                  Widowed/divorced  |  -.0470623   .2268863    -0.21   0.836    -.4917512    .3976266
                                    |
                             agecat |
                             30-40  |  -.0737299   .1276281    -0.58   0.563    -.3238764    .1764167
                             41-64  |  -.2624763   .1379174    -1.90   0.057    -.5327896    .0078369
                                    |
                               educ |
                             Basic  |  -.1280346    .138925    -0.92   0.357    -.4003227    .1442535
                         Secondary  |  -.0411468   .1532769    -0.27   0.788     -.341564    .2592704
                  Higher education  |   .1037788   .1624885     0.64   0.523    -.2146927    .4222504
                                    |
                              urban |
                             rural  |  -.1553741   .1079372    -1.44   0.150    -.3669272     .056179
                                    |
                                sex |
                            Female  |   .2121433   .1199667     1.77   0.077    -.0229872    .4472737
                              _cons |  -.9983781   .2179307    -4.58   0.000    -1.425514   -.5712418
                --------------------+----------------------------------------------------------------
                            sigma_u |  .69735626
                            sigma_e |  1.7581332
                                rho |  .13594067   (fraction of variance due to u_i)
                -------------------------------------------------------------------------------------
                
                .



                • #9
                  Daria:
                  your last example is a -xtreg,re- equation, not a pooled OLS.
                  Please, test, just out of curiosity, whether -xttest0- after -xtreg,re- supports the evidence of a panel-wise effect (I do not think so). Thanks.
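
                  A sketch of what a pooled OLS specification could look like, for comparison with the -xtreg, re- output above. -regress- accepts pweights, so the weight variable from the earlier posts (panel_ind_wt_1_2) can be kept; whether that weight is appropriate for a pooled model is an assumption to check against the survey documentation.

                  Code:
                  * pooled OLS with standard errors clustered on the panel identifier
                  regress WB i.worry i.worry2 i.security i.employment i.income_change i.wave i.marital i.agecat i.educ i.urban i.sex [pw= panel_ind_wt_1_2], vce(cluster ID)
                  * the Breusch-Pagan LM test must follow the random-effects model, not -regress-
                  xtreg WB i.worry i.worry2 i.security i.employment i.income_change i.wave i.marital i.agecat i.educ i.urban i.sex, re vce(cluster ID)
                  xttest0
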
                  Kind regards,
                  Carlo
                  (StataNow 18.5)



                  • #10
                    Thank you for your continued help, Carlo. I have run xttest0 and the result is statistically significant (0.0001), which I understand means that I reject the null of no random effect and that my model should not be estimated as a pooled OLS (regress, vce(cluster ID)). Is that right?

                    Code:
                    Breusch and Pagan Lagrangian multiplier test for random effects
                    
                            WB[ID,t] = Xb + u[ID] + e[ID,t]
                    
                            Estimated results:
                                             |       Var     sd = sqrt(Var)
                                    ---------+-----------------------------
                                          WB |   4.026931       2.006721
                                           e |   3.091032       1.758133
                                           u |   .4863057       .6973563
                    
                            Test:   Var(u) = 0
                                                 chibar2(01) =    15.11
                                              Prob > chibar2 =   0.0001

                    So would I be better off sticking with the FE model, even with a low within R-squared?



                    • #11
                      Daria:
                      despite my (wrong) impression, the -xttest0- outcome points you towards -xtreg,re-.
                      That said, I would take the following steps:
                      1) check the functional form of the regressand (just replicate the procedure detailed under the -linktest- entry, Stata .pdf manual);
                      2) type:
                      Code:
                      xi: xtreg WB i.worry i.worry2 i.security i.employment i.income_change i.wave i.marital i.agecat i.educ i.urban i.sex, vce (cluster ID)
                      xtoverid
                      As far as 2) is concerned, please note that:
                      a) the -xi:- prefix is required because the community-contributed module -xtoverid- does not support -fvvarlist- notation;
                      b) if you have not downloaded -xtoverid- yet, just type -search xtoverid- to spot and install it (along with the other community-contributed modules that support -xtoverid-);
                      c) the -xtoverid- null is that -xtreg,re- is the way to go.
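
                      A minimal sketch of steps 1) and 2), assuming the -linktest- procedure is replicated by hand with the fitted values and their square as the only regressors. The variable names xbhat and xbhat_sq are illustrative, and -ivreg2- and -ranktest- are assumed here to be the supporting modules -xtoverid- needs, which is worth confirming in its help file.

                      Code:
                      * one-time installation of the community-contributed commands from SSC
                      ssc install xtoverid
                      ssc install ivreg2
                      ssc install ranktest
                      * 1) specification check in the spirit of -linktest-
                      xtreg WB i.worry i.worry2 i.security i.employment i.income_change i.wave i.marital i.agecat i.educ i.urban i.sex, re vce(cluster ID)
                      predict double xbhat, xb
                      generate double xbhat_sq = xbhat^2
                      xtreg WB xbhat xbhat_sq, re vce(cluster ID)
                      * a significant squared term hints at a misspecified functional form
                      test xbhat_sq
                      * 2) overidentification test of random versus fixed effects
                      xi: xtreg WB i.worry i.worry2 i.security i.employment i.income_change i.wave i.marital i.agecat i.educ i.urban i.sex, re vce(cluster ID)
                      xtoverid
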
                      Kind regards,
                      Carlo
                      (StataNow 18.5)



                      • #12
                        Thank you very much for your help, Carlo. I got a p-value of 0.42, which means I can't reject the null, so I will go with the random-effects model (xtreg, re).



                        • #13
                          Daria:
                          correct, provided that your model is correctly specified (as per my previous point 1).
                          Kind regards,
                          Carlo
                          (StataNow 18.5)
