  • Difference between sigma_u in Random- and Fixed-Effects-Modells. Interpreting sigma_u & sigma_e in Random Effects Modells.

    Hi everyone,

    I am struggling with a rather practical problem. As the heading indicates, I am doing longitudinal analysis in Stata, so I need to fit a fixed-effects modell and a random-effects modell and compare them.

    You can find the results listed below. The dependent variable is the job prestige of Italian and Turkish migrants in Germany, and the independent variables are speaking and writing skills (correlation > .60). In another modell I also add age at immigration as a time-invariant variable.

    Now I have to compare these two modells, which is fine, but there is one point that is overwhelming me: the sigma values in the random-effects modell and the comparison of sigma_u between random and fixed effects.

    I think that rho in the context of the random-effects modell indicates the estimated proportion of the between variance in the total variance. It is calculated like this: rho = sigma_u^2 / (sigma_u^2 + sigma_e^2)

    So sigma_u^2 in the random-effects modell has to be the between variance. But what is sigma_u in the fixed-effects modell? And what is sigma_e in the random-effects modell? Or how do I interpret them?

    [Image attachment: panel_modells.png]

    Another, smaller problem is that the F test of u_i = 0, the corr(u_i, Xb) and the Hausman test indicate that I have to use the fixed-effects modell. But actually it's not significant. Could one conclude that time-invariant variables are missing from the modell, so that the fixed-effects modell is not significant but reliable, while the random-effects modell is significant but distorted?

    Best wishes, Marcel


    Here is my Stata code and output ('sprech' refers to speaking skills, and 'schreib' to writing skills):

    Code:
     xtreg magni sprech schreib, fe
    
    Fixed-effects (within) regression               Number of obs     =      7,101
    Group variable: persnr                          Number of groups  =      1,671
    
    R-sq:                                           Obs per group:
         within  = 0.0009                                         min =          1
         between = 0.1261                                         avg =        4.2
         overall = 0.0886                                         max =         12
    
                                                    F(2,5428)         =       2.35
    corr(u_i, Xb)  = 0.3095                         Prob > F          =     0.0957
    
    ------------------------------------------------------------------------------
           magni |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          sprech |  -.0348806   .1858644    -0.19   0.851    -.3992494    .3294882
         schreib |   .3200283   .1595278     2.01   0.045     .0072898    .6327669
           _cons |   39.60548    .596475    66.40   0.000     38.43615    40.77481
    -------------+----------------------------------------------------------------
         sigma_u |  12.114286
         sigma_e |  7.0960819
             rho |  .74453704   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(1670, 5428) = 9.38                  Prob > F = 0.0000
    
    
    
    xtreg magni sprech schreib, re
    
    Random-effects GLS regression                   Number of obs     =      7,101
    Group variable: persnr                          Number of groups  =      1,671
    
    R-sq:                                           Obs per group:
         within  = 0.0008                                         min =          1
         between = 0.1296                                         avg =        4.2
         overall = 0.0910                                         max =         12
    
                                                    Wald chi2(2)      =     103.07
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
    
    ------------------------------------------------------------------------------
           magni |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          sprech |   .3574049   .1764036     2.03   0.043     .0116602    .7031495
         schreib |   1.057315   .1459013     7.25   0.000     .7713538    1.343276
           _cons |   36.68076   .5851259    62.69   0.000     35.53394    37.82759
    -------------+----------------------------------------------------------------
         sigma_u |  10.400567
         sigma_e |  7.0960819
             rho |  .68235919   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    Last edited by Marcel Gehrke; 23 Sep 2016, 07:12.

  • #2
    Hi, my understanding is that sigma_u is the estimate of the between-subject standard deviation, and that sigma_e is the estimate of the within-subject standard deviation.
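
As a quick check that sigma_u and sigma_e are reported as standard deviations rather than variances, rho in both outputs can be reproduced from the squared values. This is only a minimal sketch: the scalar names are made up for illustration, and the numbers are copied from the two outputs in post #1.

```stata
* Values copied from the xtreg output in post #1 (scalar names are illustrative)
* sigma_u from the fixed-effects output
scalar su_fe = 12.114286
* sigma_u from the random-effects output
scalar su_re = 10.400567
* sigma_e, identical in both outputs
scalar se = 7.0960819

* rho = sigma_u^2 / (sigma_u^2 + sigma_e^2)
scalar rho_fe = su_fe^2 / (su_fe^2 + se^2)
scalar rho_re = su_re^2 / (su_re^2 + se^2)

display "rho (FE) = " %6.4f rho_fe
display "rho (RE) = " %6.4f rho_re
```

Both results agree with the reported rho (.7445 and .6824) to four decimals, which would not be the case if the unsquared sigmas were used in the ratio.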

    With respect to the Hausman test, if you can't reject the null hypothesis, both estimators (fixed and random effects) are consistent, but random effects is efficient. How can you know which of the two is distorted? Do you know the values of the population parameters?

    If you can reject the null hypothesis, then only the fixed-effects estimator is consistent. It's surprising that the sign on speaking skills (sprech) varies depending on the estimation method. True, in the fixed-effects estimation it's not individually significant, but still. If the between and within estimators are equal, then the sign should at least be the same. It's an indication that in your case the between estimation and the within estimation would yield significantly different results. Remember that random effects is really a weighted average of the within and between estimators. In such a case, the null hypothesis of the Hausman test is usually rejected because the difference between the coefficients is statistically significant. Are you sure you can't reject the null hypothesis that both estimates are equal?
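
For reference, the usual way to run the Hausman test on these two models is sketched below. It assumes the data are already declared as a panel (for instance with xtset persnr welle, where welle is the wave variable that appears later in the thread); the stored-estimate names fe_mod and re_mod are arbitrary.

```stata
* Fit and store the fixed-effects (always consistent) estimator
xtreg magni sprech schreib, fe
estimates store fe_mod

* Fit and store the random-effects (efficient under H0) estimator
xtreg magni sprech schreib, re
estimates store re_mod

* H0: the difference in coefficients is not systematic
hausman fe_mod re_mod
```

A small p-value here would point towards the fixed-effects estimates, in line with the reasoning above.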
    Alfonso Sanchez-Penalver


    • #3
      Another observation about your results: they indicate that the within variation explains very little of the variation in the dependent variable. This can happen either because, for each person, the dependent variable hardly varies while the independent variables vary somewhat, or because the dependent variable varies somewhat while the independent variables hardly vary. In your case, it seems that for any given individual there is little change either in job prestige or in speaking and writing skills, whereas there is variation in these variables across (between) migrants. This is reflected in the R-squared values and in the tests of joint significance of the two estimations. The within R-squared is around 0.0009 and the between R-squared is around 0.13. Also, the test of joint significance is very weak in the fixed-effects (within) model and strongly significant in the random-effects model. If you really can't reject the null hypothesis of the Hausman test, random effects may be doing a better job because of the between variation. But I would still be surprised if you couldn't reject it.
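
One way to look at this split directly is xtsum, which reports overall, between and within standard deviations for each variable. A minimal sketch, assuming the data have already been declared as a panel with xtset:

```stata
* Decompose each variable into overall, between and within variation
xtsum magni sprech schreib
```

A within standard deviation that is small relative to the between standard deviation would be consistent with the low within R-squared discussed above.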
      Alfonso Sanchez-Penalver


      • #4
        Marcel:
        as an aside to Alfonso's helpful remarks, I would consider the sign-flipping in -sprech- coefficient as a possible sign of quasi-extreme multicollinearity. Did you perform -estat vce, corr- after -xtreg-?
        Another remark may consider the risk of endogeneity in your model specification: are you sure that, say, personal ability cannot influence both the dependent variable and the predictors?
        Finally, you sometimes wrote -modell- and sometimes -model- (it may well be that -modell- is the German word for the English -model-).
        Kind regards,
        Carlo
        (StataNow 18.5)


        • #5
          Dear Alfonso,
          dear Carlo,

          thank you so much for your kind and helpful remarks!

          @ Alfonso
          I am sorry, I think I confused you with my wording. I wanted to say that I have to reject the null of the Hausman test, but that the FEM as a whole model is not significant.

          Actually, I thought that the values of the REM must be influenced by other unobserved variables when they are higher than in the FEM. For example, they increase in the REM when I add the age at immigration. And you are right, the dependent variable does not vary much over time.


          @ Carlo
          I had never heard of "estat vce, corr" before, but below you can find the results. Is there some kind of common threshold that I should consider?
          Code:
          . estat vce, corr
          
          Correlation matrix of coefficients of xtreg model
          
                  e(V) |  schreib    sprech     _cons 
          -------------+------------------------------
               schreib |   1.0000                     
                sprech |  -0.4567    1.0000           
                 _cons |  -0.2247   -0.7552    1.0000
          And you are right, I cannot rule out endogeneity, and I think it is quite realistic that the model is biased by it.


          Regarding the sigma_u problem, I found something in the literature (Giesselmann & Windzio 2012: 102 [in German]):
          • sigma_u in FEM: mean variation of the person-specific means. It is constant. In this case the mean deviation of the unit-specific mean from the overall mean of job prestige is 12.114 units.
          • sigma_u in REM: variation of the residual unit effects (= the difference between the unit-specific means that is not explained by the independent variables). It gets smaller when useful independent variables are added to the model.
          So my thought would be that I cannot really compare them, but that sigma_u in the REM should get smaller when variables are added. Does this match your knowledge?

          P.S. I am sorry about the model(l) confusion. Of course I meant model all the time.
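
For orientation, both estimators start from the same error-components model. The following is a sketch in standard textbook notation; the symbols alpha, beta and x_it are not taken from the thread.

```latex
y_{it} = \alpha + x_{it}\beta + u_i + e_{it},
\qquad
\rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}
```

Under fixed effects the u_i are treated as person-specific constants that may be correlated with x_it, so sigma_u summarizes the spread of their estimates. Under random effects u_i is a random term assumed uncorrelated with x_it, and sigma_u is the estimated standard deviation of that term. sigma_e is identical in the two outputs of post #1 (7.0960819), which is consistent with both relying on the within-regression residuals for it.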

          Kind regards,
          Marcel


          • #6
            Marcel:
            thanks for providing further details.
            _cons and sprech seem quite highly (inversely) correlated.
            However, some problems might also come from the limited set of predictors that you plugged in the right-hand side of the equation.
            As an aside, the -xtreg- entry in the Stata .pdf manual covers the "sigma issue" for the -fe- and -re- specifications.
            Kind regards,
            Carlo
            (StataNow 18.5)


            • #7
              Hi, to add to Carlo's useful comment on the correlation, the high correlation with the constant is a clear indication of the lack of variation in sprech that we discussed. I'm assuming that the correlations you're showing are of the fixed effects model, which is the one that has less variation.

              I'm more inclined to think that the sign issue in sprech is due to a bias because of endogeneity than to multicollinearity. We would expect that improving your language ability would improve job prestige, so we would expect the coefficient to be positive. This is captured by the random effects estimation because of the wide variation of the between effect, which sort of "hides" the bias, but it clearly shows in the fixed effects model, because of the lack of variation in the independent variable.

              One thing I would be curious about is whether there are some nationality effects. You mentioned that the workers are from two nations: Turkey and Italy. Are you allowing for a different intercept for each, i.e. including a dummy variable for one of the nationalities? You may also consider a different slope on the two explanatory variables to see whether improvements in either speaking or writing abilities affect each nationality differently. So interact the nationality dummy with the explanatory variables to see what happens. Are you also controlling for time effects, i.e. including years as a categorical explanatory variable? Just trying to think of things that may help capture effects that are correlated with the explanatory variables.

              The national dummy is time invariant, but not the interactions. So the interactions would be good in the fixed effects model and the dummy and the interactions in the random effects model. Having said all this, I wonder if you're better off running a cross-sectional estimation since the time variation is not really adding much. The inconsistency of the random effects model is due to the between effects being correlated with the unobserved variable. This would still occur in the cross-sectional model, and you would have to deal with it.
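
One possible reading of "cross-sectional estimation" is sketched below; it is an assumption about what is meant, not the only option. It uses the variable names that appear in post #8 (corigin for country of origin, welle for the wave).

```stata
* Pooled OLS with country and wave dummies.
* Standard errors are clustered on the person because individuals
* appear in several waves.
regress magni sprech schreib i.corigin i.welle, vce(cluster persnr)

* Alternative: the between estimator, which uses person means only
xtreg magni sprech schreib, be
```

Neither variant removes the correlation between the person effect and the regressors, so the endogeneity concern raised above applies to both.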
              Last edited by Alfonso Sánchez-Peñalver; 26 Sep 2016, 06:38. Reason: Added the last paragraph, it was an afterthought.
              Alfonso Sanchez-Penalver


              • #8
                Dear Carlo,
                dear Alfonso,

                you two are a great help for me!

                Just for information, here is the xttab of the speaking skills (I've tried to translate the labels):
                Code:
                xttab sprech
                
                                Overall             Between            Within
                   sprech |    Freq.  Percent      Freq.  Percent        Percent
                ----------+-----------------------------------------------------
                 Gar nich |     598      4.41       290     12.29          41.37
                 Eher sch |    2836     20.90       972     41.20          51.19
                  Es geht |    4429     32.64      1417     60.07          52.42
                      Gut |    3962     29.20      1282     54.35          52.95
                 Sehr gut |    1744     12.85       644     27.30          49.66
                ----------+-----------------------------------------------------
                    Total |   13569    100.00      4605    195.21          51.23
                                              (n = 2359)
                The job prestige (a scale from 30 to 210) varies nearly the same (between-%: 199.33; within-%:50.17) but >90% of the overall values are between 30 and 50.


                I've read some chapters about endogeneity now and I am beginning to follow your ideas. :-) The last chapter I read suggests that it is still better to use the FE than the RE model, because there might be time-variant endogeneity but the time-invariant part is controlled for.

                So I fitted an FE and an RE model using the interactions and dummy variables you suggested, Alfonso. 'corigin' represents the country of origin (Italy or Turkey), while 'welle' is the year of the panel wave. I notice that in the FE model the R² value shrinks while the model as a whole becomes significant. On the other hand, the corr(u_i, Xb) value gets pretty small. At the same time, neither sprech (speaking skills) nor schreib (writing skills) is significant, while the last years become significant. In the RE model schreib stays significant, and one interaction effect and the last years become significant too. All in all, the between R² is 'much' higher in RE.

                So could I assume that schreib explains only between variation, while the years explain a bit of both? And that there is a high risk of endogeneity?

                Code:
                 xtreg magni sprech schreib c.sprech#i.corigin c.schreib#i.corigin i.welle i.corigin, fe
                
                note: 5.corigin omitted because of collinearity (*// it`s 'italy')
                
                Fixed-effects (within) regression               Number of obs     =      7,101
                Group variable: persnr                          Number of groups  =      1,671
                
                R-sq:                                           Obs per group:
                     within  = 0.0086                                         min =          1
                     between = 0.0353                                         avg =        4.2
                     overall = 0.0227                                         max =         12
                
                                                                F(15,5415)        =       3.13
                corr(u_i, Xb)  = 0.0926                         Prob > F          =     0.0000
                
                -----------------------------------------------------------------------------------
                            magni |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                ------------------+----------------------------------------------------------------
                           sprech |   .1966513   .2474662     0.79   0.427     -.288482    .6817845
                          schreib |   .0282442   .2057461     0.14   0.891     -.375101    .4315894
                                  |
                 corigin#c.sprech |
                         Italien  |  -.6138499   .3748033    -1.64   0.102    -1.348615    .1209154
                                  |
                corigin#c.schreib |
                         Italien  |   .5568163   .3265898     1.70   0.088    -.0834309    1.197064
                                  |
                            welle |
                            1985  |   .1046157   .3739103     0.28   0.780    -.6283989    .8376303
                            1986  |   .2851341   .3929701     0.73   0.468    -.4852454    1.055514
                            1987  |   .0498934   .4009457     0.12   0.901    -.7361215    .8359083
                            1989  |  -.3794367   .3988424    -0.95   0.341    -1.161328    .4024549
                            1991  |    .695827   .4044531     1.72   0.085    -.0970637    1.488718
                            1993  |   .4897104   .4234868     1.16   0.248    -.3404941    1.319915
                            1995  |   .8080809    .456722     1.77   0.077    -.0872779     1.70344
                            1997  |   1.396909   .4823092     2.90   0.004     .4513894    2.342429
                            1999  |   1.964634    .528324     3.72   0.000     .9289061    3.000361
                            2001  |   1.895392   .5300997     3.58   0.000     .8561831      2.9346
                            2003  |   1.601238    .553278     2.89   0.004      .516591    2.685886
                                  |
                          corigin |
                         Italien  |          0  (omitted)
                            _cons |   39.31711   .6403161    61.40   0.000     38.06183    40.57238
                ------------------+----------------------------------------------------------------
                          sigma_u |  12.112063
                          sigma_e |  7.0770911
                              rho |  .74548551   (fraction of variance due to u_i)
                -----------------------------------------------------------------------------------
                F test that all u_i=0: F(1670, 5415) = 9.33                  Prob > F = 0.0000
                
                
                xtreg magni sprech schreib c.sprech#i.corigin c.schreib#i.corigin i.welle i.corigin, re
                
                Random-effects GLS regression                   Number of obs     =      7,101
                Group variable: persnr                          Number of groups  =      1,671
                
                R-sq:                                           Obs per group:
                     within  = 0.0057                                         min =          1
                     between = 0.1266                                         avg =        4.2
                     overall = 0.0888                                         max =         12
                
                                                                Wald chi2(16)     =     162.69
                corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
                
                -----------------------------------------------------------------------------------
                            magni |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                ------------------+----------------------------------------------------------------
                           sprech |   .5821422   .2335104     2.49   0.013     .1244702    1.039814
                          schreib |   .7290268   .1890895     3.86   0.000     .3584183    1.099635
                                  |
                 corigin#c.sprech |
                         Italien  |  -.6337219   .3570645    -1.77   0.076    -1.333555    .0661116
                                  |
                corigin#c.schreib |
                         Italien  |   .7173936   .2986269     2.40   0.016     .1320956    1.302692
                                  |
                            welle |
                            1985  |    .119952   .3691776     0.32   0.745    -.6036228    .8435267
                            1986  |   .2933379   .3871726     0.76   0.449    -.4655064    1.052182
                            1987  |   .0965589    .393448     0.25   0.806     -.674585    .8677029
                            1989  |  -.4108502   .3879992    -1.06   0.290    -1.171315    .3496141
                            1991  |   .6435894   .3919887     1.64   0.101    -.1246943    1.411873
                            1993  |   .3596215   .4109714     0.88   0.382    -.4458677    1.165111
                            1995  |   .7542474    .443498     1.70   0.089    -.1149926    1.623487
                            1997  |   1.495868   .4685822     3.19   0.001     .5774639    2.414273
                            1999  |   1.862586    .513488     3.63   0.000     .8561681    2.869004
                            2001  |   1.919271   .5029211     3.82   0.000     .9335637    2.904978
                            2003  |   1.710003   .5267927     3.25   0.001     .6775078    2.742497
                                  |
                          corigin |
                         Italien  |   2.063868   1.205596     1.71   0.087    -.2990569    4.426793
                            _cons |   35.68279   .7638936    46.71   0.000     34.18559       37.18
                ------------------+----------------------------------------------------------------
                          sigma_u |  10.326191
                          sigma_e |  7.0770911
                              rho |  .68040662   (fraction of variance due to u_i)
                -----------------------------------------------------------------------------------
                I will check out how I can run a cross-sectional estimation too.


                Kind regards,
                Marcel
                Last edited by Marcel Gehrke; 26 Sep 2016, 15:02.


                • #9
                  The introduction of the nationality dummy has allowed us to detect that the negative coefficient is on Italians. For Turkish migrants, better speaking ability increases job prestige, but for Italian migrants it reduces it (are Italians getting into trouble once they learn the language?). The lower prestige for Italians is consistent across both estimations; it is the effect for Turkish migrants that decreases in the fixed-effects estimation (and, as a consequence, the effect of better speaking for Italians as well, since for Italians it is the sum of both coefficients). The joint significance of the fixed-effects estimation seems to be driven by the inclusion of the time-specific effects, not by any of the variables of interest. I'm not sure why the R-squared fell. What proportion of the sample are Italians?
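
The "sum of both coefficients" for Italians can be worked out from the reported output. A minimal sketch with the coefficients copied from post #8; the scalar names are made up for illustration.

```stata
* Coefficients copied from the FE and RE output in post #8
scalar b_sp_fe = .1966513
scalar b_int_fe = -.6138499
scalar b_sp_re = .5821422
scalar b_int_re = -.6337219

* Effect of sprech for Italians = main effect + interaction
scalar ital_fe = b_sp_fe + b_int_fe
scalar ital_re = b_sp_re + b_int_re

display "Italians, FE: " %7.4f ital_fe
display "Italians, RE: " %7.4f ital_re
```

This gives about -0.417 under fixed effects and about -0.052 under random effects. To get a standard error for the sum, something like lincom sprech + 5.corigin#c.sprech could be run right after each xtreg; the level 5 for Italy is an assumption taken from the "5.corigin omitted" note in the output.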

                  From the xttab output it seems that sprech is really a categorical variable: it has five categories. If you include it as a continuous variable, you're assuming that moving from one category to the next always has the same effect, regardless of which category it is. This seems unreasonable. One would imagine that there ought to be diminishing returns to how well you speak the language. Is it the same for schreib? You may want to enter sprech as a categorical variable, and if schreib has the same structure, include that one as a categorical variable as well. Finally, I wonder if there is an interaction between schreib and sprech. However, if you add them both as categorical variables, the interaction terms may be confusing to interpret.
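
A minimal sketch of the categorical specification, assuming schreib is coded in the same five categories as sprech (the thread does not confirm this) and that the data are already xtset:

```stata
* Enter both skill measures as categorical (factor) variables
xtreg magni i.sprech i.schreib i.welle, fe

* Joint tests of the category dummies for each skill
testparm i.sprech
testparm i.schreib
```

Comparing the individual category coefficients then shows whether the steps between adjacent skill levels are roughly equal, which is what the continuous specification imposes.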
                  Last edited by Alfonso Sánchez-Peñalver; 26 Sep 2016, 18:22.
                  Alfonso Sanchez-Penalver
