
  • #16
    Hi Clyde, I also need help choosing between OLS, fixed effects, and random effects.

    reg x y growth ln_SIZE log_AGE i.country i.industry i.date, robust

    Is OLS the right approach or not?

    Thanks

    Regards



    • #17
      Welcome to Statalist. It's not a good idea to address your question to a particular person, unless you are making a specific response to their previous post. In your case you are asking a more general question. By addressing it only to me, you reduce the chances that somebody else who might give you a quicker or better response will even read past your first two words.

      To answer your question, you need to explain the context of this regression command. If you did this as a fixed-effects regression (-xtreg, fe-), what would the panel fixed effect be: industry or country? Or is there some other variable? Are industries and countries crossed, or is one nested in the other? Do you actually have panel data? That is, are the same entities (whether they are countries or industries or something else) observed on several dates, or do you actually have serial cross-sections (different entities are observed with each new date)? And, most important, what is your research question?
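
      The structural questions above can usually be answered from the data themselves. The following is only a sketch: it assumes identifier variables named firm, country, industry, and date (names borrowed from the regression command in #16 and not confirmed at this point in the thread), and the helper variables tag_fi, n_industries, and tag_ci are made up for illustration.

      ```stata
      * Sketch only: firm, country, industry and date are ASSUMED variable names.

      * Is each firm observed at most once per date? -isid- exits with an error if not.
      isid firm date

      * Declare the panel structure and summarize the pattern of observations.
      xtset firm date
      xtdescribe

      * Are firms nested within industries? Count distinct industries per firm.
      egen byte tag_fi = tag(firm industry)
      bysort firm: egen n_industries = total(tag_fi)
      tabulate n_industries

      * How many country-industry combinations actually occur in the data?
      egen byte tag_ci = tag(country industry)
      count if tag_ci
      ```

      If n_industries is 1 for every firm, firms are nested in industries. Comparing the final count with the number of countries times the number of industries shows how many crossed combinations are empty.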



      • #18
        Thank you very much for your reply and suggestions.
        I have 7000 observations in 20 different countries, 20 different industries, 1000 companies, and 7 years.
        I want to find the relationship between independent and dependent variable, taking into account the country and industry impact.
        Also I would like to look at the differences between countries if possible.

        thank you again



        • #19
          Well, that provides some additional information. But you still did not answer the questions I posed in #17.

          Please re-read the second paragraph of #17 and respond to each question asked there.



          • #20
            I'm sorry for my poor English.
            If I did this as a fixed-effects regression, the fixed effect would be country.
            There are 20 industries in the US. Some industries do not exist in other countries.
            I have seven years of observations for each company (panel variable firm: strongly balanced).

            My question: does the independent variable affect the dependent variable?
            If possible, I would like to include:
            Country effect: yes
            Year effect: yes
            Industry effect: yes

            data sample :
            firm country industry date x y ln_Size ln_AGE ROA
            1 26 8500 2011 0.029864 81.3 22.28356 3.637586 11.81
            1 26 8500 2012 0.013884 74.1 23.62044 3.663562 4.45
            1 26 8500 2013 0.016574 71.9 23.58668 3.688879 4.99
            1 26 8500 2014 0.018972 68.9 23.69788 3.713572 6.17
            1 26 8500 2015 0.002926 69.3 23.68829 3.73767 7.07
            1 26 8500 2016 0.011365 67.2 23.64553 3.7612 6.15
            1 26 8500 2017 0.008134 73.8 23.62677 3.78419 7.69
            2 2 5200 2011 0.026551 72.7 23.71198 3.806662 8.86
            2 2 5200 2012 0.008747 82.9 24.87717 3.044522 11.65
            2 2 5200 2013 0.018439 81.0 25.04009 3.091042 9.05
            2 2 5200 2014 0.021874 82.2 25.05661 3.135494 -1.38
            2 2 5200 2015 0.040831 77.6 24.87899 3.178054 -1.22
            2 2 5200 2016 0.001249 83.1 24.95058 3.218876 -3.17
            2 2 5200 2017 0.004363 69.2 24.66864 3.258097 -9.1
            3 3 5200 2011 0.005592 70.9 24.78073 3.295837 3.67
            3 3 5200 2012 0.03257 65.4 24.61161 3.332205 6.91
            3 3 5200 2013 0.008424 63.8 22.34918 3.044522 -0.29
            3 3 5200 2014 0.003741 59.2 22.43436 3.091042 1.99
            3 3 5200 2015 0.007963 67.1 22.44738 3.135494 6.84
            3 3 5200 2016 0.00014 65.7 22.23694 3.178054 -4.98
            3 3 5200 2017 0.009623 65.4 22.04001 3.218876 -2.4



            • #21
              OK, thank you. This is much clearer to me now.

              So here's the problem. It appears to me that what you really have is three-level data. You have yearly observations nested within firms, and firms are nested within industries. Country is in a multiple-membership relationship with firm (really a crossed relationship but with some uninstantiated combinations.) This raises several approaches, all of them with some drawbacks.

              Let's start from the assumption that you would prefer to do a fixed-effects model, just because that is the dominant and most accepted approach in economics and finance. The difficulty is that there is no way to simultaneously get estimates of firm and industry in a fixed-effects model, because industry will be constant within firms over time, and hence is not estimable. So you could do a model with -xtset firm year- and -xtreg x y growth ln_SIZE log_AGE i.country i.date, fe vce(cluster industry)-. Two things to note here: the clustering in the vce() option is at the industry level, not the firm level, because observations are correlated within industry, even though you are not explicitly including an industry variable in the model. And, industry is not among the predictor variables (and if you try to add it, Stata will drop it due to collinearity with the firm fixed effect). If you don't care about the industry-level effects, then this is not a problem and you can proceed this way.

              But since your original regression equation included industry, I take it you may well be interested in industry-level effects. In that case, you will have to -xtset industry- (with no time variable, because each industry has many firms per year and -xtset- does not allow repeated time values within a panel) and -xtreg x y growth ln_SIZE log_AGE i.country i.date, fe vce(cluster industry)-. This is a fixed-effects version of the OLS equation you wrote in #16 (more or less), and it ignores the firm level altogether. The firm-level variance will be absorbed into the industry-level variance. If you are not interested in the firm-level effects, then this is OK.

              One caution: you have only 20 industries. The use of -vce(cluster industry)- is questionable with only 20 clusters. Everyone agrees that cluster-robust standard errors are asymptotically correct, but they perform poorly with small numbers of clusters. Unfortunately, there is no consensus about how many clusters are needed. And 20 falls in a gray area: some statisticians will tell you 20 is enough, and others will say it is not. If you use -vce(cluster industry)- don't be surprised if somebody criticizes you for it. Expect also to be criticized by somebody else if you don't use it.

              Now, once you run the fixed effects model, you can take a look at the bottom part of the results table that Stata gives you. If sigma_u and rho are very close to zero, that tells you that the industry or firm level effects (depending on which -xtset- you used), are negligible and you can get equally consistent (but more efficient) results with just OLS regression. (But retain the -vce(cluster industry)- option in the OLS regression as well.) If sigma_u and rho are appreciably far from zero, then the OLS model would not be useful here.
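
              As a rough illustration of that check, the quantities in the bottom part of the table are also stored by -xtreg, fe- and can be displayed directly. This is a sketch under assumptions: it uses the variable names from this thread, takes date to hold the year (as in the data sample in #20), and assumes firms never change industry, so that panels are nested within clusters.

              ```stata
              * Sketch only: variable names follow this thread; date is ASSUMED to be the year.
              xtset firm date
              xtreg x y growth ln_SIZE log_AGE i.country i.date, fe vce(cluster industry)

              * sigma_u, sigma_e and rho from the bottom of the output table
              display "sigma_u = " e(sigma_u)
              display "sigma_e = " e(sigma_e)
              display "rho     = " e(rho)

              * OLS counterpart for comparison, keeping the same clustering
              regress x y growth ln_SIZE log_AGE i.country i.industry i.date, vce(cluster industry)
              ```

              How close to zero counts as "very close" is a judgment call; the stored results only make the numbers easier to inspect or save.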

              If you are not settled on a fixed-effects or OLS regression, and are willing to consider random-effects regression, then you can do a three-level model using -mixed-. This is, from a modeling perspective, the best model of your data because it will actually reflect all three levels in your data. The drawback is that as a random-effects model, it relies on the assumption that the error terms are independent of the linear predictor, an assumption that may be false, and if false gives you biased coefficient estimates. Another way to think about that is that whereas a fixed-effects model gives you a pure estimate of the within-panel effects of the predictors, the random-effects model assumes that the within- and between-panel effects are the same and gives an estimate of this common effect. But if the within- and between-effects are not, in reality, the same, then the resulting estimate is of a meaningless parameter. Anyway, if you want to go this route, the code would be:
              Code:
              mixed x y growth ln_SIZE log_AGE i.country  i.date || industry: || firm:, vce(cluster industry)
              Note that all of the above takes for granted that a linear model of x in terms of y, growth, ln_SIZE, and log_AGE is a reasonable model of the data generating process. Whether that is true is a question in economics, not statistics, and one that you will need to rely on the literature and your disciplinary colleagues to assure yourself of (or to find a more realistic model).



              • #22
                I am really very grateful for your detailed description.
                When I look at the literature, papers report country, year, and industry (or year and industry) fixed effects at the bottom of the results table.
                For example:
                "where CSR is a firm's CSR performance, CROSS-LISTING is a dummy variable equal to one if a firm is cross-listed in the U.S. in a given year, CONTROLS is a vector that contains the firm-level control variables (SIZE, AGE, SGR, ROA, LEV, RDS, SH_RIGHTS, SD_ROA, and SD_RET), and Fixed Effects is a vector that includes country, industry, and year fixed effects. In each regression, we follow Petersen (2009: Estimating Standard Errors in Finance Panel Data Sets: Comparing Approaches) and cluster standard errors by firm and year."

                Actually, I want to use the most accurate method.

                1. When I run a model with -xtset firm year- and -xtreg x y growth ln_SIZE log_AGE i.country i.date, fe vce(cluster industry)-, Stata drops country due to collinearity.
                The command -xtreg x y growth ln_SIZE log_AGE i.country i.date, fe vce(robust)- also drops country.
                2. When I run -mixed x y growth ln_SIZE log_AGE i.country i.date || industry: || firm:, vce(cluster industry)-, there is no p-value.

                3. When I run -xtreg x y growth ln_SIZE log_AGE i.country i.date, re vce(robust)- or -xtreg x y growth ln_SIZE log_AGE i.country i.date i.industry, re vce(robust)-, the results are significant.



                • #23
                  1. When I run a model with -xtset firm year- and -xtreg x y growth ln_SIZE log_AGE i.country i.date, fe vce(cluster industry)-, Stata drops country due to collinearity.
                  The command -xtreg x y growth ln_SIZE log_AGE i.country i.date, fe vce(robust)- also drops country.
                  That surprises me. This implies that each firm operates in only one country. If that is true, then you cannot do this model with country.

                  2. When I run -mixed x y growth ln_SIZE log_AGE i.country i.date || industry: || firm:, vce(cluster industry)-, there is no p-value.
                  There are several possible reasons for this. You would need to show the complete output of this command to get more specific advice and explanation. Be sure to show the exact -mixed- command (with all the actual variables you used) and the exact output you got from Stata. Do this by copying directly from Stata's Results window or your log file and pasting here in the Forum editor between code delimiters. If you are not familiar with code delimiters, see Forum FAQ #12 or see David Benson's video on code delimiters and -dataex- https://youtu.be/bXfaRCAOPbI.

                  3. When I run -xtreg x y growth ln_SIZE log_AGE i.country i.date, re vce(robust)- or -xtreg x y growth ln_SIZE log_AGE i.country i.date i.industry, re vce(robust)-, the results are significant.
                  The American Statistical Association has recommended that the concept of statistical significance be abandoned. See https://www.tandfonline.com/doi/full...5.2019.1583913 for the "executive summary" and https://www.tandfonline.com/toc/utas20/73/sup1 for all 43 supporting articles. Or https://www.nature.com/articles/d41586-019-00857-9 for the tl;dr. Focus on the coefficient estimates and their precision (as given by the standard errors and confidence intervals). Consider whether the coefficient estimates represent an effect that is large enough to be of real-world practical importance. And then consider whether the confidence interval is narrow enough that your conclusions and actions would be the same if the actual result were anywhere in that interval. If so, you have a solid finding in support of particular conclusions. If not, then it would seem that the data do not identify the parameters of your model well enough to support concrete actions and firm conclusions, so that further research would be needed.
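
                  Regarding point 1, whether each firm really appears in only one country can be verified directly rather than inferred from the dropped terms. A minimal sketch, assuming the identifier variables are named firm and country as in the commands quoted above:

                  ```stata
                  * Sketch only: firm and country are the variable names ASSUMED from this thread.
                  * Within each firm, sort by country and compare the first and last values.
                  * -assert- exits with an error if any firm appears in more than one country.
                  bysort firm (country): assert country[1] == country[_N]
                  ```

                  If the assertion passes silently, country is constant within firm, which is exactly why a firm fixed effect absorbs it.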



                  • #24
                    Thank you for the information and resources.

                    Code:
                    . mixed X Y Controlvar1 Controlvar2 Controlvar3 Controlvar4 Controlvar5 Controlvar6 i.countrycode  i.date || industry: || firm:, vce(cluster indus
                    > try)
                     
                    Performing EM optimization:
                     
                    Performing gradient-based optimization:
                     
                    Iteration 0:   log pseudolikelihood =  23195.234 
                    Iteration 1:   log pseudolikelihood =  23195.234 
                     
                    Computing standard errors:
                     
                    Mixed-effects regression                        Number of obs     =     12,600
                     
                    -------------------------------------------------------------
                                    |     No. of       Observations per Group
                     Group Variable |     Groups    Minimum    Average    Maximum
                    ----------------+--------------------------------------------
                           industry |         24         49      525.0      1,428
                               firm |      1,800          7        7.0          7
                    -------------------------------------------------------------
                     
                                                                    Wald chi2(23)     =          .
                    Log pseudolikelihood =  23195.234               Prob > chi2       =          .
                     
                                                  (Std. Err. adjusted for 24 clusters in industry)
                    ------------------------------------------------------------------------------
                                 |               Robust
                               X |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                               Y |  -.0056647   .0031939    -1.77   0.076    -.0119247    .0005952
                     Controlvar1 |  -.0008255   .0003629    -2.27   0.023    -.0015367   -.0001142
                     Controlvar2 |   9.81e-09   5.13e-10    19.12   0.000     8.80e-09    1.08e-08
                     Controlvar3 |  -.0026413   .0009691    -2.73   0.006    -.0045408   -.0007419
                     Controlvar4 |   -.015467   .0034817    -4.44   0.000    -.0222911    -.008643
                     Controlvar5 |   .0046268   .0032363     1.43   0.153    -.0017163    .0109698
                     Controlvar6 |   .0001672   .0000619     2.70   0.007     .0000459    .0002885
                                 |
                     countrycode |
                              2  |   .0369123    .002768    13.34   0.000     .0314871    .0423374
                              3  |   .0204936   .0049435     4.15   0.000     .0108045    .0301826
                              4  |    .034127   .0028425    12.01   0.000     .0285558    .0396983
                              5  |   .0106433   .0043106     2.47   0.014     .0021947    .0190918
                              6  |   .0220549   .0076659     2.88   0.004       .00703    .0370799
                              7  |   .0078828   .0038883     2.03   0.043      .000262    .0155037
                              8  |   .0591841   .0077036     7.68   0.000     .0440852    .0742829
                              9  |    .018109   .0046787     3.87   0.000     .0089388    .0272791
                             10  |    .021436   .0025232     8.50   0.000     .0164906    .0263813
                             11  |   .0075008   .0045971     1.63   0.103    -.0015094     .016511
                             12  |   .0164926    .003404     4.85   0.000      .009821    .0231643
                             13  |   .0335327   .0044789     7.49   0.000     .0247543    .0423112
                             14  |   .0309492   .0037336     8.29   0.000     .0236314     .038267
                             15  |   .0287607   .0057578     5.00   0.000     .0174755    .0400459
                             16  |   .0312306   .0048842     6.39   0.000     .0216576    .0408035
                             17  |   .0299022   .0042716     7.00   0.000     .0215301    .0382742
                             18  |   .0411138   .0084551     4.86   0.000     .0245421    .0576855
                             19  |   .0267593   .0035851     7.46   0.000     .0197326     .033786
                             20  |   .0251587   .0029755     8.46   0.000     .0193268    .0309906
                             21  |   .0126215   .0068135     1.85   0.064    -.0007328    .0259758
                             22  |   .0041654   .0029269     1.42   0.155    -.0015713    .0099021
                             23  |   .0281488   .0028884     9.75   0.000     .0224877    .0338099
                             24  |   .0200026    .006691     2.99   0.003     .0068884    .0331167
                             25  |  -.0031975   .0019577    -1.63   0.102    -.0070344    .0006395
                             26  |   .0362498   .0023008    15.76   0.000     .0317403    .0407593
                             27  |   .0189296   .0036175     5.23   0.000     .0118395    .0260197
                             28  |   .0244681   .0040969     5.97   0.000     .0164383    .0324979
                             29  |   .0188599   .0039432     4.78   0.000     .0111314    .0265883
                             30  |   .0115971   .0027194     4.26   0.000     .0062671    .0169271
                             31  |   .0053109   .0036603     1.45   0.147    -.0018631    .0124849
                             32  |    .022489   .0052537     4.28   0.000      .012192     .032786
                             33  |   .0171532   .0084234     2.04   0.042     .0006436    .0336629
                             34  |  -.0089162   .0064205    -1.39   0.165    -.0215002    .0036678
                             35  |   .0068589   .0035882     1.91   0.056    -.0001739    .0138917
                                 |
                            date |
                           2012  |  -.0033052   .0011439    -2.89   0.004    -.0055472   -.0010632
                           2013  |  -.0032209   .0012454    -2.59   0.010    -.0056618     -.00078
                           2014  |  -.0032014   .0011061    -2.89   0.004    -.0053694   -.0010335
                           2015  |  -.0032741    .001352    -2.42   0.015    -.0059239   -.0006242
                           2016  |  -.0026987   .0013423    -2.01   0.044    -.0053296   -.0000678
                           2017  |   -.001382   .0017634    -0.78   0.433    -.0048383    .0020742
                                 |
                           _cons |   .0855992    .016691     5.13   0.000     .0528855    .1183129
                    ------------------------------------------------------------------------------
                     
                    ------------------------------------------------------------------------------
                                                 |               Robust          
                      Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
                    -----------------------------+------------------------------------------------
                    industry: Identity           |
                                      var(_cons) |   .0000397   .0000127      .0000212    .0000742
                    -----------------------------+------------------------------------------------
                    firm: Identity               |
                                      var(_cons) |   .0002517   .0000251       .000207    .0003061
                    -----------------------------+------------------------------------------------
                                   var(Residual) |   .0012998   .0002164      .0009379    .0018014
                    ------------------------------------------------------------------------------
                    The Y coefficient is negative, as I predicted. The control variables were chosen according to the literature. However, some studies in the literature use fewer control variables. I think they add or remove controls depending on the results.




                    • #25
                      Additional information for #24: each country has at least 8 companies (a large part of the sample consists of US firms). As a result, in some countries an industry has only one firm, and some industries are not represented at all.



                      • #26
                        The reason you get no overall chi2 and p-value for the model as a whole is that you have too many predictor variables for your degrees of freedom. Despite the large sample size, the number of degrees of freedom with the vce(cluster industry) estimator is the number of clusters minus 1. In your case that's 23. But the output in #24 shows 47 predictor terms in your model (Y and 6 controls, 34 country indicators, and 6 year indicators). So an overall test of the model is not possible.

                        This is usually not a problem. Why do you care about the overall model chi2 and p-value? Most of those variables are just nuisance variables anyway, included to adjust for their effects but not of real interest. In fact, probably the only variable that really matters here is Y, and you have all the statistics about Y's effect you could want. So don't worry about the absence of a full-model test: it's impossible to get, and it's unnecessary anyway.
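
                        Tests that involve fewer constraints than the 23 available degrees of freedom remain possible. As a hedged sketch, assuming these commands are run immediately after the -mixed- command shown in #24, with the variable names Y and date used there:

                        ```stata
                        * Sketch only: run right after the -mixed- command in #24.

                        * Wald test of the single coefficient of interest (1 constraint)
                        test Y

                        * Joint Wald test of the six year indicators (6 constraints, fewer than 23)
                        testparm i.date
                        ```

                        The first test reproduces the information already in the coefficient table for Y; the second shows that a joint test of a small group of terms is still available even though the whole-model test is not.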



                        • #27
                          First of all, I would like to thank you for your interest and informative comments.

                          If the overall model is invalid, is it still possible to interpret the effect of Y? How can I defend it?
                          I'm a little upset now.
                          Can I use the following models instead? Or what do you suggest?
                          As you can see, the coefficient values are very close.

                          1.
                          Code:
                          mixed X Y Controlvar1 Controlvar2 Controlvar3 Controlvar4 i.countrycode  i.date i.industry, vce(robust)
                           
                          Mixed-effects regression                        Number of obs     =     12,600
                           
                                                                          Wald chi2(68)     =    3309.65
                          Log pseudolikelihood =   22858.08               Prob > chi2       =     0.0000
                           
                          ------------------------------------------------------------------------------
                                       |               Robust
                                     X |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                          -------------+----------------------------------------------------------------
                                     Y |  -.0138277   .0021375    -6.47   0.000    -.0180171   -.0096382
                            Controlvar1|  -.0008839   .0001667    -5.30   0.000    -.0012107   -.0005572
                            Controlvar2|   .0160871    .003108     5.18   0.000     .0099956    .0221786
                            Controlvar3|  -.0003282   .0004896    -0.67   0.503    -.0012878    .0006314
                            Controlvar4|   .0044434   .0027162     1.64   0.102    -.0008803    .0097671
                           
                           (remaining coefficients omitted for brevity)
                           
                           
                          ------------------------------------------------------------------------------
                                                       |               Robust          
                            Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
                          -----------------------------+------------------------------------------------
                          
                                         var(Residual) |   .0015552   .0001037      .0013647    .0017724
                          ------------------------------------------------------------------------------

                          2.
                          Code:
                          reg X Y Controlvar1 Controlvar2 Controlvar3 Controlvar4 i.countrycode i.industry i.date, robust
                           
                          Linear regression                               Number of obs     =     12,600
                                                                          F(68, 12531)      =      48.41
                                                                          Prob > F          =     0.0000
                                                                          R-squared         =     0.1656
                                                                          Root MSE          =     .03954
                           
                          ------------------------------------------------------------------------------
                                       |               Robust
                                     X |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                          -------------+----------------------------------------------------------------
                                     Y |  -.0138277   .0021433    -6.45   0.000    -.0180289   -.0096264
                           Controlvar1 |  -.0008839   .0001672    -5.29   0.000    -.0012116   -.0005563
                           Controlvar2 |   .0160871   .0031164     5.16   0.000     .0099785    .0221957
                           Controlvar3 |  -.0003282   .0004909    -0.67   0.504    -.0012905    .0006341
                           Controlvar4 |   .0044434   .0027236     1.63   0.103    -.0008953     .009782
                                     (remaining coefficients omitted for brevity)
                          3.
                          Code:
                          xtreg X Y Controlvar1 Controlvar2 Controlvar3 Controlvar4 i.countrycode i.industry i.date, robust
                           
                          Random-effects GLS regression                   Number of obs     =     12,600
                          Group variable: firm                            Number of groups  =      1,800
                           
                          R-sq:                                           Obs per group:
                               within  = 0.0428                                         min =          7
                               between = 0.3673                                         avg =        7.0
                               overall = 0.1653                                         max =          7
                           
                                                                          Wald chi2(68)     =    1242.44
                          corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
                           
                                                         (Std. Err. adjusted for 1,800 clusters in firm)
                          ------------------------------------------------------------------------------
                                       |               Robust
                                     X |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                          -------------+----------------------------------------------------------------
                                     Y |   -.012777   .0030584    -4.18   0.000    -.0187713   -.0067826
                           Controlvar1 |  -.0009549   .0002073    -4.61   0.000    -.0013612   -.0005486
                           Controlvar2 |   .0151431   .0030715     4.93   0.000     .0091231     .021163
                           Controlvar3 |  -.0003176   .0004834    -0.66   0.511     -.001265    .0006298
                           Controlvar4 |   .0057143     .00383     1.49   0.136    -.0017923    .0132209



                          • #28
                            I never said the overall model is invalid, and I don't see any problem with it. It's a perfectly legitimate model. It just has more model degrees of freedom (predictor terms) than there are residual degrees of freedom when you run it with a cluster-robust vce. So you don't get an overall chi-square statistic. But you don't need an overall chi-square statistic for anything here anyway. It's a pointless statistic.

                            Additional points regarding your 3 alternate models. In model 1, it makes no sense to use -mixed- and then not specify the higher levels. Did you notice that models 1 and 2 give almost identical results except for tiny numerical rounding errors? That's because -mixed- without higher levels is just doing -regress- by a more convoluted and slightly less accurate calculation. Moreover, the results for the variance components in post #24 show that the industry- and firm-level variances are not small enough to disregard. You really should be keeping those in your model. Your alternate model 3 is somewhat problematic because it completely ignores industry: not only is there no vce(cluster industry) but there is no random effect at that level either, so the model isn't really well specified.
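
                            One rough way to put the variance components from #24 on a common scale is to express them as shares of the total variance (intraclass correlations). The sketch below only re-uses the three estimates printed in #24; the scalar names are made up for illustration.

                            ```stata
                            * Variance components copied from the -mixed- output in #24
                            scalar v_industry = .0000397
                            scalar v_firm     = .0002517
                            scalar v_resid    = .0012998
                            scalar v_total    = v_industry + v_firm + v_resid

                            * Share of the total variance at the industry level
                            display "ICC, industry level:       " v_industry / v_total

                            * Correlation between two observations on the same firm
                            * (industry share plus firm share)
                            display "ICC, firm within industry: " (v_industry + v_firm) / v_total
                            ```

                            These work out to roughly 0.025 and 0.18. Whether values of that size matter is a judgment call, but the firm-level share in particular is far from zero.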

                            Now, this all started because you were unhappy about the results you were getting with -mixed- and -vce(cluster industry)-. As I have noted earlier, the use of -vce(cluster ...)- is controversial when the number of clusters is small. While 24 clusters would be acceptable to many people, if you are really bothered by the aesthetics of not having an overall model chi-square statistic, re-run the model in post #24, omitting -vce(cluster industry)- and, if you like, use -vce(robust)-. You will get your overall chi square (though I still think it's useless), and you will have a properly specified three-level model. If anyone challenges you on not using cluster robust standard errors, you can defend by saying the number of clusters is too small. Frankly, though, if it were me, I'd stick with what you got in #24.



                            • #29
                              Now I understand it more clearly. I thought I couldn't comment on Y if the overall model's chi-square was not significant. After your explanation, I realized that this idea was wrong. I will use #24 as you suggest.
                              Thanks again for helping me so much.



                              • #30
                                Hi again, dear Clyde. I have questions about the code in #24 that you mentioned earlier:
                                You will get your overall chi square (though I still think it's useless), and you will have a properly specified three-level model. If anyone challenges you on not using cluster robust standard errors, you can defend by saying the number of clusters is too small. Frankly, though, if it were me, I'd stick with what you got in #24.
                                Is it a three-level model or a two-level model?

                                In #24, does the code analyze the data at the firm and industry levels while taking the country and industry effects into account? Is that right?

                                Are the industry and firm levels, with clustering on industry, used because a company operates in only one country? Is that right?

                                Thank you for your invaluable contributions.

