  • Do I have to log-transform the lagged dependent variable in a Poisson FE model?

    Dear Statalist,

    Thank you for reading this post. I am using the ppmlhdfe package (Poisson pseudo-maximum likelihood with high-dimensional fixed effects) to test my hypothesis. My dataset is a panel with t = 40 periods and i = 130 units.

    The dependent variable in my research is military aid from the US, which contains many zeros (60% of observations). Following earlier suggestions on this forum, I chose a Poisson model. To address potential autocorrelation, my advisor recommends controlling for the lagged DV. However, I am not sure whether I should log-transform the lagged DV. The results are quite different when I do (see Figure 1: log-transformed lagged DV; Figure 2: lagged DV). From what I have read online, it seems that I should log-transform it. Could you please advise? Any feedback is welcome and appreciated!

    [Image: logged.png]
    Figure 1: log-transform lagged DV

    [Image: not logged.png]
    Figure 2: lagged DV

    Best,
    Po

  • #2
    I apologize for the small figures I uploaded. Below are the updated figures at higher resolution.

    [Image: Screenshot 2024-05-23 at 5.55.34 PM.png]
    Figure 1: log-transformed lagged DV

    [Image: Screenshot 2024-05-23 at 5.52.23 PM.png]
    Figure 2: lagged DV

    • #3
      Please use code delimiters to post your Stata results. What you have posted is hard on the eyes, at least on my laptop. Please read the forum FAQ for posting suggestions.
      Roman

      • #4
        Dear Poyung Lin,

        You say that your dependent variable has lots of zeros, so if you take the log of its lag to include in the model, those observations will be lost, which would be a problem. However, to my surprise, the two regressions you showed us have the same number of observations. Can you explain why that is the case?

        Anyway, as long as clustered standard errors are used, I would not worry about serial correlation, but if you really need to include the lagged dependent variable, it is probably sensible to transform it in some way (remember that the regressors are inside the exponential function).
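
        A brief worked comparison may make the remark about the exponential function concrete. The symbols \( \gamma \), \( \alpha \), and \( x_{it}'\beta \) are notation assumed here for illustration (fixed effects are folded into \( x_{it}'\beta \) for brevity); none of this is taken from the posted output. The first line enters the lag in levels, the second in logs:

```latex
\[
E[y_{it} \mid y_{i,t-1}, x_{it}] = \exp(\gamma\, y_{i,t-1} + x_{it}'\beta)
\]
\[
E[y_{it} \mid y_{i,t-1}, x_{it}] = \exp(\alpha \ln y_{i,t-1} + x_{it}'\beta)
  = y_{i,t-1}^{\alpha} \exp(x_{it}'\beta)
\]
```

        In the first form, each additional unit of \( y_{i,t-1} \) multiplies the conditional mean by \( e^{\gamma} \) regardless of the starting level, so \( \gamma \) is a semi-elasticity. In the second form, \( \alpha \) is an elasticity, but \( \ln y_{i,t-1} \) is undefined whenever \( y_{i,t-1} = 0 \), which is why some transformation is needed.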

        Best wishes,

        Joao

        • #5
          Dear Joao Santos Silva,

          Thank you so much for the reply. I am sorry that I did not make this clear in my original post: I add 1 to the lagged DV before taking the log. Is this acceptable, or does it cause further problems?

          Roman Mostazir, thank you for the reminder.

          • #6
            The log(x+1) or log(y+1) hack is generally thought to be a bad idea; see the literature. Given that the choice of 1 here is arbitrary, using some other constant (0.5, 2, 0.1, ...) would make just as much sense. It is commonly true that different choices of constant lead to different parameter estimates, sometimes quite different ones. There are other objections to this approach, too.
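
            One way to see this sensitivity is to re-run the same regression with several constants. The following is only a minimal sketch on simulated data: the variable names, the data-generating process, and the list of constants are all assumptions made for illustration, and the built-in `poisson` command stands in for `ppmlhdfe`.

```stata
* Minimal sketch on simulated data; variable names and values are hypothetical.
clear
set seed 12345
set obs 5000

* A skewed, non-negative regressor with many zeros.
generate x = rpoisson(exp(rnormal())) * (runiform() > 0.4)

* Outcome generated with the constant c = 1 (an arbitrary choice).
generate y = rpoisson(exp(0.1 + 0.5 * ln(x + 1)))

* Re-estimate the same Poisson regression with different added constants.
foreach c in 0.1 0.5 1 2 {
    generate ln_x_c = ln(x + `c')
    quietly poisson y ln_x_c, vce(robust)
    display "c = `c':  coefficient on ln(x + c) = " %7.4f _b[ln_x_c]
    drop ln_x_c
}
```

            Comparing the displayed coefficients across constants shows how much the estimate depends on a choice that has no substantive justification.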

            • #7
              Dear Poyung Lin,

              Like Mike, I do not like that transformation, but it may be reasonable to use it for an explanatory variable: whether it is suitable is an empirical matter (essentially, it depends on the functional form you want to specify). Note, however, that I would never use that transformation for the dependent variable. An alternative transformation of the explanatory variable is to add 1 only to the zero values and also include in the model a dummy identifying those observations. This allows you to separate the effect of going from 0 to a positive value from the effects of changes in the positive values.
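
              The alternative described above could be set up roughly as follows. This is a sketch under stated assumptions, not a definitive implementation: the data are simulated, all variable names (`y`, `y_lag`, `lag_zero`, `ln_y_lag`) are hypothetical, and the built-in `xtpoisson, fe` is used so the example runs without extra packages; the analogous `ppmlhdfe` call is shown in a comment.

```stata
* Minimal sketch on simulated data; every variable name here is hypothetical.
clear
set seed 2024
set obs 130                        // 130 panel units, as in post #1
generate id = _n
expand 40                          // 40 periods per unit
bysort id: generate t = _n
xtset id t

* A skewed, non-negative outcome with many zeros (purely illustrative).
generate y = rpoisson(exp(rnormal())) * (runiform() > 0.5)

* Lagged outcome; it is missing in the first period of each unit.
generate y_lag = L.y

* Dummy for a zero lag, and log of the lag with 1 added only to the zeros,
* so that ln_y_lag equals 0 exactly where y_lag equals 0.
generate byte lag_zero = (y_lag == 0) if !missing(y_lag)
generate ln_y_lag = ln(cond(y_lag == 0, 1, y_lag))

* Poisson with unit fixed effects, period dummies, and cluster-robust SEs.
xtpoisson y ln_y_lag lag_zero i.t, fe vce(robust)

* The analogous ppmlhdfe call (community-contributed, available from SSC):
*   ppmlhdfe y ln_y_lag lag_zero, absorb(id t) vce(cluster id)
```

              In this setup the coefficient on `ln_y_lag` is identified only from observations with a positive lag, while the coefficient on `lag_zero` captures the shift associated with a zero lag.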

              Best wishes,

              Joao

              • #8
                Dear Mike Lacy and Joao Santos Silva,

                Thank you so much for your suggestions and help! I am really grateful for your insights!!


                Best,
                Po

                • #9
                  Originally posted by Poyung Lin
                  [quote of the opening post omitted]

                  I encountered the same problem. How did you end up solving it?

                  • #10
                    See #7.

                    • #11
                      Thanks for your reply.

                      • #12
                        Originally posted by Joao Santos Silva
                        See #7.

                        Thanks for your reply.

                        • #13
                          Originally posted by Joao Santos Silva
                          See #7.

                          I have a related question. I am studying the impact of democratization on conflict. My conflict variable is strongly right-skewed: some countries have very large values, in the hundreds or even thousands, while many others have values in the single digits, and about 30% of my country-year observations are zero.

                          Initially, I used a dynamic panel data model, \( \ln(1 + \text{conflict}_{it}) = \sum_{j=1}^{8} \alpha_j \ln(1 + \text{conflict}_{i,t-j}) + \beta\, \text{dem}_{i,t-1} + \gamma X_{it} + \mu_i + \lambda_t + \epsilon_{it} \), estimated with the command `reghdfe lnconflict l.e_boix_regime l(1/8).lnconflict $covar if dem1945==0, a(ccode year) vce(cluster ccode)`, to obtain the average treatment effect of democratization (the coefficient on `l.e_boix_regime`). Here `lnconflict` is \( \ln(1 + \text{conflict}) \). Under this linear specification, I can estimate the long-run effect of democratization as \( \text{longrun effect} = \frac{\beta}{1 - \sum_{j=1}^{8} \alpha_j} \). However, if I were to use a Poisson fixed effects model, how should I proceed?

                          If I use lags of the untransformed conflict variable (without adding 1 and taking the logarithm) in the Poisson fixed effects model, the long-run effect of democracy is almost the same as the short-run effect, because the coefficients on `l(1/8).conflict` are very close to 0, so the estimated persistence of conflict is also nearly 0. The Stata command is: `ppmlhdfe conflict l.e_boix_regime l(1/8).conflict $covar if dem1945==0, absorb(ccode year) vce(cluster ccode)`.

                          However, if I instead use lags of the conflict variable transformed by adding 1 and taking the logarithm, the results become very similar to those obtained with the `reghdfe` command above. I am not sure whether this approach is appropriate.
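
                          The long-run calculation described in this post can be sketched with Stata's `nlcom`, which computes nonlinear combinations of coefficients with delta-method standard errors. The example below is only an illustration on simulated data: `y` and `d` are hypothetical stand-ins for the conflict variable and the regime indicator, only two lags are used for brevity, and the built-in `xtpoisson, fe` replaces `ppmlhdfe` so that it runs without extra packages.

```stata
* Minimal sketch on simulated data; variable names are hypothetical stand-ins
* (y for the conflict count, d for the regime indicator).
clear
set seed 2024
set obs 140
generate id = _n
expand 45
bysort id: generate t = _n
xtset id t
generate byte d = runiform() < 0.3
generate y = rpoisson(exp(0.5 * rnormal() - 0.2 * d))
generate ln_y1 = ln(y + 1)

* Poisson with unit fixed effects, period dummies, and two lags of ln(1 + y).
xtpoisson y L.d L(1/2).ln_y1 i.t, fe vce(robust)

* Short-run effect, persistence (sum of the lag coefficients), and the
* long-run effect beta / (1 - sum of alphas), with delta-method SEs.
nlcom (shortrun:    _b[L.d])                                    ///
      (persistence: _b[L.ln_y1] + _b[L2.ln_y1])                 ///
      (longrun:     _b[L.d] / (1 - _b[L.ln_y1] - _b[L2.ln_y1]))
```

                          With eight lags, the two sums simply extend to `L8`. One caveat worth keeping in mind: the ratio \( \beta / (1 - \sum_j \alpha_j) \) comes from a model that is linear in logs, so it carries over most naturally to the specification with logged lags. When the lags enter in levels, their coefficients are semi-elasticities per unit of conflict, which may be why the "persistence" in that specification looks close to zero.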

                          • #14
                            ********** TWFE: ln(1+y) **********
                            . reghdfe lnconflict l.e_boix_regime l(1/8).lnconflict $covar if dem1945==0, a(ccode year) vce(cluster ccode)

                            HDFE Linear regression                            Number of obs   =      6,534
                            Absorbing 2 HDFE groups                           F(  22,    139) =      71.26
                            Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                                              R-squared       =     0.7729
                                                                              Adj R-squared   =     0.7649
                                                                              Within R-sq.    =     0.3006
                            Number of clusters (ccode)   =        140         Root MSE        =     0.6601

                                                                 (Std. err. adjusted for 140 clusters in ccode)
                            -----------------------------------------------------------------------------------
                                              |               Robust
                                   lnconflict | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                            ------------------+----------------------------------------------------------------
                                e_boix_regime |
                                          L1. |  -.1014854    .028453    -3.57   0.000     -.157742   -.0452288
                                              |
                                   lnconflict |
                                          L1. |   .3645236   .0190307    19.15   0.000     .3268964    .4021507
                                          L2. |   .1302671   .0142877     9.12   0.000     .1020177    .1585165
                                          L3. |   .0709302   .0132346     5.36   0.000     .0447631    .0970973
                                          L4. |   .0461393   .0157231     2.93   0.004     .0150519    .0772267
                                          L5. |   .0134258   .0145119     0.93   0.356    -.0152668    .0421185
                                          L6. |    .009182   .0160552     0.57   0.568     -.022562     .040926
                                          L7. |  -.0046732   .0148283    -0.32   0.753    -.0339914    .0246449
                                          L8. |  -.0005541   .0163819    -0.03   0.973     -.032944    .0318359

                            ------------------------------------------------------------------------------
                              lnconflict | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                            -------------+----------------------------------------------------------------
                                 longrun |  -.2737231   .0847251    -3.23   0.001    -.4397812   -.1076649
                                shortrun |  -.1014854    .028453    -3.57   0.000    -.1572522   -.0457186
                             persistence |   .6292406   .0301159    20.89   0.000     .5702146    .6882667
                                    lag1 |   .3645236   .0190307    19.15   0.000      .327224    .4018231
                                    lag2 |   .1302671   .0142877     9.12   0.000     .1022637    .1582706
                                    lag3 |   .0709302   .0132346     5.36   0.000     .0449909    .0968694
                                    lag4 |   .0461393   .0157231     2.93   0.003     .0153225    .0769561
                                    lag5 |   .0134258   .0145119     0.93   0.355     -.015017    .0418687
                                    lag6 |    .009182   .0160552     0.57   0.567    -.0222856    .0406496
                                    lag7 |  -.0046732   .0148283    -0.32   0.753    -.0337361    .0243897
                                    lag8 |  -.0005541   .0163819    -0.03   0.973     -.032662    .0315539
                            ------------------------------------------------------------------------------


                            ********** Poisson: control with L(1/8).ln(1+y) **********
                            . ppmlhdfe conflict l.e_boix_regime l(1/8).lnconflict $covar if dem1945==0, absorb(ccode year) vce(cluster ccode)

                            HDFE PPML regression                              No. of obs      =      6,532
                            Absorbing 2 HDFE groups                           Residual df     =        138
                            Statistics robust to heteroskedasticity           Wald chi2(22)   =    4128.08
                            Deviance             =  32282.87939               Prob > chi2     =     0.0000
                            Log pseudolikelihood = -24850.22641               Pseudo R2       =     0.7799

                            Number of clusters (ccode)  =        139
                                                                 (Std. err. adjusted for 139 clusters in ccode)
                            -----------------------------------------------------------------------------------
                                              |               Robust
                                     conflict | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
                            ------------------+----------------------------------------------------------------
                                e_boix_regime |
                                          L1. |  -.1424864     .04702    -3.03   0.002    -.2346439   -.0503289
                                              |
                                   lnconflict |
                                          L1. |   .5981892   .0459273    13.02   0.000     .5081733     .688205
                                          L2. |   .0429773    .031448     1.37   0.172    -.0186596    .1046142
                                          L3. |    .036998   .0182096     2.03   0.042     .0013079    .0726881
                                          L4. |    .042681   .0254307     1.68   0.093    -.0071623    .0925242
                                          L5. |  -.0168485    .022236    -0.76   0.449    -.0604301    .0267332
                                          L6. |   .0148835   .0343038     0.43   0.664    -.0523508    .0821177
                                          L7. |  -.0402701   .0196408    -2.05   0.040    -.0787655   -.0017748
                                          L8. |    .026953   .0241189     1.12   0.264    -.0203192    .0742253

                            ------------------------------------------------------------------------------
                                conflict | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                            -------------+----------------------------------------------------------------
                                 longrun |  -.4839288   .1615405    -3.00   0.003    -.8005423   -.1673153
                                shortrun |  -.1424864     .04702    -3.03   0.002    -.2346439   -.0503289
                             persistence |   .7055634   .0362887    19.44   0.000     .6344389    .7766879
                                    lag1 |   .5981892   .0459273    13.02   0.000     .5081733     .688205
                                    lag2 |   .0429773    .031448     1.37   0.172    -.0186596    .1046142
                                    lag3 |    .036998   .0182096     2.03   0.042     .0013079    .0726881
                                    lag4 |    .042681   .0254307     1.68   0.093    -.0071623    .0925242
                                    lag5 |  -.0168485    .022236    -0.76   0.449    -.0604301    .0267332
                                    lag6 |   .0148835   .0343038     0.43   0.664    -.0523508    .0821177
                                    lag7 |  -.0402701   .0196408    -2.05   0.040    -.0787655   -.0017748
                                    lag8 |    .026953   .0241189     1.12   0.264    -.0203192    .0742253
                            ------------------------------------------------------------------------------



                            ********** Poisson: control with L(1/8).y **********
                            . ppmlhdfe conflict l.e_boix_regime l(1/8).conflict $covar if dem1945==0, absorb(ccode year) vce(cluster ccode)
                            -----------------------------------------------------------------------------------
                                              |               Robust
                                     conflict | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
                            ------------------+----------------------------------------------------------------
                                e_boix_regime |
                                          L1. |  -.1907246    .086943    -2.19   0.028    -.3611298   -.0203195
                                              |
                                     conflict |
                                          L1. |   .0056272   .0004521    12.45   0.000     .0047411    .0065132
                                          L2. |   -.000775   .0005668    -1.37   0.172    -.0018858    .0003359
                                          L3. |   .0015817   .0002988     5.29   0.000     .0009961    .0021672
                                          L4. |  -.0018533   .0004715    -3.93   0.000    -.0027774   -.0009292
                                          L5. |  -.0012395    .000438    -2.83   0.005    -.0020979   -.0003812
                                          L6. |   .0007748   .0008453     0.92   0.359     -.000882    .0024317
                                          L7. |   .0002974   .0004586     0.65   0.517    -.0006015    .0011963
                                          L8. |   .0012419   .0005138     2.42   0.016     .0002348     .002249

                            ------------------------------------------------------------------------------
                                conflict | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                            -------------+----------------------------------------------------------------
                                 longrun |  -.1918094   .0873804    -2.20   0.028    -.3630718   -.0205469
                                shortrun |  -.1907246    .086943    -2.19   0.028    -.3611298   -.0203195
                             persistence |   .0056552    .001164     4.86   0.000     .0033738    .0079365
                                    lag1 |   .0056272   .0004521    12.45   0.000     .0047411    .0065132
                                    lag2 |   -.000775   .0005668    -1.37   0.172    -.0018858    .0003359
                                    lag3 |   .0015817   .0002988     5.29   0.000     .0009961    .0021672
                                    lag4 |  -.0018533   .0004715    -3.93   0.000    -.0027774   -.0009292
                                    lag5 |  -.0012395    .000438    -2.83   0.005    -.0020979   -.0003812
                                    lag6 |   .0007748   .0008453     0.92   0.359     -.000882    .0024317
                                    lag7 |   .0002974   .0004586     0.65   0.517    -.0006015    .0011963
                                    lag8 |   .0012419   .0005138     2.42   0.016     .0002348     .002249
                            ------------------------------------------------------------------------------

                            • #15
                              Dear Janvi Fung,

                              Using PPML has the advantage, among others, that you can use the observations where the dependent variable is zero. However, in a dynamic model you have the problem that you cannot include the log of the lagged dependent variable when it is zero. The solution to this problem will depend on the nature of the application, but one solution is to use the lags of log(y+1) as regressors. An alternative that is perhaps better is to use as explanatory variables the lags of log(y), replacing the missing values with zeros and adding a dummy identifying those observations.
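
                              Written out for a single lag, the second alternative implies a conditional mean of the following form. The notation is assumed for illustration only: \( z_{i,t-1} \) is the dummy equal to 1 when \( y_{i,t-1} = 0 \), \( \tilde{y}_{i,t-1} \) equals \( y_{i,t-1} \) when it is positive and 1 otherwise (so its log is 0 for the zeros), and fixed effects are folded into \( x_{it}'\beta \).

```latex
\[
E[y_{it} \mid y_{i,t-1}, x_{it}]
  = \exp\!\left(\alpha \ln \tilde{y}_{i,t-1} + \delta z_{i,t-1} + x_{it}'\beta\right)
  = \begin{cases}
      y_{i,t-1}^{\alpha} \exp(x_{it}'\beta) & \text{if } y_{i,t-1} > 0, \\
      \exp(\delta + x_{it}'\beta)           & \text{if } y_{i,t-1} = 0.
    \end{cases}
\]
```

                              Under this reading, \( \alpha \) is an elasticity identified from the positive lags only, and \( \delta \) is a separate shift for a zero lag. Using log(y+1) instead gives \( (1 + y_{i,t-1})^{\alpha} \exp(x_{it}'\beta) \), which forces the zeros and the positive values through the same parameter.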

                              Best wishes,

                              Joao
