  • ppml, ppml_panel_sg and ppmlhdfe

    Hi all,

    I'm a PhD student and would greatly appreciate your comments on the following.

    I'm running a gravity equation on a panel of around 50,000 country pairs over 20 years. My equation also includes many interaction terms (variables interacted with year dummies), which I use to convert time-invariant variables into time-varying ones.

    When I run the ppml command with country and time fixed effects, clustering the standard errors by country pair and using the strict option, I get the following:

    Many warning messages saying that most of the interaction variables have very large values and that I should consider rescaling or recentering them. In addition, 56 regressors were excluded to ensure that the estimates exist. Ultimately, after many iterations, the output was produced without standard errors or p-values.

    My question is: can I run the command without the strict option?

    Then I tried ppml_panel_sg and got results with all the values. However, the estimation took more than 10 hours. Also, when I tried to run the RESET test after the ppml_panel_sg estimation, it ran forever without producing any output.

    I also tried ppmlhdfe. However, I got the error message r(3900) after a day, even after I increased matsize.


    Your kind comments are greatly appreciated.

    Niluka

  • #2
    Dear NILUKA PERERA EKANAYAKE,

    The strict option should not generally be used with PPML, so you should avoid it. It is normal that some variables are excluded to ensure existence of the estimates, but the warnings you are receiving and the problems you are having suggest that you should really rescale your variables.

    Best wishes,

    Joao
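
    A minimal sketch of the kind of rescaling suggested above, on toy data. The variables dist_km and pop are hypothetical stand-ins for whichever regressors trigger the "very large values" warning; they are not from the original post.

    ```stata
    * Toy data; dist_km and pop are hypothetical large-valued regressors.
    clear
    set seed 12345
    set obs 1000
    generate double dist_km = 100 + 15000*runiform()
    generate double pop     = 1e5 + 1e9*runiform()

    * Inspect the ranges first to see which variables are very large.
    summarize dist_km pop

    * Rescale by changing units.
    generate double dist_1000km = dist_km/1000     // thousands of km
    generate double pop_mil     = pop/1e6          // millions of people

    * Alternatively, recentre around the sample mean before rescaling.
    summarize dist_km, meanonly
    generate double dist_c = (dist_km - r(mean))/1000

    summarize dist_1000km pop_mil dist_c
    ```

    Dividing a regressor by a constant only multiplies its coefficient by that constant, so the fit is unchanged while the numerical problem is easier for the optimizer.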

    • #3
      Dear Santos,

      I greatly appreciate your valuable and quick response. Actually, I had also estimated the model without the strict option. I still got coefficient values without standard errors and p-values. The warning message says that the variance matrix is nonsymmetric or highly singular. Could you please advise me on how to deal with this singularity? I will rescale my variables and try again.

      Thanks a lot
      Niluka

      • #4
        Dear NILUKA PERERA EKANAYAKE,

        There are a couple of possible reasons for this, but the most likely is that you have some perfectly collinear variables that Stata is not being smart enough to drop. So, besides rescaling the variables, please check that you are not including variables that are perfectly collinear with the fixed effects. Also, you should really try to use ppmlhdfe because it is much better at dealing with the fixed effects and should be much faster.

        Best wishes,

        Joao
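
        A hedged sketch of how ppmlhdfe absorbs fixed effects instead of entering them as dummies, run on toy data. The names exporter, importer, year, ldist and yeartrend are hypothetical; totaltrade and idpair follow the names used later in this thread. It assumes ppmlhdfe and its dependencies (reghdfe, ftools) have been installed from SSC.

        ```stata
        * Toy panel: 30 exporters x 29 importers x 5 years (illustrative only).
        clear
        set seed 2024
        set obs 30
        generate exporter = _n
        expand 30
        bysort exporter: generate importer = _n
        drop if exporter == importer
        egen idpair = group(exporter importer)
        generate double ldist = rnormal()          // one draw per pair, time-invariant
        expand 5
        bysort idpair: generate year = 2000 + _n
        generate totaltrade = rpoisson(exp(1 - 0.5*ldist))

        * yeartrend is perfectly collinear with the year fixed effects, so it has
        * no identifiable coefficient; check how the output reports it.
        generate yeartrend = year - 2000

        * Fixed effects go in absorb() rather than in the regressor list.
        ppmlhdfe totaltrade ldist yeartrend, absorb(exporter importer year) vce(cluster idpair)
        ```

        Including a deliberately collinear variable in a small test run like this is one way to see how the command flags such regressors before moving to the full model.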

        • #5
          Dear Santos,

          Your advice saved me a lot of time. Thanks again.

          I tried again with rescaled variables and obtained results with p-values and standard errors. I'm using the predicted trade value from the gravity equation as an instrument for trade in an income effect analysis. I first ran my regression with xtreg, then with xi: reg, and then with ppml.

          Now I want to select the most appropriate of these three models, and I used the RESET test to choose the best specification. I would really like to use the ppml estimates, as they give coefficient values closer to those in the existing literature. However, the RESET test p-value is close to zero (0.0101) for ppml, while H0 is not rejected for xi: reg with importer, exporter and time fixed effects.

          Can we say that the ppml model is not correctly specified if it fails the RESET test? My regression includes many interaction variables (interacted with time dummies, e.g. logdistance*year1).

          The time you spend assisting junior academics like us is greatly appreciated.

          Thanks
          Niluka

          • #6
            Dear NILUKA PERERA EKANAYAKE,

            I am glad it worked. Stata is very sensitive to these numerical issues.

            I cannot imagine any real situation in which using a linear model for trade data is preferable to using PPML so that would be my choice. Maybe you can reconsider the specification of your model?

            Best wishes,

            Joao

            • #7
              Dear Joao Santos Silva,

              Thank you so much for the valuable advice. I will try to change the specification of the model.

              Best
              Niluka

              • #8
                Dear Prof Joao Santos Silva,

                Thank you so much for all the valuable comments earlier in this thread. I would greatly appreciate it if you could confirm that the following code is correct. I'm a bit confused about the "predict" step: since under PPML the dependent variable is in levels, my understanding is that we should not take the exponential of the fitted values.

                ppml totaltrade ldisyear1 ldisyear2 ldisyear3 ldisyear4 …………bpop_o bpop_d diso3_o* diso3_d* dyear*, cluster(idpair)
                predict fitted_ppml_trade, mu


                Then, to check the specification of the model, I took the square of the fitted values as follows to conduct the RESET test.
                gen fitted_ppml_trade_power=fitted_ppml_trade^2

                ppml totaltrade ldisyear1 ldisyear2 ldisyear3 ldisyear4 …………bpop_o bpop_d fitted_ppml_trade_power diso3_o* diso3_d* dyear* , cluster(idpair)
                test fitted_ppml_trade_power=0

                I used the fitted_ppml_trade as the predicted trade value in my income effect analysis without taking the Exp of the fitted values as we do under xtreg or xi:reg commands.

                Your valuable comments are greatly appreciated.

                Thank you
                Niluka Perera Ekanayake

                • #9
                  Dear NILUKA PERERA EKANAYAKE,

                  You want to use "predict fitted_ppml_trade, xb", not what you are doing.

                  Best wishes,

                  Joao

                  • #10
                    Dear Prof Joao Santos Silva,

                    That means that, to predict the fitted values, I should do the following:

                    ppml totaltrade ldisyear1 ldisyear2 ldisyear3 ldisyear4 …………bpop_o bpop_d diso3_o* diso3_d* dyear*, cluster(idpair)
                    predict fitted_ppml_trade, xb


                    May I ask one more question? Once I have obtained the fitted values, do I need to take their exponential, as we do under xtreg and xi: reg?

                    Thank you so much for your prompt reply. It is greatly appreciated.

                    Best regards
                    Niluka Perera Ekanayake

                    • #11
                      Dear NILUKA PERERA EKANAYAKE,

                      For the RESET test, you should not take the exponential of the predictions to include in the second regression. If you want to use your model to get fitted values or to make predictions, then use the option "n" which is the same as taking the exponential of the values obtained with the option "xb".

                      Best wishes,

                      Joao
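
                      The RESET steps discussed in posts #8 to #11 can be sketched as follows on toy data. The variables y and x and the names xb_hat and xb_hat2 are hypothetical; the sketch assumes ppml is installed from SSC and that predict accepts the xb option after it, as stated in post #9.

                      ```stata
                      * Toy data, purely illustrative.
                      clear
                      set seed 12345
                      set obs 2000
                      generate idpair = ceil(_n/10)            // 200 pairs, 10 obs each
                      generate double x = rnormal()
                      generate y = rpoisson(exp(0.5 + 0.3*x))

                      * Step 1: estimate and save the linear index x'b, not exp(x'b).
                      ppml y x, cluster(idpair)
                      predict double xb_hat, xb

                      * Step 2: add the squared index as an extra regressor and re-estimate.
                      generate double xb_hat2 = xb_hat^2
                      ppml y x xb_hat2, cluster(idpair)

                      * Step 3: the RESET test is the test that this coefficient is zero.
                      test xb_hat2 = 0
                      ```

                      A small p-value from the final test indicates that the squared index has explanatory power, which is evidence against the functional form of the first regression.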

                      • #12
                        Dear Professor Santos,

                        Thank you for the help you have given me so far. A few points are still unclear to me, and I would greatly appreciate it if you could clarify the following.

                        1. I'm using the gravity equation as follows to test the income effect of trade.
                        ppml totaltrade ldisyear1 ldisyear2 ldisyear3 ldisyear4 …………bpop_o bpop_d diso3_o* diso3_d* dyear*, cluster(idpair)
                        predict fitted_ppml_trade, xb

                        gen Trade_predict=exp(fitted_ppml_trade)

                        Then I use the Trade_predict in the income regression (in the IV to instrument actual trade value).


                        I want to know whether the code above is correct. In your previous post you mentioned using the option "n". I tried it, but I'm not sure how to use it. Could you please send me the code?
                        Furthermore, since we do not use the log of the trade value in ppml, why do we need to take the exponential when we compute the predicted trade value?

                        2. I want to use pair fixed effects in ppml. However, because the data set contains a very large number of pairs, Stata does not allow me to generate the dummy variables for the pairs. Since ppml needs the dummy variables, could you please suggest an alternative way to include pair fixed effects?

                        Your help is greatly appreciated.

                        Thank you
                        Niluka

                        • #13
                          Dear NILUKA PERERA EKANAYAKE,

                          Sorry, the right option in this context is "mu"; if you do that you do not need to take the exponential of the predicted values.
                          To estimate models with fixed effects, I suggest you use the commands ppmlhdfe or xtpoisson with the fe option.

                          Best wishes,

                          Joao
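
                          A hedged sketch of both points in this reply, on toy data. The names exporter, importer, year, ldist and x are hypothetical; totaltrade and idpair follow the thread. It assumes ppml and ppmlhdfe (with reghdfe and ftools) are installed from SSC.

                          ```stata
                          * Toy panel: 30 exporters x 29 importers x 5 years (illustrative only).
                          clear
                          set seed 2024
                          set obs 30
                          generate exporter = _n
                          expand 30
                          bysort exporter: generate importer = _n
                          drop if exporter == importer
                          egen idpair = group(exporter importer)
                          generate double ldist = rnormal()      // time-invariant within pair
                          expand 5
                          bysort idpair: generate year = 2000 + _n
                          generate double x = rnormal()          // time-varying regressor
                          generate totaltrade = rpoisson(exp(1 - 0.5*ldist + 0.3*x))

                          * (1) Fitted trade in levels: mu should match exp(xb) up to rounding.
                          ppml totaltrade ldist x, cluster(idpair)
                          predict double trade_mu, mu
                          predict double trade_xb, xb
                          generate double trade_exp = exp(trade_xb)
                          generate double gap = abs(trade_mu - trade_exp)
                          summarize gap

                          * (2) Pair fixed effects without creating dummies.
                          * ldist is constant within a pair, so it is left out here.
                          xtset idpair year
                          xtpoisson totaltrade x i.year, fe vce(robust)
                          ppmlhdfe totaltrade x, absorb(idpair year) vce(cluster idpair)
                          ```

                          The exponential appears because the model is for the conditional mean of trade in levels, E[trade|x] = exp(xb), so xb is on the log scale even though the dependent variable is not logged.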

                          • #14
                            Dear Prof. Santos,

                            Thank you so much. Your quick response is greatly appreciated.

                            Niluka

                            • #15
                              Dear Prof. Joao Santos Silva ,
                              Thank you for your valuable advice so far. I'm working with ppmlhdfe as you advised. However, I was not able to generate out-of-sample predictions using the following command. That is, I want to use the geographic variables to generate fitted values for country pairs that did not actually trade in a particular year.

                              ppmlhdfe trade_actual ldistw larea lpop border, a(imp#year exp#year, save) standardize_data(0) d vce(cluster idpair) nolog

                              predict fitppmlhd2, mu
                              gen trade_predict_TT_2=fitppmlhd2


                              If you could advise me how to obtain out of sample predictions, that would be a great help.

                              Thanks a lot
                              Niluka
