
  • Need help on how to interpret coefficient of OLS and PPML Gravity model estimation

    Dear all,

    I would like to ask for help in interpreting the coefficients of OLS and of the PPML Gravity model. My dependent variable is tax payments. For the OLS estimation I used the log of tax payments, while for the PPML estimation I used tax payments in levels (without taking logs). I chose PPML because my data contain many zero tax payments.
    I use ordinary least squares (OLS) with the following specification:

    y_i = α + β1 T1_i + β2 T2_i + β3 T3_i + γ X_i + ε_i

    where the outcome y_i is the log of tax payment; T1_i, T2_i, and T3_i are binary variables indicating the treatment messages (T1 = deterrence, T2 = trust, and T3 = reciprocity), respectively, for individual i; X_i is a vector of control variables comprising taxpayers' observable characteristics (gender, age, location, and status); and ε_i is an error term.

    Results after running my model for OLS (log of tax payments)

    β1 = 0.6476
    β2 = 0.1849
    β3 = 0.1471

    Results after running my model for PPML (tax payments)

    β1 = 1.1088
    β2 = 0.3181
    β3 = 0.2505


    I am quite lost, and I hope somebody can help me interpret these coefficients. Thank you so much.

    Best,

    Krisnanto

  • #2
    Krisnanto:
    welcome to this forum.
    My reply is about your OLS query only.
    Since you coded a log-linear model, other things being equal:
    1) for 1 unit change (from 0 to 1) in T1, the regressand increases by 70.30%
    Code:
    . di exp((0.6476)-1)
    .70299887
    2) for 1 unit change (from 0 to 1) in T2, the regressand increases by 44.26%
    Code:
    . di exp((0.1849)-1)
    .44259507
    3) for 1 unit change (from 0 to 1) in T3, the regressand increases by 42.62%
    Code:
    . di exp(( 0.1471)-1)
    .42617722
    Kind regards,
    Carlo
    (StataNow 18.5)



    • #3
      Dear Krisnanto dwi,

      I believe there is a typo in Carlo Lazzaro's helpful advice; the "-1" should be outside the exponential function. So, for example, for 1 unit change (from 0 to 1) in T3, the expected value of the dependent variable increases by 15.8%.
      Code:
      . di exp(0.1471)-1
      .1584698
      The interpretation of the PPML results is exactly as for the OLS. Again for T3, we have that the expected change is 28.5%
      Code:
      . di exp(0.2505)-1
      .28466759
      Note that when you use PPML in this case you are not really estimating a gravity equation but simply a model with an exponential conditional expectation. A gravity equation is a special case where the model with exponential conditional expectation is used to describe the flow of something (e.g., people, goods, services, capital) from one region to another.
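For readers who want to check the arithmetic outside Stata, here is a minimal Python sketch of the same conversion, `exp(b) - 1`, applied to all six coefficients reported in the first post. The function name `pct_effect` is only an illustrative choice.

```python
import math

def pct_effect(b):
    """Percentage change in the expected outcome when a dummy goes from 0 to 1."""
    return (math.exp(b) - 1.0) * 100.0

# Coefficients as reported in the original question.
ols = {"T1": 0.6476, "T2": 0.1849, "T3": 0.1471}
ppml = {"T1": 1.1088, "T2": 0.3181, "T3": 0.2505}

for name in ols:
    print(f"{name}: OLS {pct_effect(ols[name]):.1f}%  PPML {pct_effect(ppml[name]):.1f}%")
```

The same formula is used for both columns because, as noted above, the interpretation comes from the exponential form of the model rather than from the estimator.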

      Best wishes,

      Joao



      • #4
        Thanks to Joao for pointing this out, and apologies for the typo(s) (copy and paste is not a good recipe!):
        The correct values should be:
        Code:
        . di exp(0.6476)-1
        .91094904
        
        . di exp(0.1849)-1
        .20309812
        
        . di exp(0.1471)-1
        .1584698
        
        .
        Kind regards,
        Carlo
        (StataNow 18.5)



        • #5
          Dear Prof. Joao Santos Silva, I wanted to pick up this old thread and ask for help with interpreting the coefficients when ppmlhdfe, OLS with ln(dependent variable), and OLS with ln(dependent variable + 1) give very different coefficients. You have mentioned in several posts that the interpretation of the PPML results is exactly as for the OLS. However,

          1. Is it equivalent to using logged DV with some small constant, or without adding any constant?
          2. If I am getting hugely different coefficients in the three ways, what should I interpret from that?

          On a different note, if I am using ppmlhdfe to set up a difference-in-differences design, should I test for parallel trends on the raw dependent variable, on the logged dependent variable, or on the log of the dependent variable plus some constant?

          I will appreciate your input on this.



          • #6
            Dear Aparajita Agarwal,

            The most likely reason for you to obtain very different results with the 3 methods is that the results based on the OLS estimator are badly biased. So, I would just ignore them and focus on PPML.

            When using PPML for a dif-in-dif we need proportional trends (or parallel trends in logs).
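One way to read "proportional trends" is that, without treatment, expected outcomes in both groups would grow by the same factor. The Python sketch below illustrates this for a simple two-group, two-period design with made-up numbers: the proportional effect is a ratio of growth factors, and in a fully saturated 2x2 Poisson regression the interaction coefficient equals the log of that ratio. The function name and the data are assumptions for illustration only.

```python
import math

def mean(values):
    return sum(values) / len(values)

def multiplicative_did(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Ratio-of-ratios DiD: proportional effect under proportional trends."""
    growth_treated = mean(treat_post) / mean(treat_pre)
    growth_control = mean(ctrl_post) / mean(ctrl_pre)
    return growth_treated / growth_control - 1.0

# Made-up outcomes. Zeros stay in the data because only group means are used.
treat_pre = [0.0, 10.0, 20.0]    # mean 10
treat_post = [0.0, 18.0, 27.0]   # mean 15, growth factor 1.5
ctrl_pre = [0.0, 40.0, 80.0]     # mean 40
ctrl_post = [0.0, 48.0, 96.0]    # mean 48, growth factor 1.2

effect = multiplicative_did(treat_pre, treat_post, ctrl_pre, ctrl_post)
interaction_coef = math.log(1.0 + effect)  # what the saturated Poisson model would report
print(f"proportional effect: {effect:.3f}, implied coefficient: {interaction_coef:.4f}")
```

Note that no logs of individual observations are taken, which is why the zeros cause no difficulty here.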

            Best wishes,

            Joao



            • #7
              Thank you very much, Prof. Joao Santos Silva. I had three follow-up queries on this.

              1. For my understanding: in general, for a dataset with a high incidence of 0s in the DV, do we usually expect ppml to give coefficients similar to OLS on the log of only the positive values (i.e., without adding a constant before logging) or to OLS including the 0s (i.e., after adding the constant)? I assume it is the latter since ppml accounts for the 0s. Am I right in my understanding?

              2. Relatedly, to show parallel trends in logs, does a small constant need to be added, or should it be done on only the nonzero values?

              3. When the DV is a continuous variable with high number of zeroes (e.g. income earned by a daily wage earner in a week), my understanding is that ppml would not be the right method to use. If the zeroes represent true zeroes and not censored values, it would not be appropriate to use Tobit either. What would your recommendation be in this scenario?

              Thank you
              Aparajita



              • #8
                Dear Aparajita Agarwal,

                1. There is no reason to expect the PPML estimates to be similar to those obtained by OLS (which is inconsistent); if they were similar, there would be little reason to use PPML.

                2. It is better to show proportional trends to avoid the problem with the zeros.

                3. That is exactly a case where PPML would be a reasonable approach. Why do you say it would not be the right method?
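To make point 1 concrete, here is a small Python sketch with made-up data and a single treatment dummy. It relies on two closed forms that hold when the only regressor is a dummy plus an intercept: the PPML slope is the log of the ratio of group means, and the OLS slope is the difference in group means of the logged outcome. The data and function names are illustrative assumptions, and the numbers say nothing about the size of any bias in real applications.

```python
import math

def mean(values):
    return sum(values) / len(values)

def ppml_dummy_coef(y0, y1):
    """PPML slope on a single dummy (with intercept): log ratio of group means."""
    return math.log(mean(y1) / mean(y0))

def ols_log_coef(y0, y1, shift=0.0):
    """OLS slope of log(y + shift) on the dummy; zeros are dropped when shift == 0."""
    logs0 = [math.log(y + shift) for y in y0 if y + shift > 0]
    logs1 = [math.log(y + shift) for y in y1 if y + shift > 0]
    return mean(logs1) - mean(logs0)

# Made-up continuous outcomes with zeros in both groups.
control = [0.0, 0.0, 5.0, 20.0, 75.0]    # mean 20
treated = [0.0, 10.0, 10.0, 40.0, 90.0]  # mean 30

b_ppml = ppml_dummy_coef(control, treated)            # log(30 / 20)
b_ols_pos = ols_log_coef(control, treated)            # log(y), positives only
b_ols_plus1 = ols_log_coef(control, treated, shift=1.0)  # log(y + 1)

print(f"PPML: {b_ppml:.3f}  OLS log(y): {b_ols_pos:.3f}  OLS log(y+1): {b_ols_plus1:.3f}")
```

In this toy example the three numbers differ noticeably, which is consistent with the point that agreement between them should not be expected.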

                Best wishes,

                Joao



                • #9
                  Dear Prof. Joao Santos Silva , Thank you for your helpful input.

                  On the three points above:

                  1. I am actually trying to better understand what is meant by the statement that "the interpretation of the PPML results is exactly as for the OLS". Is the interpretation the same when the DV is log(y) or log(y + some constant)? Or maybe I am misunderstanding that statement.

                  2. Since generally in DID, it is more common to show parallel trends, I want to show parallel trends in logs. What would be the best way to handle the 0s in that case? Would it make sense to separately plot the trends for probability of observing non-zero and separately show trends of log(y) if y>0 ?

                  3. My understanding is that PPML is appropriate for count DV and not for continuous DVs. If the DV is continuous with lots of 0s, then what could we use?

                  Thank you so much for your extremely helpful advice.



                  • #10
                    Dear Aparajita Agarwal,

                    1. The interpretation of the parameters is defined by the model you want to estimate, not the estimator. PPML is a way of estimating multiplicative (or exponential) models such as the Cobb-Douglas production function or the gravity equation for trade, and that is what defines the meaning of the parameters. For many years, these models were estimated by taking logs on both sides and then using OLS. This approach is often invalid, but it is trying to identify the same set of parameters that you estimate with PPML (which is valid under very general conditions).

                    2. If you have zeros in the DV, you should show proportional trends.

                    3. There is nothing in PPML that makes it specific to count data; it can be used as long as we are willing to assume that the model is multiplicative (exponential). See, for example,
                    Santos Silva, J.M.C. and Tenreyro, Silvana (2006), The Log of Gravity, The Review of Economics and Statistics, 88(4), pp. 641-658
                    Santos Silva, J.M.C. and Tenreyro, Silvana (2022), The Log of Gravity at 15, Portuguese Economic Journal, 21, 423–437
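As an illustration of point 3, the sketch below fits E[y|x] = exp(a + b*x) by solving the Poisson first-order conditions with Newton's method, in plain Python. Nothing in those conditions requires y to be an integer; it only needs to be non-negative. This is a bare-bones teaching sketch with one regressor, made-up data, and no standard errors, not a substitute for `poisson` or `ppmlhdfe`.

```python
import math

def ppml_fit(x, y, tol=1e-10, max_iter=100):
    """Poisson pseudo-ML for E[y|x] = exp(a + b*x), solved by Newton's method."""
    a, b = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(max_iter):
        mu = [math.exp(a + b * xi) for xi in x]
        # Score: sum of (y - mu) * (1, x)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Negative Hessian: sum of mu * (1, x)(1, x)'
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b

# Made-up weekly incomes: continuous, non-negative, with several zeros.
x = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0]
y = [0.0, 12.5, 20.3, 0.0, 18.75, 33.1, 7.2, 0.0, 61.4]

a_hat, b_hat = ppml_fit(x, y)
print(f"a = {a_hat:.4f}, b = {b_hat:.4f}, effect of one unit of x: {math.exp(b_hat) - 1:.1%}")
```

The estimator only uses the assumption that the conditional mean is exponential in the regressors, which is what gives the coefficients their proportional interpretation.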

                    Best wishes,

                    Joao



                    • #11
                      Thank you very much, Prof. @Joao Santos Silva. This is very helpful. I request your advice on two additional points:

                      1. My understanding is that showing proportional trends means showing the trends of the proportion of zeroes to nonzeroes for treated and control units over time? Or am I over-simplifying?

                      2. If I have an interaction term in a ppml model, where the moderator is gender, what is the right way to translate the coefficient into the corresponding percentage difference between the two genders? Would it be right to use [exp(interaction coefficient)-1]*100%? That is, if the interaction coefficient is 0.1, would it be right to say that the effect of the independent variable on the dependent variable differs by [exp(0.1)-1]*100% for males vs. females?



                      • #12
                        Dear Aparajita Agarwal,

                        1) There is no need to separate zeros and positives.

                        2) I would compute the effect for males and females and look at the difference, instead of looking only at the interaction.
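A short Python sketch of what point 2 could look like in practice. The interaction coefficient 0.1 comes from the question; the main-effect coefficient 0.20 is a hypothetical value chosen only for illustration, and the sketch assumes the independent variable is itself a dummy and that the moderator is coded 0 for the base group.

```python
import math

def pct(b):
    """Percentage effect implied by a coefficient in an exponential model."""
    return (math.exp(b) - 1.0) * 100.0

b_x = 0.20      # hypothetical main effect (effect in the base group)
b_inter = 0.10  # interaction coefficient from the question

effect_base = pct(b_x)              # group with moderator = 0
effect_other = pct(b_x + b_inter)   # group with moderator = 1
gap_pp = effect_other - effect_base  # difference in percentage points

print(f"base group: {effect_base:.1f}%, other group: {effect_other:.1f}%")
print(f"difference: {gap_pp:.1f} percentage points")
print(f"exp(b_inter) - 1: {pct(b_inter):.1f}%")
```

In an exponential model, `exp(b_inter) - 1` describes how the multiplicative effect factor scales between the groups, which is not the same number as the gap between the two percentage effects. Computing both group effects makes the comparison explicit.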

                        Best wishes,

                        Joao



                        • #13
                          Thank you, Prof. @Joao Santos Silva. Can you please point me to any source or reference that can help me understand how to test for proportional trends? Apologies for my limited knowledge: I have never come across this before, and all the references on difference-in-differences refer only to parallel trends. Thank you very much for your guidance.



                          • #14
                            Dear Aparajita Agarwal,

                            You can do the test using the same "placebo tests" used to test for parallel trends, but using Poisson regression. See also

                            Ciani, Emanuele & Fisher, Paul. (2018). Dif-in-Dif Estimators of Multiplicative Treatment Effects. Journal of Econometric Methods. 8. 10.1515/jem-2016-0011.
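A rough Python sketch of such a placebo check, using pre-treatment data only and a fake treatment date that splits the pre-period into "early" and "late". It uses the closed form of a saturated 2x2 Poisson regression (log ratio of ratios of cell means) and a delta-method standard error. It assumes the four cells are independent samples, as in repeated cross-sections; panel data would call for clustered standard errors instead. All names and data are made up.

```python
import math

def log_mean_and_var(values):
    """Log of the sample mean and its delta-method variance."""
    n = len(values)
    m = sum(values) / n
    s2 = sum((v - m) ** 2 for v in values) / n
    return math.log(m), s2 / (n * m * m)

def placebo_log_ratio(treat_early, treat_late, ctrl_early, ctrl_late):
    """Pre-period log ratio of ratios and its z statistic.

    Under proportional trends the estimate should be close to zero.
    """
    cells = [(treat_late, 1), (treat_early, -1), (ctrl_late, -1), (ctrl_early, 1)]
    est, var = 0.0, 0.0
    for values, sign in cells:
        log_mean, v = log_mean_and_var(values)
        est += sign * log_mean
        var += v  # assumes independent cells
    return est, est / math.sqrt(var)

# Made-up pre-treatment outcomes: both groups grow by 10% between sub-periods.
est, z = placebo_log_ratio(
    treat_early=[0.0, 10.0, 20.0], treat_late=[0.0, 11.0, 22.0],
    ctrl_early=[0.0, 40.0, 80.0], ctrl_late=[0.0, 44.0, 88.0],
)
print(f"placebo estimate: {est:.4f}, z = {z:.2f}")
```

An estimate far from zero relative to its standard error would cast doubt on proportional trends, in the same way a significant placebo effect does for parallel trends in a linear model.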

                            Best wishes,

                            Joao

