  • High R-square in ppml

    Hi statalist,

    I would like to clarify a few things about using ppml on gravity data.

    1. I'm trying to estimate a gravity model using patent data. My dependent variable is bilateral patent counts. The independent variables are GDP, economic globalization, and education of both the country of origin and the country of destination. I have included i.year, i.origin, i.dest for fixed effects and clustered by (dist). I noticed that my R-square is close to 0.92, and with additional variables it increases to 0.97. This is extremely high; should I worry about it?

    2. Is there a way to get the adjusted r-square using ppml?

    3. I'm interested in doing some mediation and moderation analysis on the gravity data but am not sure what code to use.

    Appreciate your advice.

    Thanks.
    Jaya

  • #2
    Dear Statalist community,

    I would like to clarify a few things about using ppml on gravity data.

    1. I'm trying to estimate a gravity model using patent data. I have a panel gravity dataset with 100,000 observations. My dependent variable is bilateral patent counts. The independent variables are GDP, economic globalization, and education of both the country of origin and the country of destination. I have included year_*, origin_*, dest_* as fixed effects and clustered the standard errors by (dist). I noticed that my R-square is close to 0.92, and with additional variables it increases to 0.97. This is extremely high; should I worry about it?

    2. I have not declared the dataset as a panel using xtset countrypair year. Is this needed for ppml estimation?


    3. Is there a way to get the adjusted r-square using ppml?

    4. Instead of using year, origin, and destination fixed effects, I have tried year_*, origin_destination_* fixed effects (this created 5049 dummies) as well as origin_year_*, destination_year_* fixed effects. However, ppml takes an extremely long time to estimate. Is this expected?

    5. I'm interested in doing some mediation and moderation analysis on the gravity data but am not sure whether ppml can be modified for this. Is there a method to perform mediation and moderation analysis on gravity data when the dependent variable has a large number of zeros?

    Appreciate your advice.

    Thanks.
    Jaya



    • #3
      No idea about your modeling approach, but I do know that when you start throwing lots of fixed effects at stuff, you'll artificially inflate the R-squared stat (typically). I mean... based on your posting so far, good Lord, you have 5049 dummies. I suspect that adding in additional covariates would inflate the R-squared stat even more, as you indicate. My biggest concern, knowing nothing about your research question, would be "Are my variables really explaining 97% of the variation in my outcome?" Possible. Likely? No.

      I know about mediation/moderation analysis, but I've never done it myself. Honestly, in situations like this, I break out my econometrics textbooks and start to look for basic intros to the method and how I might implement it.



      • #4
        DN Jay If I were you, I would start here.



        • #5
          Dear DN Jay,

          1 - Gravity equations typically have very high R2, even when you do not include fixed effects; with fixed effects, having R2 well above 0.9 is frequent. Notice, however, that different commands report different kinds of R2 and that these are non-linear models and therefore do not have the standard interpretation. For example, they do not measure the explained variation (cf. #3 above).

          2 - No, that is not needed.

          3 - You can, but you will not gain much by doing it. Notice that in general both the R2 and the AR2 are not interesting statistics.

          4 - Yes, that is expected, but you can use ppmlhdfe to deal with those high-dimensional fixed effects; it is much faster. You should also decide what kind of fixed effects you need based on economic reasoning, not so much on experimentation.

          5 - I am not an expert in the area, but you should be able to do almost anything you can do in a linear model.

          Best wishes,

          Joao
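
          As a hedged illustration of point 4, the sketch below shows how the fixed effects could be absorbed with ppmlhdfe instead of being entered as thousands of dummies. It assumes that numeric identifiers named year and pairid exist in the data; the regressor list is illustrative only and uses names from the command posted later in this thread.

          ```stata
          * One-time installation from SSC (ppmlhdfe depends on ftools and reghdfe)
          ssc install ftools
          ssc install reghdfe
          ssc install ppmlhdfe

          * Year and country-pair fixed effects are absorbed, not entered as dummies.
          * Assumes numeric variables year and pairid; regressors are illustrative.
          ppmlhdfe patents ln_gdp_o ln_gdp_d ln_education_o ln_education_d, ///
              absorb(year pairid) vce(cluster pairid)
          ```

          With pair fixed effects, regressors that do not vary within a country pair over time (distance, contiguity, and similar) cannot be identified, which is why they are left out of this sketch.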



          • #6
            Dear Prof Santos Silva,

            Thank you very much for your feedback. The code I use for my analysis is as follows.

            ppml patents pc1_CS_od ln_gdp_o ln_gdp_d ln_education_o ln_education_d ln_tradeflow_od ln_dist contig comlang_off colony year_* origin_* dest_*, cluster(pairid)

            My main explanatory variable is pc1_CS, which is a cultural variable. The suffixes _o and _d refer to origin and destination. I want to test whether a third variable, say X, mediates the positive effect of pc1_CS on patents. For a usual panel dataset, I'm familiar with applying SEM to test for mediating effects. But in this context, how can I modify the model to test whether a third variable mediates the relationship between pc1_CS and patents?

            Thank you very much.

            Best wishes,
            DN Jay
            Last edited by DN Jay; 07 Sep 2021, 21:18.



            • #7
              Originally posted by Jared Greathouse:
              DN Jay If I were you, I would start here.
              Thank you Jared. I am familiar with using SEM to do mediation and moderation analysis. But do you know how ppml can be modified for this?



              • #8
                Dear DN Jay,

                I believe you can use the 3 steps mentioned here https://en.wikipedia.org/wiki/Mediation_(statistics), but use ppml in the first and third steps, and a suitable regression in the second step (it depends on the nature of pc1_CS).

                Best wishes,

                Joao
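
                The three steps above can be sketched as follows. This is only an illustration under stated assumptions: mediator_od is a hypothetical name for the mediating variable X, and regress in the second step assumes that variable is continuous. The remaining names come from the command in #6.

                ```stata
                * Controls shared by all three steps (names taken from the command in #6)
                local controls ln_gdp_o ln_gdp_d ln_education_o ln_education_d ///
                    ln_tradeflow_od ln_dist contig comlang_off colony year_* origin_* dest_*

                * Step 1: total effect of pc1_CS_od on patents
                ppml patents pc1_CS_od `controls', cluster(pairid)

                * Step 2: effect of pc1_CS_od on the mediator
                * (regress assumes a continuous mediator; mediator_od is hypothetical)
                regress mediator_od pc1_CS_od `controls', vce(cluster pairid)

                * Step 3: effect of pc1_CS_od on patents with the mediator included
                ppml patents pc1_CS_od mediator_od `controls', cluster(pairid)
                ```

                Because ppml fits an exponential mean, the coefficients in steps 1 and 3 are on the log scale, so comparing them gives at most an informal indication of mediation rather than the exact decomposition available in a linear model.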



                • #9
                  Dear Professor Santos Silva,

                  I have been using PPML to estimate my gravity model with time-invariant country fixed effects and year fixed effects. However, Yotov et al. (2017) recommend using time-varying country fixed effects to deal with unobservable multilateral resistance (MR). The issue is that when I introduce time-varying country fixed effects, they absorb most of the variables that I'm interested in studying. May I know whether PPML inherently tackles the issue of MR, so that using time-invariant country FE will not be a problem?

                  Thank you,
                  Dini



                  • #10
                    Dear DN Jay,

                    The choice of fixed effects to use is a fundamental modelling question, so it is really up to you to decide what to do. Having said that, the standard practice is indeed to include time-varying origin and destination dummies to account for MR.

                    Best wishes,

                    Joao
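
                    A minimal sketch of a specification with time-varying origin and destination fixed effects, assuming ppmlhdfe is installed and that numeric identifiers named origin, dest, year, and pairid exist (these identifier names are assumptions, not confirmed in the thread):

                    ```stata
                    * Origin-year and destination-year fixed effects absorbed by ppmlhdfe.
                    * Assumes numeric identifiers origin, dest, year, and pairid.
                    ppmlhdfe patents pc1_CS_od ln_tradeflow_od ln_dist contig ///
                        comlang_off colony, ///
                        absorb(origin#year dest#year) vce(cluster pairid)
                    ```

                    Variables that vary only by origin-year or destination-year (here ln_gdp_o, ln_gdp_d, ln_education_o, ln_education_d) are collinear with these fixed effects and are therefore left out; only pair-level regressors remain identified.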



                    • #11
                      Dear Professor Santos Silva,

                      May I know if PPML allows for interaction plots? If so, what is the command I should follow?

                      Best,
                      DN Jay



                      • #12
                        Dear DN Jay,

                        You can do with PPML anything you can do with OLS, but unfortunately I do not know the command as I do not use interaction plots.

                        Best wishes,

                        Joao
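
                        One possible way to draw an interaction plot is sketched below. It uses the official poisson command with cluster-robust standard errors as a stand-in for ppml (an assumption, made because margins and marginsplot are documented for official estimation commands), and moderator_od is a hypothetical name for a continuous moderator. The grid values in at() are placeholders to adapt to the data.

                        ```stata
                        * poisson with cluster-robust SEs stands in for ppml here (assumption)
                        * c.x##c.z includes both main effects and their interaction
                        poisson patents c.pc1_CS_od##c.moderator_od ln_gdp_o ln_gdp_d ///
                            ln_education_o ln_education_d ln_tradeflow_od ln_dist contig ///
                            comlang_off colony year_* origin_* dest_*, vce(cluster pairid)

                        * Predicted counts over a grid of values; the values are placeholders
                        margins, at(pc1_CS_od=(-2(1)2) moderator_od=(0 1))

                        * Plot the predictions, one line per value of the moderator
                        marginsplot
                        ```

                        With a large number of dummies this may be slow, so it may be worth trying it on a smaller specification first.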
