  • Interpreting generalized difference-in-differences output: is my coefficient cumulative?

    Hi everyone,

    [TLDR: I'm unsure how to interpret my results. Is the GDiD coefficient I get for my logged variable "cumulative"? Any tips for meaningful interpretation?]

    I am Tobi, currently a Master's student in economics, and I am dealing with end-of-term papers at the moment. I feel like I have already spent countless hours on this forum reading and researching, and I have found a lot of helpful information and suggestions here. Still, I don't consider myself a genius with either econometrics or Stata, so a lot of what I learn is trial and error, and maybe my question suggests a lack of basic understanding of what I am dealing with - that's how I feel anyway.

    To my question: for a research task I put together a panel dataset (ca. 200 countries with 11 years each, 2009-2019) with a bunch of macroeconomic indicators to investigate the "effects" of a policy measure, the participation of states in the Belt and Road Initiative (BRI). I will give you just the core info, as I assume the question is much more general:
    I pursued a GDiD approach because countries in my dataset entered treatment, i.e. signed some BRI participation agreement, in different years from 2013 onwards. My dependent variables are also logged.

    BRIS: Dummy variable marking measurements during years of BRI participation
    TREAT: Dummy variable indicating a country's status as a BRI participant at some point in time (fun fact: no country has discontinued its participation so far, good for me)
    GDPPCP: GDP per capita PPP in current US$
    LGDPPCP: = log(GDPPCP) as the GDP variable has a skewed distribution in my dataset and the results with it look better. I transform it back using =exp(x)-1

    With that I did:

    xtset Ccode YEAR, yearly

    xtreg LGDPPCP i.BRIS i.TREAT i.YEAR , fe robust

    And I got:
    1.BRIS           0.038**
    1o.TREAT         -
    2010.YEAR        0.040***
    2011.YEAR        0.084***
    2012.YEAR        0.114***
    2013.YEAR        0.151***
    2014.YEAR        0.182***
    2015.YEAR        0.184***
    2016.YEAR        0.220***
    2017.YEAR        0.260***
    2018.YEAR        0.291***
    2019.YEAR        0.325***
    Constant         9.129***
    Observations     2,139
    Number of Ccode  198
    R-squared        0.498
    Adj R-squared    0.495
    F-test           64.80
    Prob > F         0

    So, I noticed that, basically, the longer my pre-treatment period, the bigger the coefficient (I use inrange(YEAR, a, b) and some other tricks for testing). The YEAR coefficients seem to "accumulate" as well, which I was not aware could or should happen, if it is indeed the case and if I didn't make any other foolish mistakes. I am really confused at the moment, as it makes total sense and no sense at all to me at once, and causal effects in this particular example are dubious anyway... I would be really glad for any help.

    1) Does my procedure make sense?
    2) Is the coefficient indeed cumulative, and should it be like that?
    3) How do I interpret this? Can I just divide it by the number of years investigated to get something like a yearly factor showing treatment effects?

    Again, apologies if this is actually really simple and I really should have understood all this before bothering with such models...
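
    One way to read the reported coefficients is sketched below with Stata's display calculator. It uses only the numbers from the table in the post; the scalar names are made up for illustration. The sketch assumes the usual reading of a log outcome, where a coefficient b corresponds to a percentage difference of 100*(exp(b)-1), and it treats each YEAR coefficient as measured against the omitted base year 2009 rather than against the previous year.

```stata
* Coefficients copied from the regression table in the post
scalar b_bris = 0.038
scalar b_2018 = 0.291
scalar b_2019 = 0.325

* Log outcome: 100*(exp(b) - 1) is the implied percentage difference
scalar pct_bris = 100*(exp(b_bris) - 1)
display "BRIS effect:        " %5.2f pct_bris " percent"   // about 3.9

* A YEAR coefficient compares that year with the base year 2009
scalar pct_2019 = 100*(exp(b_2019) - 1)
display "2019 versus 2009:   " %5.2f pct_2019 " percent"   // about 38.4

* Year-on-year change: difference between adjacent YEAR coefficients
scalar pct_yoy = 100*(exp(b_2019 - b_2018) - 1)
display "2019 versus 2018:   " %5.2f pct_yoy " percent"    // about 3.5
```

    Read this way, the YEAR coefficients rise steadily because each one is a level relative to 2009. Under the usual two-way fixed effects reading, the BRIS coefficient is a single average difference across all treated country-years, so dividing it by the number of years would not give a yearly effect.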


    If you have different treatment dates, you need to look at csdid or something like it.
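
    A minimal sketch of what a csdid call could look like here, not a definitive specification. It assumes the panel from the original post is in memory with the variables Ccode, YEAR, BRIS and LGDPPCP, and that the user-written packages csdid and drdid have been installed from SSC. The variable name first_bri is hypothetical.

```stata
* Assumption: the poster's panel is in memory and csdid/drdid are installed
*   ssc install drdid
*   ssc install csdid

* First year of BRI participation per country (hypothetical variable name);
* csdid expects the cohort variable to be 0 for never-treated units
egen first_bri = min(cond(BRIS == 1, YEAR, .)), by(Ccode)
replace first_bri = 0 if missing(first_bri)

* Group-time average treatment effects with staggered adoption
csdid LGDPPCP, ivar(Ccode) time(YEAR) gvar(first_bri)

* Aggregate the group-time effects
estat simple        // one overall average effect
estat event         // effects by years since joining
```

    The estat event output is the part that speaks to the "cumulative" question, since it reports the estimated effect separately for each year relative to the year of joining.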

    It's not accumulating, I don't think. You're adding in higher GDP countries over time, possibly. i.YEAR should address inflation. Country fixed effects?

    Try this to see what happens.
    reghdfe LGDPPCP i.BRIS i.TREAT , absorb(Ccode YEAR) vce(cluster Ccode)
    But with different treatment dates, you've got a bigger problem.
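
    The "1o.TREAT" line in the output of the original post means TREAT was omitted. A small simulated panel can illustrate why: TREAT never changes within a country, so the country fixed effects absorb it. Everything below is made up for illustration (40 hypothetical countries, an assumed effect of 0.04, a linear time trend); only the variable names follow the post.

```stata
clear
set seed 2019
set obs 40                                  // 40 hypothetical countries
gen Ccode = _n
gen TREAT = _n <= 20                        // half of them join at some point
gen first = cond(TREAT, 2013 + mod(_n, 4), .)   // staggered joining years
gen alpha = rnormal(9, 1)                   // country-specific level
expand 11                                   // 11 years per country
bysort Ccode: gen YEAR = 2008 + _n          // 2009 to 2019
gen BRIS = TREAT & YEAR >= first            // 1 in years of participation
gen LGDPPCP = alpha + 0.03*(YEAR - 2009) + 0.04*BRIS + rnormal(0, 0.05)

xtset Ccode YEAR
xtsum TREAT                                 // within variation of TREAT is zero

* TREAT adds nothing once country fixed effects are in the model
xtreg LGDPPCP i.BRIS i.YEAR, fe vce(cluster Ccode)
```

    Because the within-country variation of TREAT is zero, leaving it out of the fixed effects regression changes nothing; the BRIS dummy alone carries the treatment comparison.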

      My mentor wrote a short document on the different DD commands in Stata that you might find useful. I'm attaching it. I did the same for synthetic controls, but since you're interested in DD, here is the DD one.
