  • No Parallel Trend before Difference-in-Difference estimation

    Dear all,

    Before running my Difference-in-Differences estimation, I decided to match my data with replacement. After the matching, I obtained the frequency with which each control unit was matched and used the Stata command -expand- to duplicate those control units in my data set. I then end up with a data set where each control unit appears as many times as its frequency indicates.
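    To make the duplication step concrete, this is roughly what I ran (a sketch only; freq is a hypothetical variable holding each control unit's match frequency):

    Code:
    * duplicate each matched control unit as many times as it was used
    expand freq if treatment == 0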

    The issue is that I thought matching the data would make everything much nicer before the difference-in-differences estimation, but when I plot my outcome in the new, matched data set, no parallel trend shows (before the matching, I could see one!).

    Am I doing something wrong? Is it normal that the Parallel Trend assumption is not fulfilled after matching?



    Thank you very much in advance,
    Ferran

  • #2
    You're not necessarily doing anything wrong. Of course, without seeing the code and output, nobody can assure you that what you've done is correct, either.

    Nature does not always cooperate with our research plans. It may just be that in the real world, the treatment and control groups did indeed "behave" differently before the intervention era began. It is also possible that other confounding variables obscured the difference when you examined this prior to the matching, and the matching unmasked the difference.

    Now, if the parallel trends assumption is not fulfilled, it certainly weakens the persuasiveness of a DID analysis. Nevertheless, if the changes in outcome between the before- and after-intervention eras differ in the two groups, even if the two groups were on different courses beforehand, it lends some credibility to the notion that the intervention altered the trajectory of the targeted group.

    Again, though, nobody can really say whether you are doing something wrong unless you show what you've actually done. What you're saying could be quite correct, or it could be due to error on your part.



    • #3
      Thanks for the quick reply, Clyde. So I have firms belonging to two countries (treatment and control group), and I wanted to match said firms on sales and on year (the latter with exact matching). Here is the code that I used for the matching:

      Code:
      teffects nnmatch (lev_w sales_w) (treatment), generate(match) osample(newvar) ematch(year) metric(euclidean)


      Here is the dataset once the matching is finished and control units, treatment==0, have been duplicated as many times as needed:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input double obs float weight str50 company int year float treatment double(lev_w mb_w) long sales_w double(roa_w tang_w) float(d treatmentxd finan_inst) byte newvar long match1 byte _merge
       57  1 "ANDRITZ AG - TOTAL DEBT % TOTAL ASSETS"          2008 0 14.35 1.86  3609812   6.09   .1088797380778757 0 0 0 0 687 3
       58  1 "ANDRITZ AG - TOTAL DEBT % TOTAL ASSETS"          2009 0 13.22 3.37  3197517   3.61   .1074465732868919 0 0 0 0 492 3
       59  1 "ANDRITZ AG - TOTAL DEBT % TOTAL ASSETS"          2010 0 11.17 4.56  3553787   5.41  .10345242876383344 0 0 0 0 353 3
       60  2 "ANDRITZ AG - TOTAL DEBT % TOTAL ASSETS"          2011 0  9.78 3.51  4595993   5.79  .09718053146797949 1 0 0 0 802 3
       63  1 "ANDRITZ AG - TOTAL DEBT % TOTAL ASSETS"          2014 0 11.42 4.62  5859269   4.14  .12427937147135495 1 0 0 0 105 3
      141 29 "CA IMMOBILIEN AG - TOTAL DEBT % TOTAL ASSETS"    2008 0 44.27  .25   265856  -3.23   .8395542578492284 0 0 1 0  29 3
      142 24 "CA IMMOBILIEN AG - TOTAL DEBT % TOTAL ASSETS"    2009 0 45.99  .44   245271    .68    .790700935408036 0 0 1 0 912 3
      143 25 "CA IMMOBILIEN AG - TOTAL DEBT % TOTAL ASSETS"    2010 0 48.68  .62   246872   2.62   .8380136209633636 0 0 1 0 395 3
      144 29 "CA IMMOBILIEN AG - TOTAL DEBT % TOTAL ASSETS"    2011 0 55.28  .42   407962    3.2   .8710030437758062 1 0 1 0 172 3
      145 27 "CA IMMOBILIEN AG - TOTAL DEBT % TOTAL ASSETS"    2012 0 57.49  .54   409216   2.61   .8785364957481591 1 0 1 0 929 3
      146 27 "CA IMMOBILIEN AG - TOTAL DEBT % TOTAL ASSETS"    2013 0 49.47  .61   440305   2.55    .741309741166034 1 0 1 0 174 3
      147 21 "CA IMMOBILIEN AG - TOTAL DEBT % TOTAL ASSETS"    2014 0 33.52  .77   224597   3.55   .7085781532956603 1 0 1 0 560 3
      183  5 "CONWERT IMMOBIL INV - TOTAL DEBT % TOTAL ASSETS" 2008 0 51.49  .27   402852   1.73   .7104788651486381 0 0 1 0 190 3
      184 11 "CONWERT IMMOBIL INV - TOTAL DEBT % TOTAL ASSETS" 2009 0  50.4  .56   583700   2.64   .7251157603001318 0 0 1 0 891 3
      185  7 "CONWERT IMMOBIL INV - TOTAL DEBT % TOTAL ASSETS" 2010 0 56.44   .7   568600   3.51   .7651172007700148 0 0 1 0 150 3
      186  5 "CONWERT IMMOBIL INV - TOTAL DEBT % TOTAL ASSETS" 2011 0 53.53  .57   879300    2.2   .7689186190279935 1 0 1 0 543 3
      187  8 "CONWERT IMMOBIL INV - TOTAL DEBT % TOTAL ASSETS" 2012 0 53.84  .78   638800  -3.75   .7841752156472969 1 0 1 0 264 3
      188  1 "CONWERT IMMOBIL INV - TOTAL DEBT % TOTAL ASSETS" 2013 0 55.73  .72   516400   1.28     .83307352145834 1 0 1 0 125 3
      189  5 "CONWERT IMMOBIL INV - TOTAL DEBT % TOTAL ASSETS" 2014 0 53.28  .76   381200   2.32   .8446271523401961 1 0 1 0 721 3
      288  3 "DO & CO AG - TOTAL DEBT % TOTAL ASSETS"          2008 0  11.6 1.33   354625   4.36    .247731685990393 0 0 0 0 148 3
      289  7 "DO & CO AG - TOTAL DEBT % TOTAL ASSETS"          2009 0  9.21   .9   387775   1.64   .3485011808877854 0 0 0 0 107 3
      290  4 "DO & CO AG - TOTAL DEBT % TOTAL ASSETS"          2010 0     0 2.24   352744    5.8   .3425561244584482 0 0 0 0 682 3
      291  7 "DO & CO AG - TOTAL DEBT % TOTAL ASSETS"          2011 0     0 1.77   426068   7.32  .23550463563433732 1 0 0 0 123 3
      292  4 "DO & CO AG - TOTAL DEBT % TOTAL ASSETS"          2012 0     0 1.96   466355   7.24   .2362217406070452 1 0 0 0 649 3
      293 11 "DO & CO AG - TOTAL DEBT % TOTAL ASSETS"          2013 0  4.73 2.18   576191   7.65  .36552865404685436 1 0 0 0 503 3
      294 13 "DO & CO AG - TOTAL DEBT % TOTAL ASSETS"          2014 0 29.38 2.71   636140   6.96  .26025770441185203 1 0 0 0 931 3
      323  5 "ERSTE GROUP BANK AG - TOTAL DEBT % TOTAL ASSETS" 2008 0 35.49  .59 14555300   1.01   .0118919457735247 0 0 1 0 435 3
      324  3 "ERSTE GROUP BANK AG - TOTAL DEBT % TOTAL ASSETS" 2009 0 30.83  .95 13267800    .96   .0116467998628619 0 0 1 0 443 3
      325  6 "ERSTE GROUP BANK AG - TOTAL DEBT % TOTAL ASSETS" 2010 0 27.88 1.07 11892800    .96 .011950175165434021 0 0 1 0  10 3
      326  6 "ERSTE GROUP BANK AG - TOTAL DEBT % TOTAL ASSETS" 2011 0 28.83  .45 11926900    .27 .011490463631846502 1 0 1 0 431 3
      327  7 "ERSTE GROUP BANK AG - TOTAL DEBT % TOTAL ASSETS" 2012 0 26.54  .81 12267800    .65 .010451948246906166 1 0 1 0 579 3
      328 10 "ERSTE GROUP BANK AG - TOTAL DEBT % TOTAL ASSETS" 2013 0 25.33  .91 10203555   -.01 .010326359046141492 1 0 1 0 804 3
      329  1 "ERSTE GROUP BANK AG - TOTAL DEBT % TOTAL ASSETS" 2014 0 23.96  .87  9171152   -.73 .011552062695950037 1 0 1 0 735 3
      477 15 "IMMOFINANZ AG - TOTAL DEBT % TOTAL ASSETS"       2008 0 38.23  .25   769683   3.52   .6776126162810079 0 0 1 0 722 3
      478  7 "IMMOFINANZ AG - TOTAL DEBT % TOTAL ASSETS"       2009 0 48.39  .25   888945 -12.28   .7388885149341323 0 0 1 0 254 3
      479  9 "IMMOFINANZ AG - TOTAL DEBT % TOTAL ASSETS"       2010 0 46.09   .6   775832   2.62   .7558607945564433 0 0 1 0  31 3
      480  8 "IMMOFINANZ AG - TOTAL DEBT % TOTAL ASSETS"       2011 0 45.41  .47   870452    4.6   .7690743782486568 1 0 1 0 543 3

      So, because I have matched with replacement, in my matched dataset I have the same number of control units as treated units. After that, I just plot the outcome, leverage, over time:
      [Attachment: Graph.png, leverage over time in the matched dataset]

      So clearly, the parallel trends assumption is not met (Year==0 corresponds to the event date). If I, however, look at this graph before doing the matching, I get:
      [Attachment: Graph2.png, leverage over time before matching]


      where the parallel trends assumption clearly appears to be met. So, am I missing something?


      Best,
      Ferran



      • #4
        Well, this seems like an odd kind of matching. -teffects- does not understand or respect panel structure. So you have a matching where in one year Company A is matched to Company X, but in another year it is matched to a different Company Y. So your matched pairs are now scrambling lots of other variables (both observed and unobserved) that may be relevant here. I don't think this is a useful way to match panel data. And I don't see how you could use these matched pairs in panel-data analyses and get valid results.

        I think you need to find some different approach that matches each company in the treatment group to a company in the control group consistently over time. That means that you will probably not be able to get the match to be the nearest neighbor in sales in every year. I don't know what the best approach is here, and it probably depends in part on the nature of the relationship between sales and leverage.



        • #5
          Thanks Clyde, I totally get your point. I do not know why I thought that matching Company A one year with B and another with C would make sense.

          I was thinking about changing my panel data into wide format, so that I have just one observation per company, and then matching on, for example, the average of sales over time. That is one thing that comes to mind. After matching and constructing the matched dataset, I would change back to panel format. Does this whole approach make sense to you?
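          In code, the plan would look roughly like this (a sketch only, with hypothetical variable names; years before 2011 are pre-treatment in my data):

          Code:
          * one observation per company: pre-treatment averages of sales and
          * leverage, then nearest-neighbour matching on the average of sales
          preserve
          collapse (mean) mean_sales = sales_w mean_lev = lev_w ///
              (first) treatment if year < 2011, by(company)
          teffects nnmatch (mean_lev mean_sales) (treatment), ///
              generate(match) metric(euclidean)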


          Best,
          Ferran



          • #6
            I think the approach you outline in your second paragraph of #5 makes sense.

            Whether average sales (as opposed to some other summary statistic, a weighted average, a least sum of simple or weighted squared differences, or whatever) is the best way to handle the time series, I don't know; that is a content issue, and we're outside my discipline here.



            • #7
              Thanks again, Clyde. So I implemented the approach already mentioned:

              - I first converted my panel data into wide format, then proceeded to do the matching with the pre-treatment averages of sales and leverage (IV and DV, respectively):

              Code:
              teffects nnmatch (mymean_leverage mymean_sales) (treatment), generate(match) osample(newvar) metric(euclidean)
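              The wide-format conversion mentioned above was along these lines (a sketch; variable names as in the earlier -dataex- listing):

              Code:
              * one row per company, with yearly values side by side
              reshape wide lev_w sales_w, i(company) j(year)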
              - Then I obtained the weights for each control unit, changed the data back to panel format, and finally duplicated those control units that were used more than once. Of course, those control units are clusters in the panel data.

              - Once the matched dataset is obtained, I graph leverage over time, and it looks much better than before! The parallel trends assumption now looks plausible.
              [Attachment: Graph.png, leverage over time after matching on pre-treatment averages]

              There are two issues still remaining:

              1. I have matched my companies on pre-treatment sales (and used pre-treatment leverage as the outcome of the -teffects nnmatch- command). Is this right? Or should I match on sales in general, both pre-treatment and post-treatment?

              2. After the matching, the Difference-in-Differences estimate increases a little bit in value and becomes insignificant (it was significant before matching). Does this sound plausible to you?


              Best,
              Ferran
              Last edited by Ferran Franquesa; 05 Apr 2017, 11:55.



              • #8
                1. I have matched my companies on pre-treatment sales (and used pre-treatment leverage as the outcome of the -teffects nnmatch- command). Is this right? Or should I match on sales in general, both pre-treatment and post-treatment?
                I would consider what you did the correct approach. If you start matching on the post-treatment outcome, then you would be, in effect, constraining the treatment and control groups to follow identical trajectories after treatment and any effects of interventions (or other things) would be obscured. If anything, I would lean in the other direction and match on the pre-treatment sales only, not the leverage. By matching on leverage, you are restricting the generalizability of your results to those firms for which there are matchable controls whose leverage-sales relationship in the pre-intervention period is similar to that of the case. That could be construed as over-matching, depending on the circumstances.

                2. After the matching, the Difference-in-Differences estimate increases a little bit in value and becomes insignificant (it was significant before matching). Does this sound plausible to you?
                Yes, that's quite possible. The impact of matching on an analysis is not predictable: it can increase or decrease the apparent magnitude of effects.

                I'll resist the temptation to go into my usual rant against interpreting these models based on p-values. (If you want to see it, there are plenty of examples of it on this Forum that you can probably easily find with a search.) I'll just say this: you state that the DID estimator changes "a little bit" and becomes "insignificant." If the estimator change is really only a little bit, then I'd imagine that your pre-matching p-value was only slightly below 0.05. So this is the kind of thing that happens when you take the p < 0.05 convention and treat it as reality rather than a rule of thumb. Dichotomously classifying results as "significant" vs. "not significant," common though it is in the literature, is really a matter of sloppy thinking. A change in p-value from 0.049 to 0.051 is meaningless; so is a change from 0.04 to 0.06. It's the general problem you get from taking any continuous variable and dichotomizing it at a completely arbitrary, artificial cutoff. OK, I promised not to rant on about p-values, so I'll cut it here.



                • #9
                  Many thanks, Clyde!

                  If anything, I would lean in the other direction and match on the pre-treatment sales only, not the leverage. By matching on leverage, you are restricting the generalizability of your results to those firms for which there are matchable controls whose leverage-sales relationship in the pre-intervention period is similar to that of the case
                  But I have not matched on leverage (only pre-treatment sales), have I? If I understood the -teffects nnmatch- command properly, I use the pre-treatment leverage as the outcome variable only.



                  Best,
                  Ferran



                  • #10
                    But I have not matched on leverage (only pre-treatment sales), have I? If I understood the -teffects nnmatch- command properly, I use the pre-treatment leverage as the outcome variable only.
                    Yes, the code you show is matching on sales only. I thought I saw you say somewhere that you had matched on both sales and leverage, but reviewing the thread I can't find that. I must have misread something along the way. Sorry!



                    • #11
                      You are right, I said it here (beginning of post #7):

                      - I first converted my panel data into wide format, then proceeded to do the matching with the pre-treatment averages of sales and leverage (IV and DV, respectively):
                      But it was definitely a wrong choice of words, because I clearly did not mean I was matching on leverage. Anyway, sorry for the confusion!


                      Many thanks again for taking the time to clear all these things up for me, much appreciated Clyde!



                      • #12
                        Hello, I have a similar issue to Ferran's, with panel data from one country but 7 sites within the country. The data set has the same variables collected for 8 years (2008-2015), but I cannot get past the problem of graphing this data so that I can show the trend per variable, by site and year, and compare the variables within each site by year. I have tried my level best, but it just isn't good enough to resolve this. Sample data:
                        Zones Year u5_HS a5_HS All_HS u5_HHC
                        1 2008 14 231 245 5
                        1 2009 8 225 233 3
                        1 2010 47 1006 1053 28
                        1 2011 41 1232 1273 28
                        1 2012 63 1668 1731 19
                        1 2013 43 1140 1183 8
                        1 2014 57 1764 1821 14
                        1 2015 34 975 1009 9
                        2 2008 220 1455 1675 53
                        2 2009 404 2822 3226 116
                        2 2010 804 6862 7666 221
                        2 2011 602 5062 5664 140
                        2 2012 435 4359 4794 143
                        2 2013 322 3850 4172 151
                        2 2014 329 3327 3656 138
                        2 2015 213 1637 1850 67
                        3 2008 7 16 23 9
                        3 2009 20 66 86 3
                        3 2010 12 8 20
                        3 2011 1 2 3
                        3 2012 0 0 0
                        3 2013 0 0 0
                        3 2014 0 0 0
                        3 2015 0 1 1 0
                        4 2008 2310 6320 8630 1445
                        4 2009 1416 5278 6694 1047
                        4 2010 2761 15376 18137 1897
                        4 2011 2664 13939 16603 1312
                        4 2012 2489 16859 19348 1044
                        4 2013 1687 13693 15380 1149
                        4 2014 2021 15473 17494 783
                        4 2015 1418 8856 10274 534

                        Kindly help me with how I can proceed.



                        • #13
                          Well, first you need some variable that distinguishes which are the treatment zones and which are the control zones. You don't have that. You also need a variable that distinguishes the years before intervention from those after. You don't have that either. It's also not clear which of your variables is the outcome you want to graph and compare.

                          Just to illustrate how you might proceed, I'll pretend that zones 1 and 2 are the treatment group, the intervention begins in 2013, and the outcome of interest is All_HS

                          Code:
                          * Example generated by -dataex-. To install: ssc install dataex
                          clear
                          input float(Zones Year u5_HS a5_HS All_HS u5_HHC)
                          1 2008   14   231   245    5
                          1 2009    8   225   233    3
                          1 2010   47  1006  1053   28
                          1 2011   41  1232  1273   28
                          1 2012   63  1668  1731   19
                          1 2013   43  1140  1183    8
                          1 2014   57  1764  1821   14
                          1 2015   34   975  1009    9
                          2 2008  220  1455  1675   53
                          2 2009  404  2822  3226  116
                          2 2010  804  6862  7666  221
                          2 2011  602  5062  5664  140
                          2 2012  435  4359  4794  143
                          2 2013  322  3850  4172  151
                          2 2014  329  3327  3656  138
                          2 2015  213  1637  1850   67
                          3 2008    7    16    23    9
                          3 2009   20    66    86    3
                          3 2015    0     1     1    0
                          4 2008 2310  6320  8630 1445
                          4 2009 1416  5278  6694 1047
                          4 2010 2761 15376 18137 1897
                          4 2011 2664 13939 16603 1312
                          4 2012 2489 16859 19348 1044
                          4 2013 1687 13693 15380 1149
                          4 2014 2021 15473 17494  783
                          4 2015 1418  8856 10274  534
                          end
                          
                          gen byte treatment = inlist(Zones, 1, 2)
                          gen byte pre_post = (Year > 2012)
                          collapse (mean) All_HS (first) pre_post, by(treatment Year)
                          separate All_HS, by(treatment)
                          graph twoway line All_HS? Year if pre_post == 0
                          In the future, when posting example data, please use the -dataex- command, as I have done here. Run -ssc install dataex- and then run -help dataex- to read the simple instructions for using it. The way you posted your data, it was not particularly hard to import into Stata, but the result may not be truly faithful to your data configuration, because important details such as storage types, labeling, etc., are missing. By using -dataex- you enable those who want to help you to create a completely faithful replica of your Stata example with a simple copy/paste operation.



                          • #14
                            In my Difference-in-Differences example I would like to generate a trendline for the treatment group (Leverage_w1) after the year 2014 (because that is when the treatment occurs), which should look like the trend line of the control group (Leverage_w0). Is there a code I can enter? Or something in the Graph Editor?

                            At the moment I have this code

                            Code:
                            preserve
                            collapse (mean) Leverage_w, by (treated year)
                            reshape wide Leverage_w, i(year) j(treated)
                            graph twoway line Leverage_w0 Leverage_w1 year, ytitle(%)  xtitle(year end) xline(2014) sort
                            restore
                            Many thanks for the answer.


                            [Attachment: Verschuldung.png, leverage over time for treatment and control groups]

