Difference in Difference Interpretation

Gonzalo Etchart

Join Date: Jul 2018
Posts: 15

#16

30 Aug 2018, 13:24

Indeed. Most of the crops affected by this tax increase represent little export value. The vast majority of exports is concentrated around a few cash crops.

The product codes are in var1 (6117, for example); agricultural crops run from 703 up to 4412.

Although I must admit I have failed at posting an easy to read example.

Since this is just a brief study to highlight the importance of looking at empirical evidence before making policy decisions (neglecting to do so is a very troubling habit of African governments), I will accept the results here for now. I am still curious, though, as to how this tax change affected separate crop types.

Regardless,
I am in awe of your command of statistical methods and friendliness in providing support on these matters.

Gonzalo

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int var1 double Exports byte(Agri1 Post)
 703     .00007 1 0
 706     .06874 1 0
 708     .69316 1 0
 709     5.4492 1 0
 710     .90165 1 0
 713   41.73544 1 0
 801    19.6316 1 0
 802   19.42681 1 0
 803   41.53306 1 0
 804     .34996 1 0
 805     .17759 1 0
 808     .00021 1 0
 810     .15776 1 0
 811     .00602 1 0
 901     .00404 1 0
 902    5.47964 1 0
 904     .67213 1 0
 910     .00202 1 0
1001    2.16894 1 0
1005   13.01239 1 0
1006     .45662 1 0
1202    7.95657 1 0
1203     .00207 1 0
1206      .4477 1 0
1207   47.48741 1 0
1701  81.668575 1 0
1702      .0047 1 0
2008    1.11226 1 0
2308     .03127 1 0
2401 258.024442 1 0
4401   12.99667 1 0
4403    21.9149 1 0
4404     .00147 1 0
4406    1.14136 1 0
4407    75.5329 1 0
4408     .16373 1 0
4409    2.16639 1 0
4410     .00047 1 0
4412     .00928 1 0
4601     .00011 0 0
4602     .00026 0 0
4707     .04327 0 0
4804     .00282 0 0
4805     .00531 0 0
4808     .11877 0 0
4809     .03567 0 0
4810      .3616 0 0
4811     .03569 0 0
4813      .0067 0 0
4816     .04639 0 0
4817     .00555 0 0
4818     .07081 0 0
4819    2.26675 0 0
4820     .07249 0 0
4821     .01101 0 0
4822      .0008 0 0
4823     .00796 0 0
4901     .29782 0 0
4905     .00177 0 0
4907     1.2773 0 0
4908     .18287 0 0
4911     .56331 0 0
5112     .36347 0 0
5201  48.343064 0 0
5202     .01365 0 0
5203   28.99824 0 0
5204     .58227 0 0
5205     1.8893 0 0
5303    1.98549 0 0
5305      .0048 0 0
5308    1.10415 0 0
5404     .21953 0 0
5407      .0017 0 0
5503     .01802 0 0
5506     .00954 0 0
5515     .00284 0 0
5601      .0002 0 0
5604     .00039 0 0
5607     1.1147 0 0
5609     .00035 0 0
5702     .00101 0 0
5810     .00634 0 0
5906     .00008 0 0
5907     .03216 0 0
5908     .03256 0 0
5909     .00274 0 0
5910     .00017 0 0
5911     .00663 0 0
6101     .03519 0 0
6103     .03531 0 0
6104     .04796 0 0
6105     .04452 0 0
6106     .00083 0 0
6108     .00317 0 0
6109     .09617 0 0
6110        .08 0 0
6112      .0003 0 0
6114     .00129 0 0
6116     .01198 0 0
6117     .00038 0 0
end

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#17

30 Aug 2018, 14:46

So, yes, there is an enormous variation here, accompanied by very high skewness, several outliers, and one very extreme outlier. So I think the kind of simple linear modeling approach we have been discussing so far is not adequate for this data. I can imagine some other approaches, but, again, I think you need advice from an economist who knows about crops first.

As for making policy without looking at the evidence, I would say it's a problem for governments around the world. I don't know if it is worse in Africa, but it certainly happens in the economically advanced countries, too.
Comment
Gonzalo Etchart

Join Date: Jul 2018

Posts: 15
#18

30 Aug 2018, 16:24

That is incredibly disheartening. I considered modelling only the main cash crops (cotton, tobacco, sugar and cashew), as they account for the largest export value. However, I was worried that results from only 4 treatment observations would never be considered statistically significant.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#19

30 Aug 2018, 16:32

Well, I agree that having only 4 treatment crops would be a rather unconvincingly small sample. But there may be ways to use all the data. You might be able to group the crops in some way so that the variation in each group is much smaller (and the variation between the groups will be large) and then fit a more complicated model that includes the group variable and a three-way interaction with Agri1 and Post. Such a model will be difficult, but not impossible, to interpret. But the groups have to be picked on the basis of some scientific or historical understanding of crop trade, not cherry picked from the data to produce a desired result. Another possibility is that log-transforming the data will make it more tractable. A random slopes model is another possibility. Each of these approaches might or might not be suitable, and each has some limitations or drawbacks. But you really need advice from an economist who knows about crop trade to figure out whether any of these is reasonable, or if perhaps something I haven't thought of is better.
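To illustrate the log-transform option mentioned above, here is a minimal sketch. It uses only a handful of rows copied from the -dataex- listing in #16 (not the full data), and ln_exports is a made-up variable name. It shows how to compare the skewness of the raw and logged values; it does not say whether a log model is appropriate, which is still a question for someone who knows crop trade.

```stata
* Sketch only: a few Exports values copied from the -dataex- listing in #16
clear
input int var1 double Exports byte Agri1
 703     .00007 1
 713   41.73544 1
1701  81.668575 1
2401 258.024442 1
4601     .00011 0
5201  48.343064 0
6117     .00038 0
end

* ln() is defined here because every Exports value is strictly positive
generate double ln_exports = ln(Exports)

* Compare spread and skewness before and after the transformation
summarize Exports ln_exports, detail
```

If the full data contained zero exports for some product, ln() would return missing for those rows, so that case would need separate thought.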
Comment

Sandro Raffaele

Join Date: Jan 2019
Posts: 7

#20

25 Jan 2019, 01:24

Hello Clyde

Can it be that my difference-in-difference output has an omitted time variable? I investigate differences over the period 2012-2017, and my output looks like this:

Code:

. areg Profit_w i.treated##i.time Unternehmensgrösse MTBV_w Abschreibungsrate_w Verschuldung_w CF_Marge_w i.IndustrieDummy
>  Zinslevel, a(Jahr)
note: 1.time omitted because of collinearity
note: Zinslevel omitted because of collinearity

Linear regression, absorbing indicators         Number of obs     =        438
Absorbed variable: Jahr                         No. of categories =          6
                                                F(  34,    398)   =      24.18
                                                Prob > F          =     0.0000
                                                R-squared         =     0.6760
                                                Adj R-squared     =     0.6442
                                                Root MSE          =    15.5115

-------------------------------------------------------------------------------------
           Profit_w |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------------+----------------------------------------------------------------
          1.treated |  -4.401589   2.560556    -1.72   0.086    -9.435495    .6323169
             1.time |          0  (omitted)
                    |
       treated#time |
               1 1  |   1.087765   2.982397     0.36   0.716    -4.775455    6.950985
                    |
 Unternehmensgrösse |   1.714273   .5123431     3.35   0.001     .7070359     2.72151
             MTBV_w |  -.7052734   .3189976    -2.21   0.028    -1.332404   -.0781425
Abschreibungsrate_w |  -.2396374   .5666368    -0.42   0.673    -1.353613     .874338
     Verschuldung_w |  -.1834585   .0664845    -2.76   0.006    -.3141631   -.0527539
         CF_Marge_w |   .7198487   .0567588    12.68   0.000     .6082641    .8314333
                    |
     IndustrieDummy |
                16  |  -19.26553   10.95501    -1.76   0.079    -40.80244    2.271387
                20  |   1.131143   7.247177     0.16   0.876    -13.11639    15.37868
                24  |  -2.701054    9.22128    -0.29   0.770    -20.82956    15.42745
                26  |   1.891269   9.304957     0.20   0.839    -16.40174    20.18428
                27  |  -15.20279   9.608678    -1.58   0.114     -34.0929    3.687315
                28  |  -8.982754   7.004267    -1.28   0.200    -22.75274    4.787231
                30  |   -2.08479   9.278925    -0.22   0.822    -20.32662    16.15704
                32  |  -22.07869   9.361095    -2.36   0.019    -40.48206   -3.675315
                34  |  -17.16137   9.361556    -1.83   0.068    -35.56565     1.24291
                35  |  -1.506517   6.985742    -0.22   0.829    -15.24008    12.22705
                36  |   .4325678   6.905993     0.06   0.950    -13.14422    14.00935
                38  |   7.422745   9.185583     0.81   0.420    -10.63558    25.48107
                42  |    11.8835   9.399556     1.26   0.207    -6.595485    30.36249
                48  |   2.438148   9.666004     0.25   0.801    -16.56466    21.44095
                49  |   -25.4078   9.277006    -2.74   0.006    -43.64585   -7.169736
                50  |   23.03084   16.92975     1.36   0.174    -10.25207    56.31376
                51  |  -81.87774   10.29351    -7.95   0.000    -102.1142    -61.6413
                59  |  -17.09642   7.589205    -2.25   0.025    -32.01636   -2.176482
                60  |  -24.87167   7.618397    -3.26   0.001      -39.849   -9.894344
                61  |  -23.31703   10.56651    -2.21   0.028    -44.09018   -2.543882
                62  |  -20.73248   6.806127    -3.05   0.002    -34.11293    -7.35203
                65  |  -2.821212   8.441724    -0.33   0.738    -19.41716    13.77473
                67  |  -11.56696   7.267554    -1.59   0.112    -25.85455    2.720636
                70  |  -9.601152    9.52187    -1.01   0.314     -28.3206    9.118296
                73  |  -2.845638   6.950014    -0.41   0.682    -16.50896    10.81769
                76  |  -1.599433   9.190073    -0.17   0.862    -19.66659    16.46772
                87  |  -2.015367    7.07389    -0.28   0.776    -15.92223    11.89149
                    |
          Zinslevel |          0  (omitted)
              _cons |   -9.30734   9.929595    -0.94   0.349    -28.82835    10.21367
-------------------------------------------------------------------------------------
F test of absorbed indicators: F(5, 398) = 0.930              Prob > F = 0.461

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#21

25 Jan 2019, 12:07

Assuming that time is a pre-post indicator for your DID model, then yes it is, and should be, omitted because it is collinear with the Jahr fixed effect (in any given Jahr, time will either always be 0 or always 1). This is not a problem. The coefficient of treated#time is still a valid DID estimator of the intervention effect.
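The collinearity described above can be reproduced on simulated data. This is only a sketch: firm, year, post, y, the 2015 cutoff, and all the numbers are invented for illustration and have nothing to do with the data in #20.

```stata
* Simulated panel: 10 hypothetical firms observed yearly, 2012-2017
clear
set seed 12345
set obs 60
generate int firm = ceil(_n/6)
bysort firm: generate int year = 2011 + _n
generate byte treated = firm <= 5
generate byte post = year >= 2015          // hypothetical intervention date

* Outcome with a built-in interaction (DID) effect of 1.5
generate double y = 2*treated + 0.5*(year - 2012) + 1.5*treated*post + rnormal()

* post is constant within each year, so absorbing year makes 1.post redundant
areg y i.treated##i.post, absorb(year)
```

Stata should report 1.post as omitted because of collinearity, while the treated#post row is still estimated.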
Comment
Sandro Raffaele

Join Date: Jan 2019

Posts: 7
#22

25 Jan 2019, 13:12

Thanks a lot for your answer!
Comment
Sandro Raffaele

Join Date: Jan 2019

Posts: 7
#23

25 Jan 2019, 13:33

Clyde, I have some more questions about the interpretation of the values:

What do the columns "t", "P>|t|" and "95% Conf. Interval" tell me?

If P>|t| is < 0.05, is the coefficient significant at the 95% confidence level? Or am I mixing things up? I read in another answer of yours that, in difference-in-difference models, "A better way to work with these models is to forget about statistical significance. Think of it as trying to get a decent estimate of the size of the effect. The interaction coefficient in the DID model will be an estimator of that effect size. The confidence interval gives you a sense of the uncertainty attached to that estimate."

So I should ignore these numbers?

And could you please tell me what the _cons (constant) says about the model?

And what is the meaning of the last row of the output?

R-squared and adj. R-squared are clear.

Thanks in advance!
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#24

25 Jan 2019, 13:54

The t and P > |t| columns give the t-statistic and p-value associated with the null hypothesis that the coefficient in that row is zero. In this kind of situation, you can also say that an effect is statistically significant at the .05 level if the 95% confidence interval excludes 0.
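
As a small worked example of how these columns relate, the sketch below recomputes them for the treated#time row from its coefficient and standard error in #20. The 398 degrees of freedom are taken from the F(34, 398) line of that output; the scalar names are made up.

```stata
* Coefficient and standard error of treated#time, copied from the output in #20
scalar b_did  = 1.087765
scalar se_did = 2.982397
scalar df_res = 398                      // residual df, from F(34, 398)

* t-statistic and two-sided p-value for H0: coefficient = 0
scalar t_did = b_did/se_did
scalar p_did = 2*ttail(df_res, abs(t_did))

* 95% confidence interval: coefficient +/- critical t value * standard error
scalar crit   = invttail(df_res, 0.025)
scalar lo_did = b_did - crit*se_did
scalar hi_did = b_did + crit*se_did

display "t = " %6.2f t_did "   P>|t| = " %5.3f p_did
display "95% CI: [" %9.6f lo_did ", " %9.6f hi_did "]"
```

Up to rounding, this should match the t, P>|t| and interval shown in the table. Because the interval contains 0, the p-value is above 0.05, which is the equivalence described above.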

As you note, in my opinion, hypothesis testing is simply out of place in this context (using DID to attempt to identify the causal effect of an intervention). The null hypothesis of 0 effect is usually preposterous in the first place, so rejecting that hypothesis really could be done without even gathering data. Rather, the usual point is to get an estimate of how large the effect is. For that reason, I prefer to focus on the interaction coefficient itself, and, to get a sense of the uncertainty surrounding that estimate, the 95% confidence interval. I routinely ignore the t-test and p-value in this context. Not everybody shares my view about this, although I have yet to hear a persuasive argument why I'm wrong about it. I suppose, in theory, there could be situations where the null hypothesis of an intervention having no effect has some plausibility, and where it is more important to know whether the effect is zero or non-zero than to actually get a sense of how large the effect is. In that case, it would be sensible to focus on the t-test and p-value and ignore the confidence interval. But such situations have never arisen in my career and I have been unable to even imagine a situation in which they would. But only you know your research goals, and you have to use whatever statistics are appropriate to them.

The _cons term is the expected value of the outcome when all of the predictor variables (including the absorbed effects) are zero. It is usually of no real-world importance, unless there could be real-life situations in which all of those variables are, in fact, zero, and such situations are of particular interest. (For example, if all of the predictors have been centered at their means, then the constant term would represent the expected value of y in the reference categories of the absorbed effects, i.e. the baseline year, for an entity which has exactly average values for all of the predictors, and this might be something to pay attention to.)

The last row of the output, F test of absorbed indicators: F(5, 398) = 0.930 Prob > F = 0.461, is an F-test of the null hypothesis that the coefficients of all of the absorbed effects are zero. In your case, this means it's a test of the null hypothesis that the yearly shocks to the outcome in your system are all zero. As the result is a non-rejection of this hypothesis, some people would say you can revise your model to simply omit them. (That is, you could go back to pooled OLS.) Again, I do not generally endorse this approach to model selection. But you might see this commonly done in practice. So, this is another statistic that I would, at least in this particular case, ignore, but others might not.
Comment
Sandro Raffaele

Join Date: Jan 2019

Posts: 7
#25

25 Jan 2019, 14:19

Okay, again thanks for your detailed and helpful answer!

Now in this case is it true that:
1. My treated group has a 1.0877% (I calculated my dependent variable in %) higher Profit over time than the control group, which is the interaction term?
2. The interaction effect lies, with 95% confidence, between -4.77% and 6.95%?

What about the treated coefficient (the -4.4%)? I'm a bit confused that I can't see the output of my control group and only the output of the treated group.

Thanks for answering my (hopefully) last question
Comment
Sandro Raffaele

Join Date: Jan 2019

Posts: 7
#26

25 Jan 2019, 14:24

Here is the corresponding graph: the red line shows the yearly averages of the treated group and the blue line the same for the control group. The y-axis is in %.

Attached Files

Profit.gph (7.7 KB, 1 view)
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#27

25 Jan 2019, 16:51

1. My treated group has a 1.0877% (I calculated my dependent variable in %) higher Profit over time than the control group, which is the interaction term?

Assuming that your outcome variable is in percent, the correct way to say this is that the intervention is associated with a 1.0877 percentage point greater change in profit in the treated group than in the control group.

2. The interaction effect lies, with 95% confidence, between -4.77% and 6.95%?

Not %, percentage points. But otherwise correct.

What about the treated coefficient (the -4.4%)?

It is meaningless. Ignore it.
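
To make the "greater change" wording concrete, the following sketch uses simulated data (all names and numbers are invented, and there are no covariates or year effects, unlike the model in #20). In this plain two-group, two-period setup, the interaction coefficient equals the change in the treated group's mean minus the change in the control group's mean.

```stata
* Simulated 2x2 design: 100 control and 100 treated observations
clear
set seed 2019
set obs 200
generate byte treated = _n > 100
generate byte post    = mod(_n, 2)
generate double y = 5 - 4*treated + 2*post + 1*treated*post + rnormal(0, 3)

* DID estimate from the regression
regress y i.treated##i.post
scalar did_reg = _b[1.treated#1.post]

* Same quantity from the four cell means: m<treated><post>
foreach g in 0 1 {
    foreach p in 0 1 {
        quietly summarize y if treated == `g' & post == `p', meanonly
        scalar m`g'`p' = r(mean)
    }
}
scalar did_means = (m11 - m10) - (m01 - m00)

display "regression: " did_reg "   cell means: " did_means
```

The two displayed numbers should agree apart from floating-point error. If y is measured in percent, that difference of changes is in percentage points.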

I'm a bit confused that I can't see the output of my control group and only the output of the treated group.

No, you can't see the output of the treated group either--that you think you do means you do not understand the meaning of treated in this output. You are misinterpreting that meaningless treated coefficient as if it has something to do with the treated group--but it doesn't. It doesn't have anything to do with anything. It is an artifact of the model.
Comment
Sandro Raffaele

Join Date: Jan 2019

Posts: 7
#28

25 Jan 2019, 17:09

Ok! Thanks a lot!!
Comment
