
  • Need help on Result of Difference in Differences estimator

    I have performed a difference-in-differences analysis, but the result shows that the DID term is not significant.

    Although I would like to see the effect of the policy on the rate of land abandonment (Aband: Y), I don't understand which variable shows the effect.
    Also, I don't know what it means if DID is not significant.

    I used a data set with observations in 2010 and 2015 (two time points), and treatment and control groups.

    The command I used is

    Code:
    reg Aband time treated did, r

    The results are as below.

    Code:
    Linear regression                               Number of obs =     332
                                                    F(  3,   328) =    6.85
                                                    Prob > F      =  0.0002
                                                    R-squared     =  0.0770
                                                    Root MSE      =  .09989

    ------------------------------------------------------------------------------
                 |               Robust
           Aband |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            time |   .0007759   .0252551     0.03   0.976    -.0489064    .0504583
         treated |  -.0577514   .0176226    -3.28   0.001     -.092419   -.0230839
             did |  -.0006129   .0256551    -0.02   0.981    -.0510821    .0498563
           _cons |   .0767172   .0173637     4.42   0.000      .042559    .1108755
    ------------------------------------------------------------------------------

  • #2
    In the future, please put Stata output between code delimiters when posting here. Unformatted output comes out jumbled and difficult to read; code delimiters overcome that problem. See FAQ #12 for how to do that.

    You don't say as much, but I'll assume that your did variable was calculated as time*treated. (If not, your entire analysis is incorrect and there is no point trying to interpret it.)

    Working from your output, the coefficient of did is your estimator of the treatment effect. In greater detail, in the control group (treated = 0), after the policy date, Aband increased by 0.0007759 (coefficient of time). In the treatment group (treated = 1), after the policy date, Aband increased by 0.0007759 - 0.0006129 (sum of coefficients of time and did) = 0.000163. The treatment effect is the difference between those increases, the coefficient of your did variable, -0.0006129, which given the variability in your outcome and your sample size had a confidence interval of -.0510821 to +0.0498563, and hence was not statistically significant. Otherwise put, your data are consistent with the null hypothesis that the change in Aband after the policy date was the same in both the treatment and control groups, at the .05 significance level.
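
    These sums need not be done by hand: after a regression, Stata's -lincom- command computes any linear combination of the coefficients, together with its standard error and confidence interval. A minimal sketch, assuming the regression from #1 has just been run:

    Code:
    * before-vs-after change in the control group: just the time coefficient
    lincom time

    * before-vs-after change in the treatment group: time plus the interaction
    lincom time + did
    The second -lincom- reproduces the 0.0007759 - 0.0006129 = 0.000163 calculation above, with a confidence interval attached.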

    Now, for reasons I will not elaborate here because of the late hour (where I am), I do not like to interpret interaction models based on their p-values. Briefly, interaction terms are always underpowered compared to their constituent main effects. Since 332 is not a huge sample to start with, your power to detect this interaction was probably rather low. I think it is more sensible to look instead at the predicted values of the outcome variable in all four groups, and the marginal effect of the time variable in both treatment groups, and then make a judgment call as to whether the differences are large enough to matter from a practical perspective. In this case, the key issue is whether an increase in Aband of 0.0007759 is, from a practical perspective, materially different from an increase of 0.000163. If it is, then your did is pragmatically significant.

    To really understand your model and its implications, I urge you to re-run it using factor variable notation. See -help fvvarlist- for details. But in your case it would amount to:

    Code:
    regress Aband i.time##i.treated, robust
    And then you can run -margins-:

    Code:
    margins time#treated
    margins treated, dydx(time)
    which will show you the expected values of Aband in each combination of time and treated, and then the change in Aband over time in both groups. All of the calculations I outlined above are done for you automatically here, with no effort and no confusion about what gets added to what. Again, a comparison of the marginal effects of time in each group from a practical perspective is, in my view, the most important thing here. That is a judgment call based on your knowledge of the underlying science and content you are working on, not a statistical one. If you are not comfortable making that kind of judgment call, you should call on colleagues in your discipline for advice.

    Nevertheless, given the wide confidence interval, there evidently remains considerable uncertainty about the size, and even the sign, of this difference. If your conclusion is that a difference of this magnitude is large enough to matter for practical purposes, you might want to reduce some of the uncertainty in the estimated effect by doing another study with an appreciably larger sample. If your conclusion is that this difference is too small to matter, then you probably would not investigate any further and would just conclude that this particular policy did not have any material effect on Aband.





    • #3
      Thank you so much for your advice.
      As you said, I used commands as below.

      Code:
      xtset Municipality year
      Code:
      gen time=(year>=2010) & !missing(year)
      Code:
      gen treated= (Municipality<97) & !missing(Municipality)
      Code:
      gen time=(year>2010) & !missing(year)
      Code:
      gen did = time*treated
      Code:
      reg y time treated did, r
      I would like to know whether I should show the result of this, or just use yours.


      Anyway, I have now tried your code.
      I got results, and I think they show what I wanted to see even though the p-value was not significant.

      Is what I understood correct?
      Even though the p-values of the individual variables are not "significant", if the comparison of the marginal effects of time in each group is large enough from a practical perspective, then the result can be important.

      Since my data for "Aband" (the rate of land abandonment) are very small for each individual, the change from 2010 to 2015 is also very small. However, even though the change is small, I want to say that the policy restrained the rate of abandonment.

      As for the number of observations, I cannot access more data in this case, so it is difficult to increase the sample size.



      I actually tried panel data analysis, using
      Code:
      xtreg Aband Numfarmers Parttime over65y treatment time did, re
      and

      Code:
      xtreg Aband Numfarmers Parttime over65y treatment time did, fe
      but, as with the DID estimation, the fixed-effects model showed that none of the variables are significant, and the random-effects model showed that DID was not significant.
      So, I don't know if this model explains what I want to know.



      • #4
        Sorry, the 6th code is
        Code:
        reg Aband time treated did, r 



        • #5
          Your questions in #3 and #4 are, for the most part, the same questions you asked in #1. I've re-read everything and I see nothing that changes my opinions. I just want to elaborate on one thing:

          However, even the change is small, I want to say that the policy restrained rate of abandonment.
          First of all, you should avoid causal language. Even though DID analyses do a somewhat better job of identifying causal effects than simpler cohort designs, they are not perfect in that regard. I would be more cautious in my language and say that "the implementation of the policy was followed by a change in the observed rate of abandonment." Then I would add "The effect was small and the confidence interval around it was wide enough that we cannot be confident even of its direction, although the majority of the confidence interval is negative."

          As for choosing between -reg- and -xtreg-, this depends on the nature of your data. If the same municipalities were observed over multiple years, then you have panel data. That would imply that the observations of a given municipality are not independent. Therefore the -regress- analysis would be invalid, and you must use -xtreg- instead. Either way, I reiterate that you should not calculate your own did variable. You should run these analyses using factor-variable notation and follow the regression (whether -reg- or -xtreg-) with the -margins- commands. While it is possible to get the results that -margins- gives you in other ways following a regression without factor variables, it is far too easy to make a mistake and it takes more time and effort to code. Don't set yourself up for errors: do it the easy way!




          • #6
            Thank you so much for your reply.
            I got results like those below.

            Here, I don't understand the meaning of a negative confidence interval. If it is negative, does that mean it is not significant?
            Of course I searched, but I still could not understand.


            Code:
            xtreg Aband i.time##i.treated, robust
            
            Random-effects GLS regression                   Number of obs      =       332
            Group variable: Municipality                    Number of groups   =       166
            
            R-sq:  within  = 0.0003                         Obs per group: min =         2
                   between = 0.0785                                        avg =       2.0
                   overall = 0.0770                                        max =         2
            
                                                            Wald chi2(3)       =     10.72
            corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0134
            
                                     (Std. Err. adjusted for 166 clusters in Municipality)
            ------------------------------------------------------------------------------
                         |               Robust
                   Aband |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                  1.time |   .0007759   .0046337     0.17   0.867     -.008306    .0098579
               1.treated |  -.0577514   .0176493    -3.27   0.001    -.0923434   -.0231595
                         |
            time#treated |
                    1 1  |  -.0006129   .0050208    -0.12   0.903    -.0104534    .0092276
                         |
                   _cons |   .0767172     .01739     4.41   0.000     .0426335     .110801
            -------------+----------------------------------------------------------------
                 sigma_u |  .09777274
                 sigma_e |  .02047808
                     rho |  .95797601   (fraction of variance due to u_i)
            ------------------------------------------------------------------------------
            So, treated is significant here. Is that right?

            Code:
             margins time#treated
            Adjusted predictions                              Number of obs   =        332
            Model VCE    : Robust
            
            Expression   : Linear prediction, predict()
            
            ------------------------------------------------------------------------------
                         |            Delta-method
                         |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
            time#treated |
                    0 0  |   .0767172     .01739     4.41   0.000     .0426335     .110801
                    0 1  |   .0189658   .0030142     6.29   0.000     .0130581    .0248735
                    1 0  |   .0774932   .0183668     4.22   0.000     .0414949    .1134914
                    1 1  |   .0191288   .0033676     5.68   0.000     .0125285    .0257292
            ------------------------------------------------------------------------------
            In the results above, the p-values are 0.000. Does this mean that every combination of time#treated is significant?



            Code:
             margins treated, dydx(time)
            
            Conditional marginal effects                      Number of obs   =        332
            Model VCE    : Robust
            
            Expression   : Linear prediction, predict()
            dy/dx w.r.t. : 1.time
            
            ------------------------------------------------------------------------------
                         |            Delta-method
                         |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
            1.time       |
                 treated |
                      0  |   .0007759   .0046337     0.17   0.867     -.008306    .0098579
                      1  |   .0001631    .001933     0.08   0.933    -.0036256    .0039518
            ------------------------------------------------------------------------------
            Note: dy/dx for factor levels is the discrete change from the base level.
            Does this mean that the marginal effect in both groups is not significant?



            • #7
              I don't understand the meaning of a negative confidence interval. If it is negative, does that mean it is not significant?
              I'm not sure what you mean by a "negative confidence interval." Numbers can be positive or negative (or zero), intervals can't. Most of your confidence intervals in the regression have a negative lower bound and a positive upper bound. So that means that the precision of your estimate is not sufficient to make confident assertions about the sign of the effect. Both positive and negative true coefficients are consistent with your data. Some people like to say that as meaning that the coefficient is not statistically significant. I encourage people to avoid thinking in terms of statistical significance because it is a very tricky concept, more widely misunderstood than understood, and often leads people to jump to totally wrong conclusions.

              So I'm going to ignore your questions about what is and is not significant and walk you, instead, through how to interpret these results in meaningful terms.

              Your model provides three coefficients: treated, time, and their interaction. The one for treated is usually of no interest: it provides an estimate of the difference before treatment between the treatment and control groups. It needs to be in the model so that we can compare it to the difference between the groups after treatment. But in and of itself, it is usually of little or no interest. So we can more or less ignore that one.

              The time coefficient tells you the mean difference in the outcome between before and after treatment in the control group. It measures, if you will, whatever secular trends or placebo effects may be operating in your data. It shows how the outcome changes even when nothing relating to your intervention is happening. It, too, is usually of little or no interest in itself and is in the model so that you can compare it with the before vs. after difference in the treatment group. It is that difference in differences (treatment group before vs. after contrasted with control group before vs. after) that is the core of the DID model, and is your estimate of the actual effect of treatment.

              And that is what the interaction coefficient is all about. In your case, it's a very small number, -.0006129, with a relatively wide confidence interval, from -.0104534 to .0092276. That wide confidence interval tells us that, given the combination of a moderate sample size (332 observations in 166 pairs) and the extent of variability in your outcome variable, you could not get a very precise estimate of this difference in differences. Indeed the precision is so low that you cannot even say with confidence whether it is positive or negative. Another way of looking at it is that it is much, much smaller, by two orders of magnitude, than the coefficient of treated, which, as already noted, represents the difference in mean outcome between the two groups before the intervention took effect. That is another way of saying that this effect is pretty minuscule.

              The first table of -margins- output shows you the expected value of the outcome in each combination of treatment and control X before and after. You can see that in both groups, the expected outcome is higher after than before, and you can also see that the increase in each group is about the same amount. (This is another way of reflecting that the interaction coefficient in the regression output is very small.) You can also see that the expected outcomes in the treatment group are much lower than those in the control group, both before and after the intervention. You should definitely ignore the p-values in this table. As little value as they have elsewhere in your output, the ones here are totally pointless. They test the four null hypotheses that these four expected outcomes are zero. Unless you have some reason to test such a hypothesis, they are completely useless. It is rarely, if ever, the case that anyone cares about these tests. The expected outcomes themselves, however, perhaps along with their standard errors or confidence intervals, are usually of some interest, and if nothing else, they provide some context for the DID estimator of the treatment effect. A table or graph of these is usually included in a results presentation for a DID analysis.
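
              Such a graph is easy to produce: -marginsplot-, run immediately after -margins-, plots the most recent margins results with their confidence intervals. A minimal sketch:

              Code:
              * adjusted predictions in each time x treated cell, then plot them
              margins time#treated
              marginsplot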

              The second table of -margins- output shows you the mean difference in outcomes between before and after in each treatment group. In the control group, the expected outcome increased by about 0.0007759 after the intervention, and in the treatment group there was an increase of about 0.0001631. Both of these are very small changes. And here, again, the confidence intervals say that the magnitude of the changes is small relative to the precision with which they can be estimated in this data sample, so that we cannot even make confident assertions as to whether these are really increases or could be decreases. In fact, I notice that in both groups the confidence interval is nearly centered on zero. So these effects, although our best estimate of them is positive, are very, very close to zero. Again, comparing these changes to the expected outcomes shown in the first -margins- output table, we can say that in relative terms these changes are about 1% as large as the expected outcomes themselves. So we see these changes to be small not just in absolute terms, but as a percentage of the baseline values.
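
              The "about 1%" figure is simple arithmetic on the -margins- output, which can be checked with -display-; for example, for the control group:

              Code:
              * control-group change over time as a fraction of its post-period expected outcome
              display 0.0007759 / 0.0774932
              which comes to roughly 0.01, i.e. about 1%.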

              So my overall conclusion would be that this study could not demonstrate more than a very tiny effect of the intervention, and that the remaining uncertainty about the effect is such that we cannot be certain whether it is positive or negative; in any case, it represents less than 1% of the baseline values.

              The part of the interpretation I cannot help you with is whether this is of practical importance or not. That requires knowledge and judgment of your content area. The questions you should ask yourself are: does an effect of this magnitude (0.0006129) matter for any practical purposes? Does an effect that is less than 1% of the baseline values of the outcome matter for any practical purposes? If the answer to these questions is yes, then you could argue that more research is needed to get a more precise estimate. That research would necessarily require a much larger study sample, or perhaps a less fluctuating outcome measure. If the answer to these questions is no, then I think you can safely call this a study with negative results from all perspectives.




              • #8
                Thank you so much for your kind explanation.

                Now, if the coefficient is pretty small, how about using logs in the model?
                In that case, how could I apply "ln" to the independent variables? Since my independent variables are dummies, I am not sure whether I should apply "ln" only to the dependent variable.



                • #9
                  Applying log transformation to dummy variables makes no sense. In fact, it's impossible because you can't take the log of 0, which is one of the values.

                  I don't know much about your dependent variable. I understand it's the rate of abandonment, but that doesn't give me any insight into the range of values it takes. In any case, since the log transformation tends to shrink the range of variables, sometimes very dramatically, log-transforming your dependent variable will, if anything, make your regression coefficients smaller. And in any case, the choice of whether to log transform a variable should never be based on the desired size of the coefficient: the coefficient is what it is and if you don't like it you have to adjust your preferences accordingly. The decision to log-transform a variable should be based on the observation that the relationship between the dependent and independent variables is actually logarithmic rather than linear. But, again, in the context of a model where all the predictors are dichotomies this is pointless. Your predictors don't have enough information to distinguish a linear from a logarithmic model here.

                  At the end of the day, if you are trying to tinker with your model in the hopes of coming up with something more in line with your preconceived hopes or beliefs, don't. It isn't science. The model should be determined before the analysis, based on scientific understanding. When there is no scientific basis for specifying a model, it is reasonable to explore the data in various ways. But then you are obligated to present any conclusions you reach as exploratory and requiring independent confirmation.

                  In any case, with just two dichotomous predictors plus their interaction, it is essentially impossible to come up with a model that is materially different from what you have already done.



                  • #10
                    Thank you so much for your advice. I studied logs and understood what you meant: applying them to dummy variables doesn't make sense.


                    Now I need to try a command as follows.
                    xtreg lnAb i.time##i. treated Numfarmers parttime over65y
                    However, it seems there is a command "$xlist" that I may be supposed to use, according to my textbook, for my independent variables Numfarmers parttime over65y. But I am not sure how it works (what it means) or how to run this $xlist, even though I read the book.



                    • #11
                      There is definitely no command $xlist. You couldn't create one if you wanted to, because it would be a syntax error as a command name. Worse yet, anything that begins with the $ character is interpreted by Stata as a global macro. If it is defined, it will be interpreted as the contents of that global macro. If $xlist is undefined, it will be treated as an empty string.
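
                      For illustration, here is how such a global macro is typically defined and used in a do-file. The name xlist is just a user-chosen label, not anything built into Stata, and the covariate names are taken from your earlier posts:

                      Code:
                      * store the covariate list in a global macro named xlist
                      global xlist Numfarmers Parttime over65y

                      * $xlist expands to that list wherever it appears in a command
                      xtreg Aband i.time##i.treated $xlist, fe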

                      I think you need to show the excerpt from your textbook, so somebody can figure out what it means.

                      The syntax of the command you show in #10 is fine, except that there should not be a space between i. and treated.



                      • #12
                        Thank you for all the advice, Clyde.

                        I studied $xlist, and it turned out to mean just the same as listing the independent variables in the command. So, that is clear now.

                        Now, since my coefficient was pretty small, I added one more year. My panel data now cover three years: 2005, 2010, and 2015.

                        I would like to see the effect of the policy, which was implemented in 2005.
                        But then, which variable captures the effect of the policy?
                        Is it still possible to use DID for the effect of the policy even if I have three time periods?

                        I tried this.

                        xtreg log_Ab Numfarmers Parttime over65y time treated did, fe
                        thank you.



                        • #13
                          There is, in principle, no limit to the number of time periods you can have in a DID model.

                          But in order to do DID you need information both before and after the policy goes into effect. You are describing three time periods, but all of them after implementation. So, no, this data will not support a DID analysis.



                          • #14
                            Thank you Clyde for prompt reply.

                            Actually, the data for 2005 were collected in 2004, before implementation. So 2005 is BEFORE-implementation data, in my understanding. (These data are provided by the government, so the title of the data is like "survey of ~~ 2005".)

                            So, DID has no limit on the number of time periods. Does that mean I can define the DID variable the same way as with two time periods?

                            Or is there another way to set a year dummy variable when there are more than two time periods?

                            I did

                            Code:
                            gen time = (year>2005) & !missing(year)
                            gen did = time*treated
                            With this code, I coded 2005 as 0, and 2010 and 2015 as 1, in the panel data.

                            thank you.



                            • #15
                              With multiple post-intervention periods you have a choice. You can do:

                              Code:
                              gen pre_post = (year > 2005) & !missing(year)
                              regress outcome i.treated##i.pre_post
                              Or you can do this:
                              Code:
                              gen pre_post = 0 if year == 2005
                              replace pre_post = 1 if year == 2010
                              replace pre_post = 2 if year == 2015
                              regress outcome i.treated##i.pre_post
                              It really depends on whether theory suggests that the effect of treatment should differ in 2010 as compared with 2015. For example, if theory suggests that the effects will be cumulative, and therefore greater in 2015 than in 2010, I would go for the 3-level pre_post. Similarly, if there is reason to think the effect will decay after a while, then it would absolutely be crucial to use the three-level pre_post. If, however, theory suggests that the implementation of the intervention will lead to an effect that persists with little change over time, then the 2-level pre_post model will provide more efficient estimation.

                              Note: You are just handicapping yourself by doing things like -gen did = time*treated-. Use factor-variable notation. The interpretation of interaction models from regression output is difficult and, especially for beginners, confusing and typically done incorrectly the first several times. The -margins- command makes it all quite simple and straightforward. But if you use variables like did = time*treated in your regression then you can't use -margins-. So don't cut yourself off from one of Stata's best features (-margins-) especially in an application where it is more likely than not you will mess up without it. You've already used factor-variable notation and -margins- in some of the posts above. Stick with it!
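
                              Factor-variable notation pays off here too: with the three-level pre_post, the same -margins- commands used earlier in this thread generalize directly. A sketch, assuming the second model above has just been run:

                              Code:
                              * expected outcome in each treated x period cell
                              margins treated#pre_post

                              * change from the 2005 baseline within each group
                              margins treated, dydx(pre_post)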

