Difference-in-Differences analysis

Franz Mayr

Join Date: Mar 2017

Posts: 5
#1

01 Mar 2017, 02:17

Dear all,

I am quite new to Stata, and maybe this is an easy question, but I'm becoming desperate because I have no solution for my problem and so far none of the posts about DiD in this forum have helped me.

I have data on over 130 countries: their GDP, life expectancy, and competitiveness score for the years 2010 to 2015. My professor told me to run a difference-in-differences analysis although I have no treatment and hence no control group. He told me to define the diff-in-diff on a time basis, which means that the year 2010 is t0 and I then have to look at how the variables have developed. As far as I understood it, he wants the year 2010 to be the untreated group and the other years to be the treated group (but maybe I am wrong here). I cannot run a time series regression because there are not enough years.

The aim of this DiD is to identify how a change in a country's competitiveness score is connected with various other factors, such as the country's GDP, life expectancy, etc. I have to do a DiD for each of these factors.

I would be so grateful if someone can help me because I have no idea how to do it.

Franz
Jorrit Gosens

Join Date: Jan 2015

Posts: 1019
#2

01 Mar 2017, 03:27

Princeton has a good, although very short guide: http://www.princeton.edu/~otorres/DID101.pdf
Advice: read page 1, but forget about creating the interaction variable yourself, and use the method in slide 4.

Also, Wikipedia has a good graph for making sense of this method.

In a nutshell, you compare a trend (over time) between two groups of countries, persons, or other units.
You therefore need 1) a time variable, and 2) a treatment variable, or a variable that somehow groups your countries (example: adopting a certain policy measure).
Conflating the two makes little sense. The countries present in 2010 are the same group as those present in later years, so all would be untreated in one time period and treated in the other.
Perhaps what your supervisor meant was that different countries become treated (i.e., adopt the policy measure) at different points in time. It makes sense to refer to the first year of your data as the starting point of the trend for each of your countries, but not as the factor that decides treated vs. untreated.
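
For the simplest case of two groups and two periods, the comparison described above can be written as a formula. The symbols are my own notation for illustration and are not taken from the Princeton slides:

```latex
\hat{\delta}_{DiD}
  = \left( \bar{Y}_{\text{treated},\,\text{post}} - \bar{Y}_{\text{treated},\,\text{pre}} \right)
  - \left( \bar{Y}_{\text{control},\,\text{post}} - \bar{Y}_{\text{control},\,\text{pre}} \right)
```

Each \bar{Y} is the mean outcome in one group-period cell. If there is no control group, the second bracket has nothing to average over, which is why the time variable alone cannot stand in for the treatment variable.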
Franz Mayr

Join Date: Mar 2017

Posts: 5
#3

01 Mar 2017, 07:44

Thank you very much for your advice!

What I still do not understand is how I should run the diff-in-diff when there are no treatments. And what should the time variable look like? 0 for year 2010, then 1 for year 2011, and then what?

Jorrit Gosens

Join Date: Jan 2015

Posts: 1019
#4

01 Mar 2017, 08:17

The time variable in these examples should be 1 once treatment has started. This could be 1 for all countries from, e.g., 2010 onwards, or it could depend on when each country started treatment, e.g., from 2010 onward for country A and from 2013 onward for country B.

The idea with the interaction variable (the time#treated term that time##treated produces) in the slides from Princeton is that this variable equals 1 only when a country belongs to the treated group and treatment has started.
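
As a rough sketch of how such indicators could be built when countries start treatment in different years: the toy data, the variable names (country, adopt_year, treated, post, did) and the adoption years below are all hypothetical and only mirror the country A / country B example above.

```stata
* Toy data: three hypothetical countries; adopt_year is the first treated year
* (missing for country C, which is never treated)
clear
input str1 country int adopt_year
"A" 2010
"B" 2013
"C" .
end
* six rows per country, one for each year 2010 to 2015
expand 6
bysort country: gen int year = 2009 + _n

* group indicator: 1 if the country is ever treated
gen byte treated = !missing(adopt_year)
* timing indicator: 1 from the country's own start year onward
gen byte post = year >= adopt_year & !missing(adopt_year)
* interaction: 1 only for treated countries once their treatment has started
gen byte did = treated * post

list country year treated post did, sepby(country)
```

In this toy panel, did is 1 in every year for A, from 2013 onward for B, and never for C.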

If you truly have no treatment, the DiD method makes no sense.

To illustrate why the suggestion of your supervisor (if you explain it correctly) does not work, see this example with the Grunfeld example dataset:

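Example 1: assume companies with ID > 5 are treated, and treatment starts in 1940. Result: the treated group differs significantly from the control group in levels, both at baseline and at follow-up, although the diff-in-diff estimate itself is small and not significant (p = 0.977).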

Code:

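* install the user-written diff command (only needed once)
ssc install diff
webuse grunfeld, clear
* treated group: companies 6-10; control group: companies 1-5
gen treated = 0
replace treated = 1 if company > 5
* because this dataset has a time var already, we replace the values here
replace time = 0
replace time = 1 if year >= 1940
* outcome: mvalue; t() takes the treatment dummy, p() the period dummy
diff mvalue, t(treated) p(time)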

Code:

DIFFERENCE-IN-DIFFERENCES ESTIMATION RESULTS
Number of observations in the DIFF-IN-DIFF: 200
            Baseline       Follow-up
   Control: 25             75          100
   Treated: 25             75          100
            50             150
------------------------------------------------------
 Outcome var.   | mvalue  | S. Err. |   t   |  P>|t|
----------------+---------+---------+-------+---------
Baseline        |         |         |       | 
   Control      | 1769.868|         |       | 
   Treated      | 256.824 |         |       | 
   Diff (T-C)   | -1.5e+03| 306.547 | -4.94 | 0.000***
Follow-up       |         |         |       | 
   Control      | 1855.824|         |       | 
   Treated      | 353.095 |         |       | 
   Diff (T-C)   | -1.5e+03| 176.985 | -8.49 | 0.000***
                |         |         |       | 
Diff-in-Diff    | 10.314  | 353.969 | 0.03  | 0.977
------------------------------------------------------
R-square:    0.33
* Means and Standard Errors are estimated by linear regression
**Inference: *** p<0.01; ** p<0.05; * p<0.1
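
For what it is worth, the Diff-in-Diff row above should be the same number you get from the four cell means, or from the interaction coefficient in an ordinary regression. A minimal sketch, assuming the same grouping and timing as in Example 1 (post, m00 to m11 and did_means are names I made up here):

```stata
webuse grunfeld, clear
* same grouping and timing as in Example 1
gen byte treated = company > 5
gen byte post = year >= 1940

* DiD "by hand": store the mean of mvalue in each group-period cell
foreach g in 0 1 {
    foreach p in 0 1 {
        summarize mvalue if treated == `g' & post == `p', meanonly
        scalar m`g'`p' = r(mean)
    }
}
* change in the treated mean minus change in the control mean
scalar did_means = (m11 - m10) - (m01 - m00)

* the interaction coefficient in this regression is the same quantity
regress mvalue i.treated##i.post
display "by hand: " did_means "   regression: " _b[1.treated#1.post]
```

Both numbers rely on having observations in all four cells, which is exactly what is missing when there is no control group.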

Example 2 (following your supervisor's suggestion): all companies are untreated in 1935 and treated in all years after:

Code:

webuse grunfeld, clear
* every company is "treated", so there is no control group
gen treated = 1
* because this dataset has a time var already, we replace the values here
replace time = 0
replace time = 1 if year > 1935
diff mvalue, t(treated) p(time)

Code:

DIFFERENCE-IN-DIFFERENCES ESTIMATION RESULTS
Number of observations in the DIFF-IN-DIFF: 200
            Baseline       Follow-up
   Control: 0              0           0
   Treated: 10             190         200
            10             190
------------------------------------------------------
 Outcome var.   | mvalue  | S. Err. |   t   |  P>|t|
----------------+---------+---------+-------+---------
Baseline        |         |         |       | 
   Control      | 707.471 |         |       | 
   Treated      | 707.471 |         |       | 
   Diff (T-C)   | 0.000   |    .    |    .  |    .
Follow-up       |         |         |       | 
   Control      | 1101.376|         |       | 
   Treated      | 1101.376|         |       | 
   Diff (T-C)   | 0.000   |    .    |    .  |    .
                |         |         |       | 
Diff-in-Diff    | 0.000   |    .    |    .  |    .
------------------------------------------------------
R-square:    0.00
* Means and Standard Errors are estimated by linear regression
**Inference: *** p<0.01; ** p<0.05; * p<0.1

Result: it is not possible to estimate any difference between treatment and control, because all companies were subject to the same treatment and no control group is left.
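
The same problem shows up if you set up the regression by hand. A small sketch under the Example 2 setup (post and did are names chosen here for illustration):

```stata
webuse grunfeld, clear
gen byte treated = 1
gen byte post = year > 1935
gen byte did = treated * post

* with every company treated, the interaction is just a copy of the time dummy
assert did == post
* the group dummy is also constant, so regress has to omit terms as collinear
regress mvalue treated post did
```

Only one slope can be estimated here, the plain before/after difference, and nothing in the data separates it from a treatment effect.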

Franz Mayr

Join Date: Mar 2017

Posts: 5
#5

01 Mar 2017, 08:50

Thank you again for your help!

This is exactly what I thought as well. But he insisted on doing a diff-in-diff. He said that I have to define the diff-in-diff on a time basis and that I have to observe how the variables develop over time (using the diff-in-diff). So is it somehow possible to use 2010 as the untreated year and the other years as the treated years? Or is there any other approach?

He argued that the Diff in Diff approach is possible here although there is no treatment and control group.
Jorrit Gosens

Join Date: Jan 2015

Posts: 1019
#6

01 Mar 2017, 09:03

Again, DiD is out of the question if you have no treatment and control group. If your supervisor still believes this is possible, show him the example above, and tell him that what he is asking for is equivalent to saying 'let's look at one group, and see if they differ from a group we don't look at'.

Any other approach? Other than DiD, then? There are many, and their suitability will depend on what you are trying to do. There are plenty of flavours of time series or panel data estimation. Contrary to what you say in #1, 130 countries observed over the years 2010 to 2015 sounds like a sufficient dataset for such an analysis to me. The question of which analysis to use would be better asked in a new thread, though, where you explain the purpose of your analysis rather than focusing on this specific method.
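
Just to give an idea of what one such flavour looks like, here is a fixed-effects panel regression, with the Grunfeld data standing in for a country-year panel. This is a sketch of one option among many and not a recommendation for your analysis; with your data the variables would be something like a numeric country identifier, year, the competitiveness score, GDP and life expectancy (names you would have to adapt).

```stata
* Grunfeld data used only as a stand-in for a country-year panel
webuse grunfeld, clear
* declare the panel structure: unit identifier first, then the time variable
xtset company year
* within (fixed-effects) regression with a dummy for each year
xtreg invest mvalue kstock i.year, fe
```

The fe option uses only the variation within each unit over time, which is close to the "how do the variables develop over time" question without needing a treatment group.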
Franz Mayr

Join Date: Mar 2017

Posts: 5
#7

01 Mar 2017, 11:00

Ok, thank you very much. I will email my supervisor.
Zihao Chen

Join Date: Aug 2020

Posts: 15
#8

31 Aug 2020, 18:04

Hi Jorrit,

If you are still there, may I ask a question about how to interpret the DiD estimation result?

Zihao