A few questions about STATA difference in difference using data from multiple time periods

Jung Won You

Join Date: Nov 2019

Posts: 3
#1

15 Nov 2019, 21:13

Dear all,

I want to estimate the effect of a housing policy on housing prices in two regions, one affected by the policy and the other not, using the difference-in-differences (DID) method with multiple time periods.

My equation is
Pr = B₀ + B₁*D_tr + B₂*t + B₃*D_tr*t + e

where Pr is the price outcome, D_tr is the dummy variable for treatment, and t is the time dummy variable. So I'm guessing B₃ is the effect of the policy when I use the DID method.

However, the data set I have only has one period before the policy was enacted, and multiple time periods after the policy was enacted.

1) With this data, can I do the standard did method with regression and see the value of B3? or do I have to change my equation?

2) After I did the process, the F-value was empty. Is it because the equation I wrote is wrong or is it because of multiple time periods?

3) Is there any way I can find the change of B3 as time passes after the policy was enacted? Do I have to do it one time step at a time?
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

15 Nov 2019, 21:59

I'm guessing B₃ is the effect of the policy when I do did method.

Correct. Or, strictly speaking, B₃ is the estimated causal effect of the policy.
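To see why, here is a short derivation. It is only a sketch, and it assumes that t is a single pre/post dummy and that the error term e has mean zero within each of the four group-by-period cells. Under those assumptions the four expected outcomes are:

```latex
\begin{align*}
E[Pr \mid D_{tr}=0,\ t=0] &= B_0 \\
E[Pr \mid D_{tr}=1,\ t=0] &= B_0 + B_1 \\
E[Pr \mid D_{tr}=0,\ t=1] &= B_0 + B_2 \\
E[Pr \mid D_{tr}=1,\ t=1] &= B_0 + B_1 + B_2 + B_3
\end{align*}
% Change over time in the treated region minus change in the untreated region:
\begin{align*}
\bigl[(B_0 + B_1 + B_2 + B_3) - (B_0 + B_1)\bigr] - \bigl[(B_0 + B_2) - B_0\bigr]
  &= (B_2 + B_3) - B_2 \\
  &= B_3
\end{align*}
```

So B₃ is the post-minus-pre change in the treated region net of the same change in the untreated region. Reading it as a causal effect still depends on the parallel trends assumption.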

However, the data set I have only has one period before the policy was enacted, and multiple time periods after the policy was enacted.

1) With this data, can I do the standard did method with regression and see the value of B3? or do I have to change my equation?

Yes, you can. But with some caveats. The precision of your estimate will be lower than it would be if you had more pre-policy data. And, recall that one of the assumptions required for the DID method to actually identify a causal effect is that during the pre-policy period the trends in the two groups are parallel. With only one pre-policy time point it is impossible to calculate any pre-policy trends, so this important assumption becomes completely unverifiable. You are doing faith-based analysis here.

2) After I did the process, the F-value was empty. Is it because the equation I wrote is wrong or is it because of multiple time periods?

Since you show us neither the code you ran to implement your model, nor the complete output you got from Stata, nobody can answer this question. It can be said that it has nothing to do with the equation you wrote out. But whether your code properly implemented that equation we can only guess. There are a number of reasons why regression output will show a missing value for the F statistic. Some of them are problems, and some are not. If you want an answer to this question, you must show the exact code you ran and the complete output Stata gave you from it. Be sure to place those inside code delimiters so they are readable here. If you are not familiar with code delimiters, see Forum FAQ #12 or watch David Benson's video at
https://youtu.be/bXfaRCAOPbI.

3) Is there any way I can find the change of B3 as time passes after the policy was enacted? Do I have to do it one time step at a time?

Yes there is, in general. But the details depend on your data organization. So to get an answer to this question, you need to show an excerpt of your data, using the -dataex- command. (-dataex- is also discussed in FAQ #12 and in David Benson's video.) When posting example data be sure that your example includes data from the pre-policy period and several (at least three) different post-policy time periods. Also be sure it includes both affected and unaffected regions in all of the time periods shown.

Jung Won You

Join Date: Nov 2019
Posts: 3

16 Nov 2019, 00:12

Thank you so much for your help.
I added some more pre-policy data that I could find and ran the program again.

The code I used is like this:

Code:

gen time = (date>=td(01aug2017)) & !missing(date)
gen treated = (region<2) & !missing(region)
gen did = time*treated
reg pr time treated did, r

And the excerpt of my data is:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int date double pr long region float(time treated did)
20332  88.74616233327578 1 0 1 0
20362   89.0277993417882 1 0 1 0
20393  89.34846047302727 1 0 1 0
20423  89.41521557802412 1 0 1 0
20454  89.48857298774236 1 0 1 0
20485  89.56250337910525 1 0 1 0
20514  89.67146552386738 1 0 1 0
20545  89.76108241454965 1 0 1 0
20575  89.97080260083003 1 0 1 0
20606  90.24257284299176 1 0 1 0
20636    90.525246523322 1 0 1 0
20667  90.74899011808238 1 0 1 0
20698  90.96020572504692 1 0 1 0
20728   91.2978394905332 1 0 1 0
20759  91.57094808607303 1 0 1 0
20789  91.69606463454407 1 0 1 0
20820  91.67372538900372 1 0 1 0
20851  91.61201190437635 1 0 1 0
20879   91.6463265406987 1 0 1 0
20910  91.72012453841472 1 0 1 0
20940  91.81698124606866 1 0 1 0
20971  92.36824851634663 1 0 1 0
21001  92.96666666666665 1 0 1 0
21032   93.5608656691986 1 1 1 1
21063  93.56003127155508 1 1 1 1
21093  93.69282458403302 1 1 1 1
21124  93.84475304138094 1 1 1 1
21154  93.98199355085023 1 1 1 1
21185   94.4975896570113 1 1 1 1
21216  95.00015256111539 1 1 1 1
21244  95.46826570347174 1 1 1 1
21275  95.81386857409525 1 1 1 1
21305  96.01831730338625 1 1 1 1
21336  96.22726248390352 1 1 1 1
21366  96.47860150997995 1 1 1 1
21397   97.0447650955815 1 1 1 1
21428  99.14706056887239 1 1 1 1
21458  99.81284307560645 1 1 1 1
21489  99.94537553849875 1 1 1 1
21519  99.95736820823113 1 1 1 1
21550                100 1 1 1 1
21581  99.92617350252634 1 1 1 1
21609  99.87684127024507 1 1 1 1
21640  99.68502323638444 1 1 1 1
21670  99.59100980939722 1 1 1 1
21701   99.6550025047107 1 1 1 1
21731  99.83557100048154 1 1 1 1
21762 100.13801494400543 1 1 1 1
21793 100.43861257222825 1 1 1 1
21823 100.65436043028177 1 1 1 1
20301  92.43638256769361 2 0 0 0
20332  92.98250475973568 2 0 0 0
20362  93.42373884631418 2 0 0 0
20393  94.09016465566823 2 0 0 0
20423  94.44001015570463 2 0 0 0
20454  94.59719930264174 2 0 0 0
20485  94.74565944783787 2 0 0 0
20514  94.88154396721797 2 0 0 0
20545  95.04048639235303 2 0 0 0
20575  95.26228686697392 2 0 0 0
20606   95.4549061323093 2 0 0 0
20636  95.79086332327002 2 0 0 0
20667  96.12103891232213 2 0 0 0
20698  96.36645577438065 2 0 0 0
20728  96.76582490902696 2 0 0 0
20759  97.32565472472913 2 0 0 0
20789  97.65080021134789 2 0 0 0
20820  97.90908374228646 2 0 0 0
20851  98.06012386080364 2 0 0 0
20879  99.31372521531449 2 0 0 0
20910  99.41775685255435 2 0 0 0
20940  99.53832624275255 2 0 0 0
20971  99.73142372420256 2 0 0 0
21001 100.02222222222221 2 0 0 0
21032 100.21365728887892 2 1 0 0
21063 100.31791684827999 2 1 0 0
21093 100.40420378615771 2 1 0 0
21124 100.43306071831303 2 1 0 0
21154  100.4519459427096 2 1 0 0
21185 100.39132099835253 2 1 0 0
21216  100.3656121711602 2 1 0 0
21244 100.26929493459117 2 1 0 0
21275 100.26713355164017 2 1 0 0
21305 100.23713980291534 2 1 0 0
21336 100.22526662707654 2 1 0 0
21366 100.13904091759017 2 1 0 0
21397 100.06078688512889 2 1 0 0
21428 100.09379145037565 2 1 0 0
21458 100.04693468272814 2 1 0 0
21489  99.92590306751359 2 1 0 0
21519  99.91524721172544 2 1 0 0
21550                100 2 1 0 0
21581 100.04273444534414 2 1 0 0
21609  99.95537153079648 2 1 0 0
21640  99.78166876047861 2 1 0 0
21670  99.66283280968183 2 1 0 0
21701  99.47253403917104 2 1 0 0
21731  99.34262132447518 2 1 0 0
21762  99.18669066138146 2 1 0 0
21793  99.06374575956778 2 1 0 0
end
format %td date
label values region region
label def region 1 "pdr", modify
label def region 2 "pndr", modify

And the result it shows is:

[Attached image: 1.png (regression output, not reproduced here)]

Do you think this is the right way?
And again, regarding the third question, how can I look at the change in B₃ as time passes?


Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#4

16 Nov 2019, 10:54

Yes, this is a good approach. I would use factor-variable notation rather than hand-calculating an interaction term, because I like to follow these things up with the -margins- command (which requires factor-variable notation) so I can easily see the expected outcomes in each group in each time period.

Code:

regress pr i.time##i.treated, robust
margins time#treated
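
As a minimal sketch of how the interaction coefficient relates to the cell means that -margins- reports, here is a toy example on made-up data. The eight observations below are hypothetical and are not taken from this thread. The cell means are 11 and 12 in the untreated group and 21 and 27 in the treated group, so the difference in differences is (27 - 21) - (12 - 11) = 5.

```stata
* Hypothetical toy data: 2 groups x 2 periods, 2 observations per cell
clear
input pr time treated
10 0 0
12 0 0
11 1 0
13 1 0
20 0 1
22 0 1
26 1 1
28 1 1
end

* Saturated DID model in factor-variable notation
regress pr i.time##i.treated, robust

* Expected outcome in each of the four cells (reproduces the cell means)
margins time#treated

* The DID estimate is the coefficient on the interaction term
lincom 1.time#1.treated

* Treated-minus-untreated difference within each period;
* the change in this difference from time 0 to time 1 is the DID estimate
margins time, dydx(treated)
```

Because the model is saturated, the -lincom- estimate here equals the hand-computed value of 5. On real data the same commands apply, only the numbers differ.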

By the way, you can get a really nice exploration of the parallel trends assumption, as well as the rather dramatic effect of the policy change in your data with

Code:

graph twoway line pr date, sort by(treated) xline(`=td(1aug2017)')

Now to demonstrate the effect changing over time, what you can do is use a discrete time variable, perhaps at the half year:

Code:

gen int hy = hofd(date)
format hy %th
regress pr i.hy##i.treated, robust
margins hy, dydx(treated)
marginsplot, xline(`=th(2017h2)')
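
A quick check, offered only as a sketch, of how the half-year coding lines up with the policy date of 1 Aug 2017 used earlier in this thread. It shows why the reference line in the plot is drawn at 2017h2.

```stata
* Half-year that contains the policy date, shown in %th format
display %th hofd(td(01aug2017))    // shows 2017h2

* The underlying integer value that xline() receives
display th(2017h2)                 // shows 115
```

Each level of hy therefore covers six calendar months, and the first post-policy half-year (2017h2) also contains July 2017, which is still pre-policy under the cutoff above.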

Since these results seem to suggest that the trend in the difference in pr between the groups is roughly linear in both the pre- and post-policy periods, but with very different slopes, you might also capture this using a spline:

Code:

// CHANGE UNIT OF TIME TO MONTH FOR CONVENIENCE
// AND BECAUSE VALUES OF DATE ARE ALL 1ST OF MONTH
gen mdate = mofd(date)
format mdate %tm
mkspline pre `=tm(2017m8)' post = mdate
format pre post %tm
regress pr c.(pre post)##i.treated, robust
margins treated, dydx(pre post) // RATES OF CHANGE OF pr PER MONTH
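
To make clearer what -mkspline- creates here, the following sketch rebuilds the two spline variables by hand on a small made-up monthly series. The 12 months below are hypothetical and not the thread's data, and the names pre_manual and post_manual are introduced only for illustration.

```stata
* Hypothetical monthly series: 2017m3 through 2018m2
clear
set obs 12
gen mdate = tm(2017m2) + _n
format mdate %tm

* Linear spline with a single knot at 2017m8, as in the code above
mkspline pre `=tm(2017m8)' post = mdate

* Hand-built equivalents (valid when mdate has no missing values):
* pre tracks mdate up to the knot and then stays at the knot value;
* post is 0 up to the knot and then counts months elapsed since the knot
gen pre_manual  = min(mdate, tm(2017m8))
gen post_manual = max(0, mdate - tm(2017m8))

* Both pairs should agree exactly
assert pre == pre_manual
assert post == post_manual

list mdate pre post, noobs
```

Because -mkspline- is used without its marginal option, the coefficient on post in the regression is the slope per month after the knot, not the change in slope relative to the pre-policy period.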

Do read -help fvvarlist- and -help mkspline- for more information. If you are unfamiliar with the -margins- command, I suggest you begin by reading Richard Williams' excellent https://www3.nd.edu/~rwilliam/stats/Margins01.pdf.
Jung Won You

Join Date: Nov 2019

Posts: 3
#5

17 Nov 2019, 05:00

Thank you so much, I really appreciate your help

I'll first read the materials you suggested and try the code.
Thanks.
