Diff in Diff: DRDID and CSDID

FernandoRios

Join Date: Apr 2014

Posts: 2469
#226

23 Sep 2022, 07:50

Hi Mark,
DRIMP and DRIPW use the IPWs in slightly different ways.
Often, the strategy is:
1 - get the IPW
2 - estimate the model with IPW
This is what DRIMP does, mostly because it is easy and simple.

DRIPW does this differently:
1 - gets the IPW
2 - estimates the model without IPW
3 - corrects for unbalanced data using the IPW and predicted outcomes.

In other words, DRIPW introduces the correction as a third step in the model.

This is just a different way to adjust estimates using IPW, since you could just as well do the same as DRIMP does. It just wasn't done like that in the model (at some point I'll try to add that as well).

F
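
As a rough illustration of the three DRIPW steps listed above, here is a minimal sketch for a simple two-period panel. It is not the code used inside drdid or csdid, and it produces no standard errors. The variable names are assumptions: dy is the change in the outcome between the two periods, D is the treated-group dummy, x1 and x2 are covariates, and the data hold one row per unit.

```stata
* Step 1: propensity score, which gives the IPW for the control units
logit D x1 x2
predict double ps, pr
generate double w0 = ps / (1 - ps) if D == 0

* Step 2: outcome model for the change in y, fitted on controls WITHOUT weights
regress dy x1 x2 if D == 0
predict double dy0_hat, xb

* Step 3: correct using the IPW and the predicted outcomes
generate double res = dy - dy0_hat
summarize res if D == 1, meanonly
scalar att_treated = r(mean)
summarize res [aweight = w0] if D == 0, meanonly
scalar att_control = r(mean)
display "DRIPW-style ATT: " att_treated - att_control
```

In practice the packaged estimators should be used. With csdid, the two approaches are selected with method(dripw) and method(drimp).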
Comment
Mark Van Orden

Join Date: Sep 2022

Posts: 2
#227

23 Sep 2022, 12:23

Well explained - thank you for your help.
Comment
Antonio MartinsNeto

Join Date: Sep 2022

Posts: 3
#228

24 Sep 2022, 08:44

Originally posted by FernandoRios

Hi Antonio,
Sorry for the delay in answering.
You do not need to drop the information before estimating the event effects within the "window". The only advantage of doing that would be to reduce the number of estimations done in the background.
So, just use estat event, window()
F

Thanks a lot, Fernando. Very helpful!
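
For readers following along, the suggestion above could look like the sketch below. The variable names (y, x1, x2, id, year, gvar) and the window limits are placeholders, not taken from the thread.

```stata
* Assumed names: y (outcome), x1 x2 (covariates), id, year,
* gvar (first treatment year, 0 if never treated)
csdid y x1 x2, ivar(id) time(year) gvar(gvar) method(dripw)

* Event-study aggregation restricted to 3 periods before and 5 after treatment;
* the full sample is kept in the estimation step
estat event, window(-3 5)
```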
Comment
Chunxiao Geng

Join Date: May 2022

Posts: 3
#229

26 Sep 2022, 02:17

Originally posted by FernandoRios

Hi Chunxiao,
1) I suspect that the extremely high ATT is localized in one or two time periods. Would you say that is the case? (This would be visible in the raw output.) If that is happening, I wonder if the extremely high effect is due to overfitting (too many control variables relative to observations) or lack of overlap (some of the pscores are too close to 1 or 0). This could definitely create the problem you mention.
Thus, for every single treated group, I would check whether the overlap assumption holds for every control variable.
2) If unconditional PTA holds, then conditional PTA should hold as well, I would say, but there is no way to be sure about it. I do think the problem you may be having is related to overfitting and lack of common support. Without common support, you cannot rely on CPTA.
3) The tests you mention are all based on different assumptions.
estat pretrend tests whether ANY of the pre-treatment ATTs is different from zero. So I would say it is the strictest one, but also the most sensitive to small violations.
The pre-trend average is also just a proxy, indicating whether the average of the pre-treatment ATTs is significant. But that can be misleading if you have, say, one positive and one negative ATT before treatment.
You could also run a -test- on all aggregated pretrend effects.

HTH
Fernando

Dear Fernando,

Thank you very much for your reply. It really helps me!

Chunxiao Geng
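
The checks discussed in the quoted reply could be sketched roughly as follows. All variable names (y, x1, x2, id, year, gvar) are placeholders, and the cohort (2011) and base period (2010) in the overlap check are arbitrary examples. The logit here is only a diagnostic and is not the propensity model csdid fits internally.

```stata
* Assumed names: y, x1, x2, id, year, gvar (0 = never treated)
csdid y x1 x2, ivar(id) time(year) gvar(gvar) method(dripw)

* Joint test that all pre-treatment ATT(g,t) are zero
estat pretrend

* Event-study aggregation, including the pre- and post-treatment averages
estat event

* Overlap check for one cohort against the never-treated, in one base period
preserve
keep if year == 2010 & inlist(gvar, 0, 2011)
generate byte in_cohort = (gvar == 2011)
logit in_cohort x1 x2
predict double ps, pr
tabstat ps, by(in_cohort) statistics(min p5 p50 p95 max)
restore
```

Propensity scores piling up near 0 or 1 in either group would point to the lack of overlap mentioned above.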
Comment
Letizia Ricchiardi

Join Date: Nov 2022

Posts: 1
#230

02 Nov 2022, 05:51

Dear Fernando,
thank you for all the work you are doing on the csdid command.

I would like to use csdid because I am studying the implementation of gender quotas in some European countries and how these laws affect banks' environmental performance. I have panel data on a sample of 73 listed banks, 34 of which belong to six different countries that adopted gender quota laws during the sample period, which runs from 2010 through 2021.
Therefore, I have a TREATMENT group with all the banks belonging to countries in which a gender quota law has been implemented, and a CONTROL group with all the banks in countries that never implemented a gender quota law.
Since gender quotas were implemented in different years, I have a staggered treatment and I am trying to run csdid.
Code:

egen gvar_lawyr=csgvar(Post_1), tvar(Year) ivar(id)

where Post_1 is a dummy variable equal to 1 from the exact year in which a country implemented a quota law.
gvar_lawyr indicates the exact year in which the law was implemented (i.e. gvar_lawyr=2011 for all Italian, French and Belgian banks; gvar_lawyr=2013 for all Dutch banks; gvar_lawyr=2015 for all German banks; and gvar_lawyr=2017 for all Austrian and Portuguese banks).
I tried to run the following command, but I get a lot of omitted estimates and I do not understand how to solve this:

csdid ESG Board_size Women_Employees Board_ind Ceo_Chair capital_ratio ROA loans_assets deposits_assets loans_deposits tier1 non_perf_loans, ivar(id) time(Year) gvar(gvar_lawyr) method(drimp)

where ESG is my outcome variable and all the others are my independent and control variables.
Here is my Stata output:

Panel is not balanced
Will use observations with Pair balanced (observed at t0 and t1)
...........xxxxxxxxxxx..xx..xxx.xx..........
Difference-in-difference with Multiple Time Periods

Number of obs = 296
Outcome model : weighted least squares
Treatment model: inverse probability tilting
------------------------------------------------------------------------------
             | Coefficient  Std. err.        z   P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
g2011        |
 t_2010_2011 |    4.237792    1.56448     2.71   0.007     1.171468    7.304117
 t_2010_2012 |    20.99693   9.013555     2.33   0.020     3.330684    38.66317
 t_2010_2013 |   -1.008197   .4165226    -2.42   0.015    -1.824566   -.1918273
 t_2010_2014 |   -10.21802   2.632757    -3.88   0.000    -15.37813   -5.057914
 t_2010_2015 |   -9.82e-16   4.33e-16    -2.27   0.023    -1.83e-15   -1.33e-16
 t_2010_2016 |   -1.46e-14   6.52e-15    -2.23   0.025    -2.74e-14   -1.79e-15
 t_2010_2017 |   -14.58974   4.024969    -3.62   0.000    -22.47854   -6.700946
 t_2010_2018 |    1.08e-28   5.15e-29     2.10   0.036     7.09e-30    2.09e-28
 t_2010_2019 |    8.07e-15   4.35e-15     1.86   0.064    -4.53e-16    1.66e-14
 t_2010_2020 |   -.9296119   .2349435    -3.96   0.000    -1.390093   -.4691312
 t_2010_2021 |   -4.79e-21   2.28e-21    -2.10   0.036    -9.26e-21   -3.14e-22
-------------+----------------------------------------------------------------
g2013        |
 t_2010_2011 |           0  (omitted)
 t_2011_2012 |           0  (omitted)
 t_2012_2013 |           0  (omitted)
 t_2012_2014 |           0  (omitted)
 t_2012_2015 |           0  (omitted)
 t_2012_2016 |           0  (omitted)
 t_2012_2017 |           0  (omitted)
 t_2012_2018 |           0  (omitted)
 t_2012_2019 |           0  (omitted)
 t_2012_2020 |           0  (omitted)
 t_2012_2021 |           0  (omitted)
-------------+----------------------------------------------------------------
g2015        |
 t_2010_2011 |    -9.84418    1.95914    -5.02   0.000    -13.68402   -6.004336
 t_2011_2012 |   -6.51e-28   3.41e-28    -1.91   0.056    -1.32e-27    1.65e-29
 t_2012_2013 |           0  (omitted)
 t_2013_2014 |           0  (omitted)
 t_2014_2015 |    1.31e-14   8.63e-15     1.52   0.128    -3.77e-15    3.01e-14
 t_2014_2016 |    8.67e-15   3.56e-15     2.44   0.015     1.70e-15    1.57e-14
 t_2014_2017 |           0  (omitted)
 t_2014_2018 |           0  (omitted)
 t_2014_2019 |           0  (omitted)
 t_2014_2020 |   -5.25e-15   3.69e-15    -1.43   0.154    -1.25e-14    1.97e-15
 t_2014_2021 |           0  (omitted)
-------------+----------------------------------------------------------------
g2017        |
 t_2010_2011 |           0  (omitted)
 t_2011_2012 |    1.21e-13   5.95e-12     0.02   0.984    -1.16e-11    1.18e-11
 t_2012_2013 |    2.51e-15   4.33e-14     0.06   0.954    -8.23e-14    8.73e-14
 t_2013_2014 |    1.42e-14   6.74e-15     2.11   0.035     9.99e-16    2.74e-14
 t_2014_2015 |   -7.79e-34   7.79e-33    -0.10   0.920    -1.60e-32    1.45e-32
 t_2015_2016 |    3.30e-69   1.19e-69     2.76   0.006     9.60e-70    5.63e-69
 t_2016_2017 |    3.95e-15   2.34e-11     0.00   1.000    -4.58e-11    4.58e-11
 t_2016_2018 |    1.42e-14   6.15e-15     2.31   0.021     2.15e-15    2.63e-14
 t_2016_2019 |    2.84e-38   1.36e-38     2.09   0.037     1.77e-39    5.50e-38
 t_2016_2020 |    7.11e-15   3.41e-15     2.08   0.037     4.15e-16    1.38e-14
 t_2016_2021 |    1.68e-33   8.08e-34     2.08   0.037     9.84e-35    3.27e-33
------------------------------------------------------------------------------
Control: Never Treated

Thank you so much for your help,
Letizia
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2469
#231

02 Nov 2022, 10:28

Two options:
a) try dripw, since it’s a more stable estimator
b) show what you get if you tabulate year against gvar:
tab year gvar
Thanks
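
With the variable names from the post above, the two suggestions might be run as in this sketch. The helper variable one_per_bank is hypothetical and only serves to count banks rather than bank-years.

```stata
* Cross-tabulate calendar year against cohort to see how many observations
* each cohort contributes in each year
tab Year gvar_lawyr

* Number of distinct banks per cohort
egen byte one_per_bank = tag(id)
tab gvar_lawyr if one_per_bank

* Same specification as in the original command, with DRIPW instead of DRIMP
csdid ESG Board_size Women_Employees Board_ind Ceo_Chair capital_ratio ROA loans_assets deposits_assets loans_deposits tier1 non_perf_loans, ivar(id) time(Year) gvar(gvar_lawyr) method(dripw)
```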
Comment
Ayu Fitriani

Join Date: Nov 2022

Posts: 4
#232

09 Nov 2022, 09:44

Hello, I have just started using the csdid command.
Can somebody help me?
I used the following command, but I do not get a result (the coefficients are all zero):
.csdid log_totasset remittance age age2 educ marstat male log_hprice hhsize rural homeownership agland nonagland, ivar(id) time(year) [notyet] gvar(first_treat) method(dripw)

Difference-in-difference with Multiple Time Periods

Number of obs = 0
Outcome model : least squares
Treatment model: inverse probability
------------------------------------------------------------------------------
             | Coefficient  Std. err.        z   P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
g2000        |
 t_2000_2007 |           0  (omitted)
 t_2000_2014 |           0  (omitted)
-------------+----------------------------------------------------------------
g2007        |
 t_2000_2007 |           0  (omitted)
 t_2000_2014 |           0  (omitted)
-------------+----------------------------------------------------------------
g2014        |
 t_2000_2007 |           0  (omitted)
 t_2007_2014 |           0  (omitted)
------------------------------------------------------------------------------
Control: Not yet Treated
Comment
Ayu Fitriani

Join Date: Nov 2022

Posts: 4
#233

09 Nov 2022, 09:51

Continuing from the last post...
I want to estimate total assets (depvar) for households that receive remittances, observed in 3 periods (2000, 2007, and 2014). Households are treated in different years: some receive remittances in 2000 but not in the following years, and vice versa.
My professor introduced me to the Callaway and Sant'Anna paper, and I am trying to run the csdid command.
Please help me understand what is wrong with my result. Thank you for the help.
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2469
#234

09 Nov 2022, 10:09

Hi Ayu,
I think the problem may be related to how you are defining gvar (the cohort variable). If you are using panel data (and it seems you are, since you are declaring ivar()), then the assumption is that once a unit is treated, it is always treated.
So if unit 1 has gvar==2000, gvar should be 2000 in the years 2000, 2007 and 2014.
The other thing I see here is that you have only 3 periods, which suggests you may do better applying DRDID instead. But more information about your design is needed.
Fernando
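
One way to build a cohort variable that is constant within unit is sketched below. It assumes that remittance is a 0/1 indicator of receiving remittances in a given year, and it treats a household as treated from its first year of receipt onward, which is the assumption described above rather than something the data guarantee.

```stata
* first_treat: first year in which the household receives remittances,
* constant within id; 0 for households that never receive them
capture drop first_treat
bysort id: egen first_treat = min(cond(remittance == 1, year, .))
replace first_treat = 0 if missing(first_treat)

* Check that the value does not vary within household
bysort id (year): assert first_treat == first_treat[1]

* Cohort sizes by year
tab year first_treat
```

The csgvar() egen function shown earlier in the thread is an alternative way to build the same kind of variable.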
Comment
Rattiya Lippe

Join Date: Sep 2020

Posts: 19
#235

10 Nov 2022, 02:29

Dear FernandoRios, thank you for developing this command. In my analysis, I plan to incorporate results from the PSM (using psmatch2) into DiD. From what I understand in the Stata community, we can estimate the DiD regression as normal, but weight the regression using the frequency weights generated by the psmatch2 command. I applied this when using csdid as follows:

csdid TreeCovLos, cluster(id) time(year) [fweight = _weight] gvar(first_treat) method(dripw) agg(event) notyet

But I observed that the results are not derived only from the matched sample with the corresponding weights, but from all observations.

Could you suggest how I can incorporate PSM results into DiD? Or should I manually drop unmatched observations before proceeding with the DiD estimation?

Thank you, Rattiya
Comment
Ayu Fitriani

Join Date: Nov 2022

Posts: 4
#236

12 Nov 2022, 17:40

Thank you for the answer, FernandoRios.
I tried to use DRDID, but the output says "You do not have a 2x2 design".
Also, I read in your post: "2 values in time for the working sample. The earlier period will be used as pre, whereas the later period will be used as post".
So, I think I cannot use DRDID.
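
For reference, a 2x2 design can be obtained from three-period data by restricting the working sample to two periods and two groups. The sketch below is only an illustration under assumptions: first_treat is 0 for never-treated households, the covariate list is a shortened version of the one in the earlier command, and the choice of comparing the 2007 cohort against households not yet treated by 2007 is arbitrary.

```stata
* 2x2 comparison: households first treated in 2007 vs households
* not yet treated by 2007, using only the 2000 and 2007 waves
preserve
keep if inlist(year, 2000, 2007) & inlist(first_treat, 0, 2007, 2014)
generate byte treated = (first_treat == 2007)
drdid log_totasset age age2 educ hhsize rural, ivar(id) time(year) tr(treated) dripw
restore
```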

I have changed my gvar variable following your description, and I got the following result.
Is this right? I only got estimates for group 2007.

Difference-in-difference with Multiple Time Periods

Number of obs = 10,128
Outcome model : least squares
Treatment model: inverse probability
------------------------------------------------------------------------------
             | Coefficient  Std. err.        z   P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
g2000        |
 t_2000_2007 |           0  (omitted)
 t_2000_2014 |           0  (omitted)
-------------+----------------------------------------------------------------
g2007        |
 t_2000_2007 |    1.161991   .0654373    17.76   0.000     1.033736    1.290246
 t_2000_2014 |    .5117145   .0811336     6.31   0.000     .3526956    .6707333
-------------+----------------------------------------------------------------
g2014        |
 t_2000_2007 |           0  (omitted)
 t_2007_2014 |           0  (omitted)
------------------------------------------------------------------------------
Control: Not yet Treated
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2469
#237

15 Nov 2022, 09:13

Originally posted by Rattiya Lippe

Dear FernandoRios, thank you for developing this command. In my analysis, I plan to incorporate results from the PSM (using psmatch2) into DiD. From what I understand in the Stata community, we can estimate the DiD regression as normal, but weight the regression using the frequency weights generated by the psmatch2 command. I applied this when using csdid as follows:

csdid TreeCovLos, cluster(id) time(year) [fweight = _weight] gvar(first_treat) method(dripw) agg(event) notyet

But I observed that the results are not derived only from the matched sample with the corresponding weights, but from all observations.

Could you suggest how I can incorporate PSM results into DiD? Or should I manually drop unmatched observations before proceeding with the DiD estimation?

Thank you, Rattiya

Two points there.
1) My command does not allow fweights. At best it will treat them as pweights (but read as iweights).
2) If you are concerned about the sample, you could "manually" exclude observations with zero weights.
Your command would then need to be
csdid TreeCovLos [fweight = _weight] if _weight>0, cluster(id) time(year) gvar(first_treat) method(dripw) agg(event) notyet

HTH
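
One detail worth checking with this kind of condition: in Stata, missing values compare as larger than any number, so _weight>0 on its own is also true for observations where _weight is missing. A sketch of how to see this and make the exclusion explicit, using the same command as above:

```stata
* The two counts differ whenever some observations have missing _weight
count if _weight > 0
count if _weight > 0 & !missing(_weight)

* Same command as above, with unmatched observations excluded explicitly
csdid TreeCovLos [fweight = _weight] if _weight > 0 & !missing(_weight), cluster(id) time(year) gvar(first_treat) method(dripw) agg(event) notyet
```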
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2469
#238

15 Nov 2022, 09:15

Originally posted by Ayu Fitriani

Thank you for the answer, FernandoRios.
I tried to use DRDID, but the output says "You do not have a 2x2 design".
Also, I read in your post: "2 values in time for the working sample. The earlier period will be used as pre, whereas the later period will be used as post".
So, I think I cannot use DRDID.

I have changed my gvar variable following your description, and I got the following result.
Is this right? I only got estimates for group 2007.

Difference-in-difference with Multiple Time Periods

Number of obs = 10,128
Outcome model : least squares
Treatment model: inverse probability
------------------------------------------------------------------------------
             | Coefficient  Std. err.        z   P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
g2000        |
 t_2000_2007 |           0  (omitted)
 t_2000_2014 |           0  (omitted)
-------------+----------------------------------------------------------------
g2007        |
 t_2000_2007 |    1.161991   .0654373    17.76   0.000     1.033736    1.290246
 t_2000_2014 |    .5117145   .0811336     6.31   0.000     .3526956    .6707333
-------------+----------------------------------------------------------------
g2014        |
 t_2000_2007 |           0  (omitted)
 t_2007_2014 |           0  (omitted)
------------------------------------------------------------------------------
Control: Not yet Treated

I am not sure whether it is right or not.
But if you have no "never treated" group and only 3 years of data, then this is correct.
Comment
Rattiya Lippe

Join Date: Sep 2020

Posts: 19
#239

18 Nov 2022, 03:59

Originally posted by FernandoRios

Two points there.
1) My command does not allow fweights. At best it will treat them as pweights (but read as iweights).
2) If you are concerned about the sample, you could "manually" exclude observations with zero weights.
Your command would then need to be
csdid TreeCovLos [fweight = _weight] if _weight>0, cluster(id) time(year) gvar(first_treat) method(dripw) agg(event) notyet

HTH

Dear FernandoRios, thank you for your reply. I executed the suggested command and it works well. So the observations with missing _weight are dropped automatically in the DiD estimation step.

I have one more question regarding the "pretrend" test. The pretrend test in my case is significant, which indicates that we reject the null hypothesis. I understand this to imply that the parallel trends assumption does not hold. Do I understand correctly that the DiD approach is likely not suitable in my case?

Thank you and kind regards, Rattiya
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2469
#240

18 Nov 2022, 08:54

Hi Rattiya,
That is not necessarily the case.
You need to do a careful analysis of the "event" dynamics, and use that as additional evidence to say whether or not parallel trends hold.
Also, you may need to control for other factors, or consider different functional specifications (logs?).
HTH
F
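
A possible way to look at the event dynamics is sketched below, reusing the command from earlier in the thread without agg(event) so that the post-estimation commands work from the full set of ATT(g,t). The log-transformed outcome ln_TreeCovLos is a hypothetical variable name, and whether a log specification makes sense depends on the outcome.

```stata
* Estimate all ATT(g,t), then inspect the dynamics period by period
csdid TreeCovLos [fweight = _weight] if _weight > 0 & !missing(_weight), cluster(id) time(year) gvar(first_treat) method(dripw) notyet

* Joint test of all pre-treatment ATT(g,t)
estat pretrend

* Event-study aggregation and plot
estat event
csdid_plot

* Alternative functional form: log outcome (only defined where TreeCovLos > 0)
generate double ln_TreeCovLos = ln(TreeCovLos)
```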
Comment
