difference-in-difference

hussein bataineh

Join Date: Apr 2023
Posts: 159


31 Jul 2024, 12:04

Hi,

This is my first time running a difference-in-difference analysis, and I'm not sure if I did it correctly. Here are the details:

Independent Variable:

csopresence1: An indicator variable that takes a value of 1 if the firm has a chief sustainability officer, and 0 otherwise.

Dependent Variable:

An indicator variable that takes a value of 1 if a firm freezes the defined benefit plan for all employees, and 0 otherwise.

I am unsure why the results only appear for the independent variable, while the control variables do not show up.

Also, could you confirm if I applied the difference-in-difference method correctly?

Thank you!

xi: xtdidregress (hard_final_Exact_new Firm_Size_w ROA_w Leverage_w Market_book_four_w Non_pension_CFO_w STD_CFO_w Board_Independence_w BoardSize_w Gender_Diversity_w Fund_Status_w FUNDING_RATIO_w Platn_Size_w CSR_Committee SustainabilityScore_w i.year i.ff_12)(csopresence1 ), group(id) time(year)

HTML Code:

. xi: xtdidregress (hard_final_Exact_new Firm_Size_w   ROA_w     Leverage_w   Market_book_four_w   Non_pension_CFO_w   STD_CFO_w 
>  Board_Independence_w BoardSize_w Gender_Diversity_w  Fund_Status_w  FUNDING_RATIO_w  Platn_Size_w     CSR_Committee  Sustainab
> ilityScore_w   i.year    i.ff_12)(csopresence1 ), group(id) time(year)
i.year            _Iyear_2004-2022    (naturally coded; _Iyear_2004 omitted)
i.ff_12           _Iff_12_1-12        (naturally coded; _Iff_12_1 omitted)
note: _Iff_12_2 omitted because of collinearity.
note: _Iff_12_3 omitted because of collinearity.
note: _Iff_12_4 omitted because of collinearity.
note: _Iff_12_5 omitted because of collinearity.
note: _Iff_12_6 omitted because of collinearity.
note: _Iff_12_7 omitted because of collinearity.
note: _Iff_12_8 omitted because of collinearity.
note: _Iff_12_9 omitted because of collinearity.
note: _Iff_12_10 omitted because of collinearity.
note: _Iff_12_11 omitted because of collinearity.
note: _Iff_12_12 omitted because of collinearity.
note: 2007.year omitted because of collinearity.
note: 2008.year omitted because of collinearity.
note: 2009.year omitted because of collinearity.
note: 2010.year omitted because of collinearity.
note: 2011.year omitted because of collinearity.
note: 2012.year omitted because of collinearity.
note: 2013.year omitted because of collinearity.
note: 2014.year omitted because of collinearity.
note: 2015.year omitted because of collinearity.
note: 2016.year omitted because of collinearity.
note: 2017.year omitted because of collinearity.
note: 2018.year omitted because of collinearity.
note: 2019.year omitted because of collinearity.
note: 2020.year omitted because of collinearity.
note: 2021.year omitted because of collinearity.
note: 2022.year omitted because of collinearity.

Treatment and time information

Time variable: year
Control:       csopresence1 = 0
Treatment:     csopresence1 = 1
-----------------------------------
             |   Control  Treatment
-------------+---------------------
Group        |
          id |        88        182
-------------+---------------------
Time         |
     Minimum |      2004       2004
     Maximum |      2021       2022
-----------------------------------

Difference-in-differences regression                     Number of obs = 3,167
Data type: Longitudinal

---------------------------------------------------------------------------------------
 hard_final_Exact_new | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
----------------------+----------------------------------------------------------------
ATET                  |
         csopresence1 |
            (1 vs 0)  |    .024631   .0093159     2.64   0.008     .0063721    .0428898
---------------------------------------------------------------------------------------
Note: ATET estimate adjusted for covariates, panel effects, and time effects.
Note: Treatment occurs at different times and estimation sample contains units that switch in and out of treatment


George Ford

Join Date: Aug 2014

Posts: 3152
#2

31 Jul 2024, 12:18

This is an issue: Note: Treatment occurs at different times and estimation sample contains units that switch in and out of treatment
Comment
hussein bataineh

Join Date: Apr 2023

Posts: 159
#3

31 Jul 2024, 12:29

Professor George, what would you suggest? Should I use a different method, or how else can I solve this problem?
Comment
George Ford

Join Date: Aug 2014

Posts: 3152
#4

31 Jul 2024, 14:37

I don't think this is a simple DID problem. Do you have two groups, neither treated in period 0, and then 1 group treated in period 1?

Here, the treatment comes and goes, arrives at different times.

And, I'd think the defined benefits issue is a somewhat permanent decision, so the on/off of the treatment after the first time is irrelevant.

Without more insight into the data, it's hard to say, but I think this is a much harder problem than your model suggests.
Comment
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#5

31 Jul 2024, 14:59

In my master's thesis, I used the did_multiplegt command (I think it's on SSC). Technically, it can account for treatments coming and going, but I tend to agree with George that this is a kind of permanent decision, unless units got rid of the position for some reason.

By the way, you don't need the xi: prefix unless your Stata is from 2007, and you also don't need the year indicators, since xtdidregress includes time effects by default.
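
A minimal sketch of the command with both changes applied, assuming the poster's firm-year panel is in memory and the lines are run from a do-file (the /// continuations do not work interactively). i.ff_12 is left out here because the log above reports every industry dummy as omitted for collinearity. By default only the ATET row is displayed; as far as I know, the reporting option aequations also shows the covariate coefficients, but check help xtdidregress for your version.

```stata
* Assumes the firm-year panel from this thread is in memory.
* No xi: prefix and no i.year: xtdidregress adds time effects itself.
* aequations is assumed to be the reporting option that displays the
* covariate (auxiliary) coefficients alongside the ATET.
xtdidregress (hard_final_Exact_new Firm_Size_w ROA_w Leverage_w ///
    Market_book_four_w Non_pension_CFO_w STD_CFO_w Board_Independence_w ///
    BoardSize_w Gender_Diversity_w Fund_Status_w FUNDING_RATIO_w ///
    Platn_Size_w CSR_Committee SustainabilityScore_w) ///
    (csopresence1), group(id) time(year) aequations
```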
Comment

hussein bataineh

Join Date: Apr 2023
Posts: 159

31 Jul 2024, 15:35

Actually, I'm not very familiar with DID.
I tried to run did_multiplegt from SSC, but unfortunately it does not work.

HTML Code:

 . did_multiplegt dyn (hard_final_Exact_new Firm_Size_w ROA_w Leverage_w Market_book_four_w Non_pension_CFO_w STD_CFO_w Board_Independence_w BoardSize_w Gender_Diversity_w Fund_Status
> _w FUNDING_RATIO_w Platn_Size_w CSR_Committee SustainabilityScore_w i.ff_12), group(id) time(year) treatment(csopresence1) cluster(id)
Invalid syntax.

did_multiplegt is now a library and it follows a new syntax:
       did_multiplegt (mode) varlist [, options]

Depending on the mode argument, did_multiplegt can be used to call
     - did_multiplegt_dyn with the dyn mode;
     - did_multiplegt_stat with the stat mode;
     - did_had with the had mode;
     - did_multiplegt_old, i.e. the older version of this command, with the old mode;
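
Going by the syntax message above, the mode goes in parentheses and the four core variables (outcome, group, time, treatment) are positional rather than options. A hedged sketch, assuming did_multiplegt_dyn is installed from SSC and that its effects(), placebo(), controls() and cluster() options behave as described in its help file. The numbers of effects and placebos are arbitrary illustrations, only a few of the controls are shown, and i.ff_12 is dropped because it is not clear that factor-variable notation is accepted in controls().

```stata
* Assumes the firm-year panel from this thread is in memory.
* The wrapper dispatches to did_multiplegt_dyn, which must be installed:
* ssc install did_multiplegt_dyn
did_multiplegt (dyn) hard_final_Exact_new id year csopresence1, ///
    effects(3) placebo(2) ///
    controls(Firm_Size_w ROA_w Leverage_w) cluster(id)
```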

Comment

George Ford

Join Date: Aug 2014

Posts: 3152
#7

31 Jul 2024, 16:00

Hard to get your head around this one.

This is somewhat like a survival problem -- time until a defined benefit plan is frozen. That time depends (maybe) on the presence of a certain actor.

Does a freeze ever reverse?

Are there units where csopresence1 is always 0? I suspect so.

Are freezes clustered in time at all?

Are the csopresence1==1 observations clustered in time?

While it may not be DID formally, xtdidregress (which is just xtreg) is perhaps measuring what you want it to measure. It just tells you whether a freeze is more likely when csopresence1==1.

You'd think there may be industry effects too (spatial regression).
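
The questions above can be checked with a few tabulations. This is only a sketch, assuming the poster's panel with id, year, csopresence1 and hard_final_Exact_new is in memory and has no missing values in those variables; firm_tag, ever_cso and cso_start are hypothetical helper names.

```stata
* Assumes the firm-year panel from this thread is in memory.
egen byte firm_tag = tag(id)                  // marks one row per firm
bysort id: egen byte ever_cso = max(csopresence1)

* Firms where csopresence1 is always 0 show up as ever_cso == 0
tabulate ever_cso if firm_tag

* Are freezes clustered in time?
tabulate year if hard_final_Exact_new == 1

* First year of each CSO spell (a spell already running in the firm's
* first observed year is counted as starting there)
bysort id (year): gen byte cso_start = csopresence1 == 1 & ///
    (_n == 1 | csopresence1[_n-1] == 0)
tabulate year if cso_start
```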
Comment
George Ford

Join Date: Aug 2014

Posts: 3152
#8

31 Jul 2024, 16:02

Curious as to the theory of the relationship.
Comment
hussein bataineh

Join Date: Apr 2023

Posts: 159
#9

31 Jul 2024, 17:17

Thanks, Prof., for your support and efforts.

Regarding your question about whether a freeze can ever be reversed: No, it cannot. A freeze occurs only once and is coded as 1 for a specific year within the entire lifespan of the firm. In other words, it happens only once in the firm's life. Consequently, there can be many years coded as 0, but there will be just one year coded as 1.

As for the units where csopresence1 is always 0, I suspect this is the case for some units. Indeed, there are instances where a firm does not have a Chief Sustainability Officer, so we code 0 for all years for those firms.

Dependent Variable: hard_final_Exact_new

The dependent variable, hard_final_Exact_new, is coded as 0 for many years until a specific event occurs, which is then coded as 1 for that particular year. The value of 1 is coded only once when the freeze happens for a particular firm. For example, if a firm freezes its pension plan in 2015, within a sample period spanning from 2004 to 2022, the variable would be coded as 0 from 2004 to 2014, 1 in 2015, and then the years from 2016 to 2022 would be excluded from the regression analysis as they lack relevant data.

Independent Variable: csopresence1

The independent variable, csopresence1, is typically coded as 0 until a Chief Sustainability Officer (CSO) is appointed, after which it is coded as 1. For instance, if a CSO is appointed in 2016, the variable would be coded as 0 from 2004 to 2015 and as 1 from 2016 to 2022.
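
The coding rules described above can be written as assertions. The sketch below runs on a small made-up panel (two hypothetical firms, three years) so that it is self-contained; n_freeze is a hypothetical helper name.

```stata
clear
input id year hard_final_Exact_new csopresence1
1 2013 0 0
1 2014 0 0
1 2015 1 0
2 2013 0 0
2 2014 0 1
2 2015 0 1
end

* Each firm should have at most one freeze year
bysort id: egen n_freeze = total(hard_final_Exact_new)
assert inlist(n_freeze, 0, 1)

* A freeze, if any, should fall in the firm's last observed year
bysort id (year): assert hard_final_Exact_new == 0 if _n < _N

* Once a CSO is appointed, the indicator should not fall back to 0
bysort id (year): assert csopresence1 >= csopresence1[_n-1] if _n > 1
```

On the real data the last assertion would presumably fail for some firms, given the xtdidregress note that the sample contains units that switch in and out of treatment.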

Last edited by hussein bataineh; 31 Jul 2024, 17:24.
Comment

George Ford

Join Date: Aug 2014
Posts: 3152

#10

01 Aug 2024, 07:23

This is a survival (time to event) model. Cox regression is appropriate, I think, but I've not used data like this.

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4848953

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7319228/

https://www.statalist.org/forums/forum/general-stata-discussion/general/1306203-differences-in-differences-cox-regression

https://as.nyu.edu/content/dam/nyu-as/populationCenter/documents/dd210429.d.pdf
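
For readers who want to try the time-to-event framing, here is a minimal sketch rather than a recommended specification. It assumes the firm-year panel from this thread is in memory, that every firm is at risk from the start of the sample (2004 in the log above), that there are no gaps in a firm's years, and that years after a freeze have already been dropped. Only a few of the controls are shown.

```stata
* Assumes the firm-year panel from this thread is in memory.
* Each row is treated as covering the interval (year-1, year];
* origin(time 2003) puts analysis time 0 at the end of 2003.
stset year, id(id) failure(hard_final_Exact_new == 1) origin(time 2003)

* csopresence1 and the controls enter as time-varying covariates,
* taking the value recorded in each firm-year row.
stcox csopresence1 Firm_Size_w ROA_w Leverage_w, vce(cluster id)
```

Because time is measured in whole years, many freezes will be tied; the default handling of ties in stcox may or may not be adequate here.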

Comment

hussein bataineh

Join Date: Apr 2023

Posts: 159
#11

01 Aug 2024, 08:56

Please

I have a question: what do you think are the potential issues in applying the difference-in-differences (DiD) method here? Are they due to the type of data I have?

Also, I'm unsure about whether I can apply a staggered difference-in-differences approach. If it's possible, could you guide me on how to implement it?
Comment
George Ford

Join Date: Aug 2014

Posts: 3152
#12

01 Aug 2024, 10:02

For the controls, you have zeros for the treatment, with some having a 1 for the outcome (series ends) and others not (it could come later, but is unobserved).

For the treated, you have the treatment switching on/off and possibly a 1 outcome (series ends) or a 0 outcome (it could come later, but is unobserved).

This is a very odd situation that, to me, is not well suited to a 2x2 DID.

The data look like survival data.

Maybe a staggered DID would work, if you can find an estimator that allows the treatment to switch on/off.
Comment
hussein bataineh

Join Date: Apr 2023

Posts: 159
#13

01 Aug 2024, 15:07

Thanks, Professor George, for your efforts.
Comment
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2168
#14

01 Aug 2024, 20:05

Hussein: It looks to me like you have a staggered rollout without exit. You can use xthdidregress with either the twfe or ra option. This command is new to Stata 18. Or, use jwdid (same as xthdidregress twfe) or csdid, method(reg) (which is the same as xthdidregress ra). These have to be installed from SSC. For jwdid and csdid, you need to define a variable that is the first treatment date. You seem to know these dates from the variable CSO. Call this variable "first_treat" and define it as the first year that a unit is treated, with first_treat = 0 for a never-treated unit.

Then you can use the following:

Code:

jwdid y x1 ... xK, ivar(id) tvar(year) gvar(first_treat)
estat event
estat plot
estat simple

csdid y x1 ... xK, ivar(id) time(year) gvar(first_treat) method(reg) long2
estat event
csdid_plot
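
One way to build first_treat from a 0/1 treatment indicator, shown on a small made-up panel so that it runs on its own. The three firms and the helper names switched_off and ever_exit are hypothetical; the second part flags firms whose treatment switches off, which would conflict with the "without exit" assumption.

```stata
clear
input id year csopresence1
1 2014 0
1 2015 0
1 2016 1
1 2017 1
2 2014 0
2 2015 0
2 2016 0
2 2017 0
3 2014 1
3 2015 1
3 2016 1
3 2017 1
end

* First year in which each firm is treated (missing if never treated)
bysort id: egen first_treat = min(cond(csopresence1 == 1, year, .))
* jwdid and csdid expect 0 for never-treated units
replace first_treat = 0 if missing(first_treat)

* Flag firms whose treatment ever switches from 1 back to 0
bysort id (year): gen byte switched_off = ///
    csopresence1 < csopresence1[_n-1] & !missing(csopresence1[_n-1]) if _n > 1
bysort id: egen byte ever_exit = max(switched_off)

tabulate first_treat ever_exit
```

Note that firm 3 is already treated in its first observed year, so its true first treatment date may be earlier than the value recorded here.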
Comment

hussein bataineh

Join Date: Apr 2023
Posts: 159

#15

03 Aug 2024, 13:07

Thank you, Professor, for your suggestion. I tried your approach multiple times, but unfortunately, it didn't work.

Instead, I used a similar code but removed i.year and i.id from the regression, which eliminated the omitted variable issue.

My questions are:

Is this estimation correct?
Why does the regression not display the control variables?

HTML Code:

 . xtdidregress (hard_final_Exact_new Firm_Size_w ROA_w Leverage_w Market_book_four_w Non_pension_CFO_w STD_CFO_w Board_Independen
> ce_w BoardSize_w Gender_Diversity_w Fund_Status_w FUNDING_RATIO_w Platn_Size_w CSR_Committee SustainabilityScore_w)(csopresence
> 1 ) if q_roa>4, group(id) time(year)

Treatment and time information

Time variable: year
Control:       csopresence1 = 0
Treatment:     csopresence1 = 1
-----------------------------------
             |   Control  Treatment
-------------+---------------------
Group        |
          id |        25         67
-------------+---------------------
Time         |
     Minimum |      2004       2004
     Maximum |      2020       2022
-----------------------------------

Difference-in-differences regression                       Number of obs = 504
Data type: Longitudinal

                                             (Std. err. adjusted for 92 clusters in id)
---------------------------------------------------------------------------------------
                      |               Robust
 hard_final_Exact_new | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
----------------------+----------------------------------------------------------------
ATET                  |
         csopresence1 |
            (1 vs 0)  |  -.0223728    .012014    -1.86   0.066    -.0462372    .0014915
---------------------------------------------------------------------------------------
Note: ATET estimate adjusted for covariates, panel effects, and time effects.
Note: Treatment occurs at different times and estimation sample contains units that switch in and out of treatment.
