Hi!
If anyone could help me with something, I would be very very grateful! =)
I have read around and I think I have understood how to do a difference-in-differences (DID) regression analysis in Stata. However, I am not sure how to analyze many different yearly observations. Here is my case:
I have about 500 observations with 10 different treatments. I have split the data into clusters to do individual analyses first; then I will attempt the analysis on my entire dataset. I have 10 years of data, with return on assets (ROA) calculated for the 500 companies. What I wish to analyze is the effect of direct airline routes between clusters of companies. I therefore have 10 treated routes and 51 untreated clusters. For example: suppose I have 3 areas, A, B and C, and there is a direct route between A and B but not between A and C. The companies in A with daughter companies in B are the treated companies, while the companies in A that have a daughter company in C are the control group.
I have a dummy variable that is 1 if the company is treated and 0 if it is not. I also have dummy variables stating on which route it is treated.
I know that a normal DID-analysis looks like this:
gen treatment = route_1 == 1
gen after_reform = year > 2005
gen interaction = treatment*after_reform
regr ???? treatment after_reform interaction
My problem now is: how do I incorporate the 10 yearly ROAs (ROA03, ROA04, ROA05, etc.)? That is what I wish to analyze: the difference in ROA over the 10 years between the two groups, before and after a treatment.
A small part of my data set looks like this:
In this excerpt, I wish to compare the ROAs of observations 1 and 3 (route 1) with those of observations 4, 6 and 8, as they are active in the years before and after 2008.
treated | year_treated | route_1 | route_2 | ROA03 | ROA04 | ROA05 | ROA06 | ROA07 | ROA08 | ROA09 | ROA10 |
1 | 2008 | 1 | .1051213 | .1677096 | .0880811 | .1671309 | .0701199 | ||||
1 | 2008 | 1 | .1321278 | .1501537 | .1646822 | .0630993 | .032966 | -.1130877 | -.0459326 | -.0215173 | |
1 | 2008 | 1 | .4529801 | .2062659 | -.361311 | .4088252 | .2886758 | ||||
0 | 0 | .0160428 | -.0166116 | -.0071846 | .0055485 | .0318236 | .0069174 | .0270729 | .0299224 | ||
0 | 0 | .0879779 | .1574574 | ||||||||
0 | 0 | .1601925 | -.0806988 | -.039929 | .1219937 | -.1583 | -.0178174 | .2113627 | .0506658 | ||
0 | 0 | -.1377799 | -.3151261 | ||||||||
0 | 0 | -.0454545 | -.2137767 | -.6512969 | -1.080745 | .2529833 |
I’m pretty sure that if I wanted to just compare ROA09 I could write:
gen treatment = route_1 == 1
gen after_reform = 1 if ROA09 != .
gen interaction = treatment*after_reform
regr ROA09 treatment after_reform interaction
But how do I get all of the ROAs of the relevant years to compare in the regression?
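One possible approach, as a minimal sketch and not something I have verified on my data: reshape the data from wide to long so that each row is one company-year, then run the DID regression on the single ROA variable. The names firm_id, yy and year are assumptions (they are not in my data set), and 2008 is taken from the year_treated column for route 1 in the excerpt above. Whether the treatment year itself counts as "after" is also an assumption.

```stata
* Sketch only: firm_id, yy and year are assumed names, not in the original data
gen firm_id = _n                           // one row per company in the wide data
reshape long ROA, i(firm_id) j(yy) string  // ROA03 ... ROA10 become one ROA variable
gen year = 2000 + real(yy)                 // "03" -> 2003, "10" -> 2010
drop if missing(ROA)                       // drop years in which the company is not observed

gen treatment    = route_1 == 1
gen after_reform = year >= 2008            // assumes the route opened in 2008
gen interaction  = treatment * after_reform

* Standard errors clustered by company, since each company appears in several years
regress ROA treatment after_reform interaction, vce(cluster firm_id)
```

In this layout the coefficient on interaction would be the DID estimate, and every available year of ROA enters the regression as its own row.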
Is there a problem if the control and treated groups have different numbers of observations? For example, if companies in the control group go bankrupt more often toward the end of the dataset, there will be more observations in the treated group than in the control group.
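To see how uneven the two groups actually are over time, something like the following could be used. It is only a sketch and assumes the data are already in long format (one row per company-year) with the assumed variable names firm_id and year alongside treated and ROA.

```stata
* Sketch only: assumes long-format data; firm_id and year are assumed names
xtset firm_id year
xtdescribe                               // participation patterns: which years each company is observed
tabulate year treated if !missing(ROA)   // number of observations per year, by group
```

If the counts in the control group fall much faster than in the treated group, that would at least show where the exits are concentrated.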
After I have done this for all 10 individual routes and their control groups, I wish to do it at the national level. My problem here is that the companies are treated in different years and should therefore be compared to control groups in those years. How do I do this? Or is it enough to have the dummy variable stating that a company is treated?
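One specification I have seen for treatments that start in different years is a regression with company and year fixed effects, where the treatment dummy switches on in each company's own treatment year. The sketch below assumes long-format data with the assumed names firm_id and year. Whether this specification is appropriate for my case is exactly what I am unsure about.

```stata
* Sketch only: assumes long-format data; firm_id and year are assumed names
* treated_post is 1 from the company's own treatment year onward, 0 otherwise
* (never-treated companies stay at 0 because treated == 0 for them)
gen treated_post = treated == 1 & year >= year_treated

xtset firm_id year
* Company fixed effects (fe) plus year dummies (i.year); clustered by company
xtreg ROA treated_post i.year, fe vce(cluster firm_id)
```

Here the company fixed effects take the place of the treatment dummy and the year dummies take the place of after_reform, so only treated_post is left as the variable of interest.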
Sorry for the way too long post, I wanted to explain as much as possible so you understand my problem (hopefully) =)
Best regards,
Susanne Daae