Hi everybody,
After spending a lot of time exploring the forum and other places online, I decided to make my first post here. There have been other topics here and elsewhere on similar problems, yet I could not manage to implement or apply the explanations given there myself; hence this post. If something is lacking in the way I have written this post, please let me know and I will adjust my future posts and replies accordingly.
The issue I am facing is the following. My goal is to study the treatment effect in the post-treatment period. Below is an example of my dataset (I dropped the other variables beforehand for the sake of posting it).
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float(year outcome_var eventdate) int tau float(eventid treated) long firmid float active_treated
2016 66.94362 . . . 0 1 0
2015 97.75673 . . . 0 2 0
2016 72.11331 . . . 0 2 0
2017 39.52795 . . . 0 2 0
2018 23.114513 . . . 0 2 0
2019 7.986643 . . . 0 2 0
2020 13.991 . . . 0 2 0
2007 3.5584955 . . . 0 3 0
2008 5.740957 . . . 0 3 0
2007 138.97435 . . . 0 4 0
2009 134.36954 . . . 0 5 0
2010 29.01804 . . . 0 5 0
2011 28.438017 . . . 0 5 0
2012 16.668228 2014 -2 1 1 5 0
2013 14.306916 2014 -1 1 1 5 0
2014 8.754341 2014 0 1 1 5 0
2015 3.3422906 2014 1 1 1 5 1
2016 2.941602 2014 2 1 1 5 1
2017 23.094175 . . . 0 5 0
2018 16.317652 . . . 0 5 0
2019 43.08735 . . . 0 5 0
2012 5.684552 . . . 0 6 0
2013 5.691111 . . . 0 6 0
2014 4.784706 . . . 0 6 0
2015 3.765625 . . . 0 6 0
2016 2.2593336 . . . 0 6 0
2017 5.128872 . . . 0 6 0
2018 7.633845 . . . 0 6 0
2019 9.589101 . . . 0 6 0
2020 7.675261 . . . 0 6 0
2007 7.182402 . . . 0 7 0
2007 22.276 . . . 0 8 0
2008 12.48082 . . . 0 8 0
2009 11.706511 . . . 0 8 0
2007 91.02238 . . . 0 9 0
2007 0 . . . 0 10 0
2008 0 . . . 0 10 0
2009 0 . . . 0 10 0
2010 0 . . . 0 10 0
2011 0 . . . 0 10 0
2007 7.167628 . . . 0 11 0
2008 6.927828 . . . 0 11 0
2009 7.920464 . . . 0 11 0
2010 8.580811 . . . 0 11 0
2011 8.333334 . . . 0 11 0
2012 7.820124 . . . 0 11 0
2013 7.298481 . . . 0 11 0
2014 7.940303 . . . 0 11 0
2015 8.059015 . . . 0 11 0
2016 8.400145 . . . 0 11 0
2017 8.711821 . . . 0 11 0
2018 9.534941 . . . 0 11 0
2007 9.669005 . . . 0 12 0
2008 9.435482 . . . 0 12 0
2009 9.4710245 . . . 0 12 0
2010 11.481378 . . . 0 12 0
2011 12.359738 . . . 0 12 0
2012 11.5619 . . . 0 12 0
2013 6.645917 . . . 0 12 0
2014 6.642959 . . . 0 12 0
2015 6.885567 2017 -2 4 1 12 0
2016 6.819162 2017 -1 4 1 12 0
2017 8.159912 2017 0 4 1 12 0
2018 7.521748 2017 1 4 1 12 1
2019 7.647943 2017 2 4 1 12 1
2020 6.992603 . . . 0 12 0
2010 17.955278 . . . 0 13 0
2011 18.86044 . . . 0 13 0
2012 16.681175 . . . 0 13 0
2013 16.993082 2015 -2 5 1 13 0
2014 18.281563 2015 -1 5 1 13 0
2015 19.40155 2015 0 5 1 13 0
2016 17.809502 2015 1 5 1 13 1
2017 18.815565 2015 2 5 1 13 1
2018 32.83058 . . . 0 13 0
2019 20.417244 . . . 0 13 0
2020 16.938232 . . . 0 13 0
2018 65.71169 . . . 0 14 0
2019 87.09094 . . . 0 14 0
2020 12.606636 . . . 0 14 0
2012 45.64033 . . . 0 15 0
2013 43.29089 . . . 0 15 0
2014 36 . . . 0 15 0
2007 45.25854 . . . 0 16 0
2008 41.71629 . . . 0 16 0
2009 33.942085 . . . 0 16 0
2010 29.673445 . . . 0 16 0
2011 25.44216 . . . 0 16 0
2012 20.34817 . . . 0 16 0
2013 16.21955 . . . 0 16 0
2014 16.72103 . . . 0 16 0
2015 15.619315 . . . 0 16 0
2016 15.099396 . . . 0 16 0
2017 14.908017 . . . 0 16 0
2018 12.681622 . . . 0 16 0
2019 12.15221 . . . 0 16 0
2020 11.744678 . . . 0 16 0
2021 14.380157 . . . 0 16 0
2007 26.58697 . . . 0 17 0
2008 34.0246 . . . 0 17 0
end
format %ty year
format %ty eventdate
label values firmid firmid
label def firmid 1 "000850", modify
label def firmid 2 "000899", modify
label def firmid 3 "001058", modify
label def firmid 4 "001630", modify
label def firmid 5 "00163U", modify
label def firmid 6 "00182C", modify
label def firmid 7 "00202H", modify
label def firmid 8 "002083", modify
label def firmid 9 "00211Y", modify
label def firmid 10 "002564", modify
label def firmid 11 "002567", modify
label def firmid 12 "002824", modify
label def firmid 13 "00287Y", modify
label def firmid 14 "00288U", modify
label def firmid 15 "00289Y", modify
label def firmid 16 "003654", modify
label def firmid 17 "00383Y", modify
I have tried numerous methods multiple times but cannot find my way to the results I am looking for. My data analysis skills aren't the best, as you may notice, but with some advice and guidance I hope to arrive at a better way of estimating a working model.
The variable year is the time variable and firmid is the panel variable. tau is the number of years an observation lies from the event, so tau = 0 marks the year of treatment for that specific event. Hence, if eventdate = year, tau = 0.
treated = 1 marks all firms that are or will be subject to treatment; for firms that are not and will not be treated it is 0. Note that in the example above, treated is coded 1 only for the observations inside the tau = -2 to tau = 2 window around an event. The variable active_treated = 1 when treated = 1 and year > eventdate, since this marks the post-treatment period.
Every firmid either has exactly one treatment in the dataset (treated = 1) or is never treated (treated = 0).
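To make the definitions above concrete, this is roughly how tau and active_treated could be rebuilt from year, eventdate and treated. It is only a sketch: tau2 and active2 are hypothetical names, chosen so that the variables already in the data are not overwritten, and the two assert lines assume the posted variables follow the definitions exactly.

```stata
* Sketch only: tau2 and active2 are hypothetical names, not variables in the posted data.
* tau2 is missing whenever eventdate is missing.
gen int tau2 = year - eventdate
* Missing eventdate counts as larger than any year, so active2 is 0 there.
gen byte active2 = (treated == 1) & (year > eventdate)
* Optional check that the rebuilt variables agree with the posted ones.
assert tau2 == tau
assert active2 == active_treated
```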
The difficulty I face is how to estimate a regression that includes the dummy variable active_treated together with some sort of post-treatment dummy. Normally active_treated would be the interaction dummy treated * post_dummy, but here treatment happens at several different moments, so a post-treatment dummy is not well defined for the control group (treated = 0). The control observations would be subject to the eventdate of each of the different treatments, so the post-treatment dummy would have to fluctuate for the non-treated firms in the dataset.
I tried the command tvdiff yesterday, and today I considered using loops instead: running a regression for every year in which an event occurred and then aggregating all of those results in some way. In which way, I do not know.
In the end, I would like to obtain estimates over a range of t-2 to t+2 and see the effect a treatment has on average.
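One way the t-2 to t+2 range could be written down, purely as a starting sketch, is a fixed-effects regression with one dummy per event time. It assumes tau = -1 as the reference period, firm and calendar-year fixed effects, and standard errors clustered by firm, none of which is settled above; the ev_* names are hypothetical. With treatment at different moments a specification like this may not be the right estimator, so it only illustrates the shape of the model.

```stata
* Sketch only, not a recommended final specification.
* ev_m2, ev_0, ev_p1 and ev_p2 are hypothetical names.
xtset firmid year
* One dummy per event time; tau = -1 is left out as the reference period.
* Observations with missing tau get 0 on every dummy.
gen byte ev_m2 = (tau == -2)
gen byte ev_0  = (tau == 0)
gen byte ev_p1 = (tau == 1)
gen byte ev_p2 = (tau == 2)
* Firm fixed effects via the fe option, calendar-year effects via i.year,
* standard errors clustered by firm.
xtreg outcome_var ev_m2 ev_0 ev_p1 ev_p2 i.year, fe vce(cluster firmid)
```

The coefficients on the ev_* dummies would then be read relative to tau = -1, with the observations outside any event window contributing only to the firm and year effects.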
Hopefully someone can help me out and suggest an approach I should consider. Having spent (too) much time on this by now, I have lost track of the possible ways to build a sensible model. If something is missing in my explanation of the problem, or if I made an error in the way I posted on this forum, I would be more than happy to hear that as well.
Thanks a lot in advance for your help/tips/advice!