  • If you do not observe units before they are treated, then you cannot use it as a panel, precisely because there is no g-1 data:
    the ATT(g,t)'s will not be identified.



    • Hi FernandoRios,

      Sorry to insist on my data issue (posts #178-181), but I seem to be stuck.

      The actual data I am working with is an unbalanced panel, which always includes the year of treatment.

      I do observe the majority of the units before they are treated. However, I may not observe every unit in the exact year before treatment. That is, I observe most units in some pre-treatment years, just not necessarily in year g-1.

      Is it possible to identify the ATT(g,t)'s in this setting?

      Thank you in advance.



      • Hi Noriko,
        So if you want to use panel estimators, you cannot.
        The reason is that the ATT(g,t) is not identified.
        Remember that for post-treatment periods, ATT(g,t) is defined as
        ATT(g,t) = [E(y_t | treated at g) - E(y_{g-1} | treated at g)] - [E(y_t | comparison) - E(y_{g-1} | comparison)],
        so if you do not have period g-1, this parameter is not identified.

        If you want to use cross-section estimators, then you may be able to.

        At the end of the day, what really matters is whether you have enough control and treated observations.
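
        As a minimal sketch (assuming your panel uses id, year, and a first-treatment-year variable first_treat; adjust the names to your data), you can check which units are actually observed at g-1:

        Code:
        * flag whether each unit appears in the year just before its treatment
        gen byte at_gm1 = (year == first_treat - 1)
        bysort id: egen byte has_gm1 = max(at_gm1)
        * units with has_gm1 == 0 lack the g-1 observation
        tab first_treat has_gm1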
        Last edited by FernandoRios; 27 Jul 2022, 10:41.



        • Dear Fernando,

          Thank you so much for your very useful replies!

          I have a question regarding how to interpret the number of observations (N) when using covariates, in particular when using method(drimp) and method(ipw).
          Is there a way to know which observations are being used for the estimation?

          Thank you so much,
          Lucia



          • Hi Lucia,
            A couple of suggestions:
            1. Instead of drimp, use dripw. drimp generates some odd results because it is hard to estimate, and it may drop more information than you want.
            2. The observations used can be accessed via e(sample). However, this may give you problems if your data is not well set up (for example, if you have too many gaps).
            3. There is nothing to interpret, just something to be aware of. For example, after generating a variable with the sample used (gen smp = e(sample)), you could tab year first_treat if smp==1 to check how many observations you have per year and cohort; see the sketch below. That will give you an idea of the degrees of freedom you had in each estimation.
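
            A minimal sketch of that workflow (y, x1, x2, id, year, and first_treat are placeholder names):

            Code:
            csdid y x1 x2, ivar(id) time(year) gvar(first_treat) method(dripw)
            gen byte smp = e(sample)            // flag the observations actually used
            tab year first_treat if smp == 1    // cell sizes by year and cohort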

            F



            • Hi Fernando,

              When I use csdid, I sometimes obtain different estimated coefficients when I rerun the same regressions. Could you please advise whether this can happen? If not, then I must have made some mistakes in the data preparation steps.

              Thank you so much!

              Best,
              Mingyu



              • The usual reason is that you may be overfitting your model,
                meaning that you are adding more controls than there are observations available.
                Note that this notion of observations is different from the total sample. If you tabulate
                tab year gvar
                the smallest nonzero count you observe is the effective number of observations, and you cannot have more controls than observations.
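
                A rough sketch to flag the thin cells programmatically (the threshold of 10 is arbitrary; gvar is your cohort variable):

                Code:
                preserve
                contract year gvar          // one row per (year, cohort) cell, count in _freq
                list if _freq < 10          // inspect the smallest cells
                restore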
                HTH



                • Hi Fernando,

                  Thank you so much for your reply; your advice is extremely helpful! I did find that some cohort-timing blocks have more controls than observations. I guess this explains why I had relatively stable results in the main analysis (using the full sample) but less stable results in the stratification analyses (using small sub-samples)? In cases with more controls than observations, would it help to set rseed? In addition, I also observed some changes in the estimated results when I ran csdid on the full sample (excluding the cohort-timing blocks that have more controls than observations, although some remaining blocks still have relatively small samples). Could that also be caused by overfitting?

                  Thanks again and I look forward to your reply.

                  Best,
                  Mingyu



                  • No, rseed will not help; it only makes results reproducible when bootstrapping.
                    The problem with overfitting is that sometimes some variables may be dropped in one block but not in others, and there is no way I can see to control for that.
                    You simply need to use a minimal set of controls.

                    For the last problem you mention, if it is not a problem of overfitting, perhaps it is a violation of the overlap assumption, namely that some controls predict treatment perfectly.
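
                    A hypothetical way to check this (treat2000, x1, and x2 are placeholder names; first_treat == 0 marks never-treated units) is to run a logit of cohort membership on the controls within one comparison block and watch for Stata's perfect-prediction notes:

                    Code:
                    * cohort g=2000 vs. units not yet treated by 2002, using base-period (1999) data
                    gen byte treat2000 = first_treat == 2000
                    logit treat2000 x1 x2 if year == 1999 & (treat2000 | first_treat == 0 | first_treat > 2002)
                    * notes such as "x1 != 0 predicts success perfectly" signal an overlap problem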
                    HTH





                    • Hi Fernando,

                      Thank you so much! I will see if I can get rid of a few controls. In addition, would it also work to combine a few treatment-timing cohorts?

                      Best,
                      Mingyu



                      • You can do that, but your interpretation may change:
                        you would be saying, for example, that units treated in 2010 and 2011 were all treated in 2010.
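
                        A minimal sketch of such a merge (first_treat is a placeholder for your cohort variable):

                        Code:
                        * fold the 2011 cohort into the 2010 cohort
                        replace first_treat = 2010 if first_treat == 2011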



                        • Thank you so much, Fernando! I will think more about how to deal with this issue.



                          • Dear FernandoRios,
                            I have unbalanced panel data for 1990-2012. However, my first treatment cohort is g1994 and my last treatment cohort is g2005 (all units are treated by 2005). I also have no never-treated units in the sample, so I use the not-yet-treated as the comparison, and I assume parallel trends hold unconditionally. I have two questions.
                            When I run the csdid command:

                            Code:
                            * lny is ln(y), generated beforehand (a transform like ln(y) cannot be used inline)
                            csdid lny, ivar(id) time(year) gvar(first_treat) notyet method(dripw) saverif(A1)
                            1) It calculates ATT(g,t) for all the groups (perfect), but the ATTs for every group from 2005 onwards are omitted. For example, take g2000: t_1999_2002 means the ATT for g2000 two years after treatment, using g-1 = 1999 as the base period for this group. The comparison units for this group at t=2002 are g2003, g2004, and g2005 (the groups not yet treated by t=2002). Similarly, to calculate the ATT t_1999_2004 for the same group, the comparison group at t=2004 is g2005 (the only group not yet treated by t=2004). Once we wish to calculate the ATT t_1999_2005 for the same group, we have no comparison group at t=2005, because all units are eventually treated by 2005 (g2005 is the last treated group). So this ATT combination is OMITTED, and similarly t_1999_2006 through t_1999_2012 are all omitted. If, on the other hand, we had a fixed sample of never-treated units, we would be able to calculate all the ATTs for each time period.
                            Dear FernandoRios, you answered this question in a previous query and confirmed that my understanding of why the ATTs after 2005 are OMITTED in this 1990-2012 data is correct:
                            because g2005 is the last treated group and all groups are eventually treated by 2005, no comparison group exists after that, so the ATT(g,t)'s from 2005 through 2012 are OMITTED.

                            Since our last treatment cohort is g2005 and the csdid estimator does not use the data after 2005, I dropped the years 2006-2012 and redid the csdid estimation. The results do not change, because csdid was not using those observations anyway (I hope I am correct). I then applied de Chaisemartin and D'Haultfoeuille's DIDM estimator to the restricted data, and again the results are the same as with the full data. This also makes sense because, in my case, DIDM is the ATE of joiners only: it compares the treated groups (whose treatment changes from 0 at t-1 to 1 at t) with the comparison (stable) group, whose treatment remains 0 in both t-1 and t. As in the Callaway and Sant'Anna context, DIDM here uses the not-yet-treated as the comparison and therefore does not use the post-2005 data either, so the results do not change whether you drop the years 2006-2012 or retain them. Similarly, the results based on Borusyak et al.'s did_imputation do not change when we drop the years 2006-2012.
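
                            As a sketch, the restriction I applied was simply:

                            Code:
                            * keep only the years csdid can actually use (the last cohort is g2005)
                            drop if year > 2005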

                            However, the results based on the TWFE estimator do change: the event-study effects estimated on the full 1990-2012 data differ from those estimated on the 1990-2005 data. I do not quite see why this is the case; please provide some insight in this regard.

                            Thanks,
                            (Ridwan)



                            • When you say TWFE, do you mean the one similar to the Wooldridge approach,
                              or instead the one with a single treated dummy:
                              reg y treated i.cohorts i.year?



                              • No, not the one based on Wooldridge. I mean the traditionally used dynamic TWFE OLS estimator:
                                Code:
                                reghdfe Y F*event L*event, a(i t) cluster(i)
                                where F* and L* are the event-time leads and lags, respectively.

                                Here the results do change: we get different event-study effects with the full 1990-2012 data than with data only from 1990-2005, even though our last treatment cohort is g2005 and we have no never-treated group in the sample. I hope I have made this clear.

