
  • Originally posted by FernandoRios
    Hi there,
    1) The pretrend test is just a joint test that all "pre-treatment" effects are zero. In your case, you cannot reject that hypothesis.
    2) GAverage is, as the name suggests, the average ATT across all cohorts, weighted by their total size.
    3) You cannot identify the exact number of observations by cohort unless you have a perfectly balanced panel and are using the never treated as controls.
    Remember that each ATTGT (the results from the LOOONG table) is estimated separately, so it has its own sample. Those samples cannot be added up.
    F
    Thank you very much, FernandoRios. Happy New Year!

    My apologies, but I am still not clear on the first point.

    The pre-trend test cannot reject the hypothesis that the pre-treatment effects are zero. If the pre-treatment effects are zero, does that mean the changes after treatment are due to the intervention rather than just a continuation of an existing trend?

    Kind regards,
    Rattiya

    Comment


    • Yes, that is the basic idea:
      before the intervention, the treated and control groups had parallel trends in their outcomes. Afterwards, the treated group diverged because of the treatment.

      Comment


      • Dear Fernando,

        I am having an issue with the csdid command. I am using csdid in Stata 14.0.
        I want to evaluate the effect of a treatment over 3 different periods. My dataset is cross-sectional. It contains a time variable, "survey_year" (survey years 2000, 2005, 2010 and 2014), and a treatment variable, "treatment_DHSyear", whose values match the time variable (treatment in 2000, 2005, 2010 and 2014). I also have controls.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input int(survey_year treatment_DHSyear) float year_province double(WB_financial_value hh_size age_head rural educ_head relationship_head sex)
        2000 2010 1 31721105.716789998 6 35 0 7  2 2
        2000 2010 1 31721105.716789998 6 35 0 7 10 1
        2000 2010 1 31721105.716789998 5 60 0 0  4 2
        2000 2010 1 31721105.716789998 3 49 0 0  3 2
        2000 2010 1 31721105.716789998 2 51 0 0  3 2
        end
        label values sex sex
        label def sex 1 "male", modify
        label def sex 2 "female", modify

        When I cross-tabulate my time and group variables, I obtain what I think is a correct design:

        Code:
        tab survey_year treatment_DHSyear

                   |                   treatment_DHSyear
        survey_year|         0       2000       2005       2010       2015 |     Total
        -----------+-------------------------------------------------------+----------
              2000 |     6,262      3,429      5,076     12,974      4,128 |    31,869
              2005 |     6,209      3,133      5,595     14,271      3,248 |    32,456
              2010 |     6,217      3,597      6,279     12,428      3,053 |    31,574
              2014 |     4,849      3,503      5,529     11,349      3,190 |    28,420
        -----------+-------------------------------------------------------+----------
             Total |    23,537     13,662     22,479     51,022     13,619 |   124,319

        My issue is the following:
        • When I estimate the effect using csdid, all the observations are omitted, the time periods are not correct (2003.5 instead of 2005), and I cannot retrieve an effect. I wonder whether this is caused by my controls, because when I change one of the controls I do obtain estimates (see the following bullet point). I have also seen on the forum that this sometimes happens when there is not enough variation across gvar and time.
        Code:
        global CSDIDdemogC "hh_size age_head i.rural educ_head relationship_head"

        csdid years_education $CSDIDdemogC sex [iw=wt] , time(survey_year) gvar(treatment_DHSyear) model(dripw) notyet cluster(year_province)

                                                        Number of obs     =          0
        Outcome model  : least squares
        Treatment model: inverse probability
        ---------------------------------------------------------------------------------
                        |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        ----------------+----------------------------------------------------------------
        g2005           |
          t_2000_2003.5 |          0  (omitted)
          t_2003.5_2007 |          0  (omitted)
        t_2003.5_2010.5 |          0  (omitted)
          t_2003.5_2014 |          0  (omitted)
        ----------------+----------------------------------------------------------------
        g2010           |
          t_2000_2003.5 |          0  (omitted)
          t_2003.5_2007 |          0  (omitted)
          t_2007_2010.5 |          0  (omitted)
            t_2007_2014 |          0  (omitted)
        ----------------+----------------------------------------------------------------
        g2015           |
          t_2000_2003.5 |          0  (omitted)
          t_2003.5_2007 |          0  (omitted)
          t_2007_2010.5 |          0  (omitted)
          t_2010.5_2014 |          0  (omitted)
        ---------------------------------------------------------------------------------

        • When I replace just one variable ("WB_financial_value" instead of "sex"), results are estimated (my last time period, 2014, is not used to compute the effect because no data are available for that period).
        Code:
        csdid years_education WB_financial_value $CSDIDdemogC [iw=wt] , time(survey_year) gvar(treatment_DHSyear) model(dripw) notyet cluster(year_province)

                                                        Number of obs     =     85,061
                                 (Std. Err. adjusted for 57 clusters in year_province)
        ------------------------------------------------------------------------------
                     |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
        g2005        |
         t_2000_2005 |  -.1171093   .3519029    -0.33   0.739    -.8068264    .5726077
         t_2000_2010 |   .0733148   .3532972     0.21   0.836    -.6191349    .7657645
        -------------+----------------------------------------------------------------
        g2010        |
         t_2000_2005 |  -.3538494   .2600536    -1.36   0.174     -.863545    .1558463
         t_2005_2010 |   .6625479   1.159822     0.57   0.568    -1.610662    2.935758
        -------------+----------------------------------------------------------------
        g2015        |
         t_2000_2005 |    .031048   .4907112     0.06   0.950    -.9307283    .9928242
         t_2005_2010 |   .0401882   1.530982     0.03   0.979    -2.960481    3.040857
        ------------------------------------------------------------------------------

        I would be very grateful if you could help me understand these results and indicate what I can do to try to fix it.
        I hope my question was clear,

        Sincerely,

        Mathilde Perrot






        Comment


        • Hi Mathilde,
          I think the problem has to do with the years in the data.
          Is "sex" available in all years, while WB_financial_value is not available in 2014?
          If so, that is why the program cannot estimate anything in your first run.

          Recall that CSDID compares data in period T with period G-1 to estimate the post-treatment ATTGTs. Normally you have yearly data, so there is no problem.
          Here, your data come in waves that are 4 or 5 years apart. Should the ATTGTs be estimated using T and G-4, or T and G-5? It cannot use two different gaps, which is why the first syntax you showed has a problem.

          Solution: change the survey year from 2014 to 2015 and rerun the model.
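
          A minimal sketch of that recode, using a hypothetical toy dataset with one row per survey wave rather than the real data (the variable name survey_year comes from the post above; gap is only an illustrative helper):

          ```stata
          * Toy data standing in for the four survey waves (hypothetical)
          clear
          input int survey_year
          2000
          2005
          2010
          2014
          end

          * Relabel the 2014 wave as 2015 so that all waves are 5 years apart
          replace survey_year = 2015 if survey_year == 2014

          * Check the spacing between consecutive waves (missing for the first row)
          sort survey_year
          gen gap = survey_year - survey_year[_n-1]
          list survey_year gap
          ```

          On the real data, only the replace line would be needed before rerunning the original csdid command unchanged.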
          HTH
          Fernando

          Comment


          • Hi Fernando,
            Thank you very much for your answer, you were totally right.

            That brings me to another question: rather than measuring the effect of a binary treatment, is it possible to use csdid to measure the effect of a treatment that varies in intensity? Let's say an individual is first treated at time t and is treated again at t+2. I want to see whether being treated once and being treated twice (or more than twice) have different effects.
            Because gvar only identifies the first time an observation is treated, I am not sure this is possible.

            Thanks in advance !

            Best,

            Mathilde

            Comment


            • Unfortunately, it is not.
              The only approximation to such an effect would be to analyze the data separately for each treatment intensity, giving each intensity its own estimate. In that case, however, it would not be feasible to compare the results across intensities (except perhaps under the assumption of independence).

              Comment


              • Dear Fernando,

                I hope you are doing well! I have a quick question about the CSDID command:

                If I use the repeated cross-section estimator (without specifying ivar) and cluster the standard errors on the variable "ID", does it include ID fixed effects in the estimation?

                Thank you!

                Comment


                • Thanks Fernando.

                  I am not entirely sure how this would work in the code.
                  If I were to analyse each treatment intensity with its own estimate (let's say I only have two intensities), I could still have only one variable as my treatment variable (gvar). So, to look at the effect of high-intensity treatment, would I treat the high-intensity category as my treatment and include the other category (low-intensity treatment) as a control variable? And then do the reverse to look at the impact of low treatment intensity?

                  Sorry, I tried working this out alone but could not be entirely sure it would work. Thank you a lot for your help !

                  Best,

                  Mathilde Perrot

                  Comment


                  • Originally posted by Mingyu Qi
                    Dear Fernando,

                    I hope you are doing well! I have a quick question about the CSDID command:

                    If I use the repeated cross-section estimator (without specifying ivar) and cluster the standard errors on the variable "ID", does it include ID fixed effects in the estimation?

                    Thank you!
                    No.
                    When clustering, the only thing that changes is how the standard errors are aggregated. No fixed effects are added to the model.
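
                    As an analogy only (this uses plain regress on simulated data, not csdid, and every variable name is hypothetical), the sketch below shows the general point: clustering changes the standard errors but leaves the point estimate untouched, so no fixed effects have entered the model.

                    ```stata
                    * Simulated data: 50 hypothetical clusters of 10 observations each
                    clear
                    set seed 2023
                    set obs 500
                    gen id = ceil(_n/10)
                    gen x  = rnormal()
                    gen y  = 1 + 2*x + rnormal()

                    * Same model, default standard errors
                    regress y x
                    scalar b_plain  = _b[x]
                    scalar se_plain = _se[x]

                    * Same model, standard errors clustered on id
                    regress y x, vce(cluster id)
                    scalar b_clust  = _b[x]
                    scalar se_clust = _se[x]

                    * The coefficients coincide; only the standard errors differ
                    display "coef: " b_plain  "  vs  " b_clust
                    display "s.e.: " se_plain "  vs  " se_clust
                    ```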

                    Comment


                    • Hi Fernando,

                      Thanks so much for all your help on this forum! I had two conceptual questions for you:

                      My project is looking at the effect of a policy on hospital admissions (the unit of observation is the admission, clustered by hospitals, as the treatment is implemented at the hospital level).

                      The command used is csdid outcome age sex comorbidity_index, gvar(treatment_year) time(year) agg(simple) cluster(hospital_number)

                      1. When looking at hospital length of stay, the DID estimate returned by csdid agg(simple) is significantly larger than the typical DID using OLS (OLS estimate is 0.05 and csdid estimate is 25). The dataset has ~250/5,000,000 observations with lengths of stay that are significant outliers. When removing these, the csdid estimate comes down to about 4, and if I were to further remove the top 0.1% I imagine it would come down even further. The outliers are distributed between treatment and control groups and across years and hospitals.

                      Conceptually, why do outliers affect the estimate so much? The fraction of outliers is very small. How do I determine what level of outlier needs to be addressed/removed?

                      2. When determining the ATT, is the year of treatment excluded? I wanted to keep a washout period for the effect of the policy, so I don't want the year of treatment to be considered in either the pre- or post-period when estimating the treatment effect.

                      Please let me know if you need further information.

                      Thanks!

                      Comment


                      • Hi Mathilde
                        perhaps the following example may help

                        ```
                        ssc install frause
                        frause mpdta, clear
                        ** Create an intensity variable (here based on population size, just for this example; yours would use the definition you already have)
                        gen high = (lpop>3.3)+1
                        replace high = 0 if first==0
                        ** Never treated have high=0, low intensity has high=1, and high intensity has high=2
                        ** Low intensity vs. never treated
                        qui:csdid lemp if inlist(high,0,1), ivar(countyreal) gvar(first) time(year)
                        estat simple
                        ** High intensity vs. never treated
                        qui:csdid lemp if inlist(high,0,2), ivar(countyreal) gvar(first) time(year)
                        estat simple
                        ** You now have the point estimates from the two models, one per intensity level
                        ```
                        HTH

                        Comment


                        • Hi Sneha
                          At the end of the day, CSDID is a model that relies on means, predicted means, or weighted means, and means are very sensitive to outliers.
                          Also, if the outliers are in your treated sample (not the control sample), their "weight" might be larger than you imagine.
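
                          A rough illustration on simulated data (all names and numbers are hypothetical; this does not call csdid, it only compares group means): 5 extreme stays out of 100,000 observations, the same 0.005% share as in the question, are enough to move the treated-control gap from about 0 to about 1 day.

                          ```stata
                          * Simulated lengths of stay: 20,000 treated and 80,000 control admissions
                          clear
                          set seed 12345
                          set obs 100000
                          gen byte treated = _n <= 20000
                          gen double los = 5 + rnormal()

                          * Gap in means before any outliers are planted
                          summarize los if treated, meanonly
                          scalar m1_clean = r(mean)
                          summarize los if !treated, meanonly
                          scalar m0 = r(mean)

                          * Plant 5 extreme stays (4,000 days) among the treated
                          replace los = 4000 in 1/5
                          summarize los if treated, meanonly
                          scalar m1_out = r(mean)

                          display "gap without outliers: " m1_clean - m0
                          display "gap with 5 outliers:  " m1_out - m0
                          ```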

                          For your second question, the short answer is:
                          when you estimate the ATT for any given year, only the information from that year and from the period before treatment is used in the estimation.
                          So, as long as you do not look at the estimate for the treatment year itself, you don't have to worry about it.
                          Fernando

                          Comment


                          • Dear Fernando,
                            You say that one could also run a test on all aggregated pretrend effects as an alternative to "estat pretrend". If so, can you please tell me the Stata code?
                            Thank you very much!

                            Comment


                            • Sure, here is an example:
                              Code:
                              frause mpdta, clear
                              csdid lemp, ivar(county) gvar(first) time(year)
                              estat event, post
                              test Tm3 Tm2 Tm1
                              Additionally, if you estimate uniform confidence intervals (wboot), you can simply check whether the pre-treatment CIs include zero.
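
                              As a small variation on the example above, the same approach can be limited to a subset of pre-treatment periods by listing only those coefficients in test. This is a sketch that assumes csdid, drdid and frause are installed from SSC, and that the posted event-study coefficients are named Tm3, Tm2 and Tm1 as in this example; the full variable names countyreal and first_treat are written out instead of the abbreviations.

                              ```stata
                              * Same setup as the example above
                              frause mpdta, clear
                              csdid lemp, ivar(countyreal) gvar(first_treat) time(year)
                              estat event, post

                              * Joint test of only the two periods closest to treatment
                              test Tm2 Tm1
                              ```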
                              HTH
                              Fernando

                              Comment


                              • Dear Fernando,
                                Thanks for all the helpful comments here.

                                In post #7 you wrote about when the pretrend does not hold. "Pretrends in csdid is only for testing. they are not used for the estimation. However if post estimation pretrends are significant, it means that DID assumptions do not hold, and you cant really use this method. Of course you can argue pretrend holds up to some period, (say it holds for 10 periods before treatment but not 15)"

                                Is there a way to restrict the command estat pretrend to certain years to implement this advice?

                                All the best,
                                Katarina

                                Comment
