  • Introducing the Forward DID Command

    Hey everyone. Happy to finally be sharing software for Stata once more! Been using Python these days. Anyways, I've developed the Forward DID command for Stata. The help file, ado file, and two sample datasets are available at my website. Since it's still under development, I won't be sending it to SSC just yet, so you'll need to put it on your adopath manually (unless there's a way to do this that I'm unaware of). At present, it should work in Stata 16 and later, as it uses frames. It needs no special libraries or additional commands, and it is written entirely in Stata's ado language.

    Forward DD comes in handy when we wish to estimate the average treatment effect on the treated for one or more units, but we don't know which control units are the most relevant. It uses a variant of the forward selection algorithm (daniel klein was most helpful with suggestions for the underlying code) to select the optimal control group for a treated unit, based on the pre-intervention outcome data for the control units. After we select the control group, we estimate the ATT and 95% CIs following the method described in the original paper. At present it is only automated for one treated unit; however, if you know enough about the developments in DD, you can likely extend this to multiple treated units with a little dynamic adjustment for the control group.
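
    To give a feel for the selection step, here is a minimal sketch of forward selection on simulated data. It is not the code in fdid.ado: the variables y and c1-c4, the seed, and the wide layout are all made up for illustration, and the fit criterion (pre-period R2 from a DID fit, i.e. an intercept shift plus an equal-weighted donor average) is my reading of the method.

    ```stata
    * Illustrative sketch only; simulated pre-treatment data in wide form.
    clear
    set seed 12345
    set obs 20
    forvalues j = 1/4 {
        generate double c`j' = rnormal()          // hypothetical donor units
    }
    * By construction the treated unit tracks the average of c1 and c3.
    generate double y = 2 + (c1 + c3)/2 + 0.05*rnormal()

    quietly summarize y
    local tss = r(Var)*(r(N) - 1)                 // total sum of squares of y

    local pool c1 c2 c3 c4                        // donors not yet selected
    local selected                                // donors chosen so far
    local best_r2 = -1e10
    local best_set

    while "`pool'" != "" {
        local step_r2 = -1e10
        local step_pick
        foreach cand of local pool {
            tempvar avg gap
            egen double `avg' = rowmean(`selected' `cand')
            generate double `gap' = y - `avg'
            * SS of the gap around its mean = residual SS after the intercept shift
            quietly summarize `gap'
            local r2 = 1 - r(Var)*(r(N) - 1)/`tss'
            if `r2' > `step_r2' {
                local step_r2 = `r2'
                local step_pick `cand'
            }
            drop `avg' `gap'
        }
        * Add the best candidate of this round, then remember the best set overall.
        local selected `selected' `step_pick'
        local pool : list pool - step_pick
        if `step_r2' > `best_r2' {
            local best_r2 = `step_r2'
            local best_set `selected'
        }
    }
    display "Selected donors: `best_set'"
    display "Pre-period R2:   " %6.4f `best_r2'
    ```

    By construction, the loop should settle on c1 and c3, since adding either of the other two donors lowers the pre-period fit.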

    As usual, feedback and comments are most appreciated. For an example of how it works, we can do

    Code:
    u "agbasque.dta", clear
    
    
    qui fdid gdpcap, tr(treat) gr1opts(scheme(sj) name(ag, replace))
    
    cwf cfframe
    This returns the following frame (the cfframe):

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double(year gdpcap5) float(cf te)
    1955  3.853184630005267   3.75793     .0952546
    1956 3.9456582961508766   3.90803    .03762826
    1957  4.033561734872626 4.0553446  -.021782847
    1958  4.023421896896646  4.097583   -.07416092
    1959  4.013781968405232 4.1396422   -.12586027
    1960  4.285918396222732  4.401853   -.11593468
    1961  4.574336095797406  4.677667   -.10333104
    1962  4.898957353563045  4.938842   -.03988494
    1963  5.197014981629133  5.187985   .009029562
    1964 5.3389029787527225  5.259322    .07958081
    1965  5.465153005251848  5.324697    .14045647
    1966  5.545915627064143  5.448125    .09779026
    1967  5.614895726639487  5.563021    .05187454
    1968 5.8521849330715785   5.79924     .0529453
    1969 6.0814054173695915    6.0361     .0453055
    1970   6.17009424134957  6.171775 -.0016810996
    1971  6.283633404546246  6.315913  -.032279797
    1972 6.5555553986528405    6.6104    -.0548448
    1973  6.810768561103078   6.90096   -.09019189
    1974  7.105184302810804  7.055095    .05008958
    1975  7.377891682175629   7.20316     .1747319
    1976  7.232933621922754   7.27621   -.04327621
    1977  7.089831372119127  7.344905   -.25507352
    1978  6.786703607144611  7.312414    -.5257106
    1979 6.6398173868571035  7.322126    -.6823086
    1980  6.562839171369564  7.367006    -.8041667
    1981   6.50078545499277  7.436914    -.9361285
    1982  6.545058606999563  7.550632   -1.0055734
    1983  6.595329801139407  7.669598   -1.0742679
    1984  6.761496750091492  7.768819   -1.0073225
    1985  6.937160671727721  7.872968    -.9358075
    1986  7.332191151300521  8.342334   -1.0101427
    1987  7.742788123594152  8.811522   -1.0687335
    1988   8.12053664075889  9.270319   -1.1497823
    1989  8.509711162324157  9.724476   -1.2147647
    1990  8.776777889074104  9.961907   -1.1851295
    1991   9.02527866619582 10.199697   -1.1744179
    1992  8.873892824706335  9.992613     -1.11872
    1993  8.718223539089278  9.781245   -1.0630217
    1994  9.018137849286365  10.13043   -1.1122934
    1995  9.440873861653367 10.433558    -.9926846
    1996   9.68651813767495 10.676703    -.9901853
    1997 10.170665872808662  11.12229    -.9516248
    end
    format %ty year

    Here we have the counterfactual for the Basque Country had terrorism not occurred, and we also have the observed values. The counterfactual is a convex, uniform combination of the states Cataluna and Aragon, replicating the original findings from the first paper describing the synthetic control method. Please, do let me know how you like it (if you do!).
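
    As a quick sanity check on what the columns mean (this is my reading of the listing above, not something taken from the help file): te appears to be the observed series minus cf. A small sketch using the first three rows of the listing, where gap is a made-up variable name:

    ```stata
    clear
    input double(year gdpcap5) float(cf te)
    1955  3.853184630005267   3.75793     .0952546
    1956 3.9456582961508766   3.90803    .03762826
    1957  4.033561734872626 4.0553446  -.021782847
    end
    * Observed outcome minus the counterfactual; should line up with te
    generate double gap = gdpcap5 - cf
    list year te gap
    ```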

  • #2
    The way you may install fdid and its associated help file is
    Code:
    copy "https://raw.githubusercontent.com/jgreathouse9/jgreathouse9.github.io/master/stata/fdid/fdid.ado" "C:\ado\plus\f\fdid.ado", replace
    copy "https://raw.githubusercontent.com/jgreathouse9/jgreathouse9.github.io/master/stata/fdid/fdid.sthlp" "C:\ado\plus\f\fdid.sthlp", replace
    where you copy the files from my github to wherever your ado files are stored. For example, in my case (starting without the files in my directory) I did
    Code:
    copy "https://raw.githubusercontent.com/jgreathouse9/jgreathouse9.github.io/master/stata/fdid/fdid.ado" "C:\ado\plus\f\fdid.ado", replace
    copy "https://raw.githubusercontent.com/jgreathouse9/jgreathouse9.github.io/master/stata/fdid/fdid.sthlp" "C:\ado\plus\f\fdid.sthlp", replace
    
    clear *
    
    u "https://github.com/jgreathouse9/jgreathouse9.github.io/raw/master/stata/fdid/hcw.dta"
    
    fdid gdp, tr(treat) unitnames(state) gr1opts(scheme(sj) name(hcw, replace))
    So long as you install it at the specified directory, you should be able to get the same results as I did.


    • #3
      Thanks, Jared. I've got it up and running with my own data. Very fast.

      A couple of thoughts from early use:

      1. I've noticed that e(df_m), e(F), and e(p) are empty. Maybe not useful, but empty in both your data and mine.

      2. The results are not presented after the algorithm runs. I had to "matrix list e(ATTS)" to see results. It might be nice just to have them automatically presented (like sdid).

      3. One thing I like about sdid is that it gives you a coefficient, se, and t, thus making it a fairly typical presentation. The se after fdid can be calculated (assuming 1.96, which is what the ado has). But perhaps a summary presentation of the results like sdid's would be useful.

      4. I'm using preserve/restore. Since you're using frames, it might be nice to keep the original dataset and then create two frames for the modified data. After estimation, some text could indicate what's in what frame and their names.


      • #4
        Hey thanks for working with it. Try and install it again (from my most recent post). I give the ATT, 95% CI, and the preintervention R2. I make it more similar to sdid, in that regard.

        Edit: I believe this addresses the second point too. I make a new frame that copies the original dataset, then uses that to manipulate the data. I drop it at the end, so the user should have their original df and the cfframe. That way the original data isn't destroyed/altered.

        In the help file, I think I'll describe what each variable in cfframe is.

        That is, when I run

        Code:
        clear *
        
        u "https://github.com/jgreathouse9/jgreathouse9.github.io/raw/master/stata/fdid/hcw.dta"
        cls
        fdid gdp, tr(treat) unitnames(state) gr1opts(scheme(sj) name(hcw, replace))
        I get

        Code:
        Forward Difference-in-Differences
        
        -----------------------------------------------------------------------------
                 gdp |     ATT     |     [95% Conf. Interval]     | R-Square     
        -------------+---------------------------------------------------------------
               treat |   0.02540      0.01738     0.03343           0.84278
        -----------------------------------------------------------------------------
        FDID selects philippines, singapore, thailand, norway, mexico, korea, indonesia, newzealand, malaysia, as the optimal donors.
        Refer to Li (2024) for theoretical derivations.
        in return.
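
        For anyone who wants the standard error implied by this table, a rough sketch: assuming the interval is symmetric and built with invnormal(0.975) (which is what the ado is said to use earlier in the thread), the SE can be backed out from the bounds. The scalar names below are made up for illustration.

        ```stata
        * Values copied from the table above; scalar names are illustrative.
        scalar att = 0.02540
        scalar lb  = 0.01738
        scalar ub  = 0.03343
        scalar se  = (scalar(ub) - scalar(lb))/(2*invnormal(0.975))   // half-width over critical value
        scalar z   = scalar(att)/scalar(se)
        scalar p   = 2*(1 - normal(abs(scalar(z))))                   // two-sided p-value
        display "se = " %8.5f scalar(se) "   z = " %6.3f scalar(z) "   p = " %6.4f scalar(p)
        ```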
        Last edited by Jared Greathouse; 13 Jul 2024, 10:59.


        • #5
          I have one general critical comment. The program starts with

          Code:
          cap frame drop cfframe
          cap frame drop reshaped
          What if the user has data in those frames? It is generally considered bad programming style to wipe objects even if it were documented. Use temporary objects and let the user decide whether to keep any and if so under which name(s).

          A minor general comment: drop the

          Code:
          capture program drop
          You never ever need it in a final ado-file. I have encountered this line so often that I was about to write a brief insert for SJ about it. There are hardly any side effects, so I decided not to. But I might write a brief post here on Statalist.


          • #6
            daniel klein yeah I think you mentioned the last part to me before. I'll do that.

            What I may do for the first point, then, is either give users the option to return the temporary frames for use after the program is finished, or just force them to specify the name of the returned frame, so that there's no clash with preexisting frames.
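
            A rough sketch of how that could look, under the assumption that frames named via tempname are dropped automatically when the program exits (that is my understanding for Stata 16 and later). The program name demo_cf and the option cfframe() are hypothetical, not fdid's actual syntax.

            ```stata
            capture program drop demo_cf     // only so this sketch can be re-run from a do-file
            program define demo_cf
                version 16
                syntax varname(numeric) [, CFframe(name)]

                * Do all the work in a temporary frame so no user frame is touched.
                tempname work
                frame put `varlist', into(`work')
                frame `work': generate double cf = `varlist'   // placeholder for the real computation

                * Hand results back only on request, under the name the user chose.
                * -frame copy- without replace errors out if that frame already exists.
                if "`cfframe'" != "" {
                    frame copy `work' `cfframe'
                }
            end

            sysuse auto, clear
            demo_cf price, cfframe(my_cf)
            frame my_cf: summarize cf
            ```

            With this layout nothing is wiped on entry, and a name clash surfaces as an error instead of silently replacing the user's frame.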


            • #7
              Got the new version and am playing with it. I'm looking forward to studying up on FDID, and I appreciate your willingness to take comments.

              Thoughts.

              1. Following Daniel, I'd give options on frame names, but still have an odd default (_fdid_frame_1).

              2. Like sdid, add the matrix e(series) [including everything you're dropping into cfframe], but have a frame as an option. I think this would be a mkmat from cfframe, so easy. I usually take e(series) from sdid and drop it in a new frame to make a useful graph, so having the option to automatically do so is nice.

              3. To the results, I'd add the z and prob level. That stuff is useful, and all the bits are in the ado file.

              4. It's unusual for the id to be a string and not a number. I suppose that makes listing them easier in the results, but most programs require a number. The error message was clear, however. It will be a commonly invoked error, given the typical practice of requiring a number.

              5. The ereturn could use some modifications.

              A. I'd prefer: e(ATT) e(se) e(z) e(p), and maybe e(ATT_lb), e(ATT_ub). Or, maybe post all the results to r(table). Anything you see in the results should be accessible in ereturn or return. The se appears nowhere in ereturn, so someone would have to type that in to use it later, or manipulate the ATT and bounds to calculate it.

              B. The e(cmdline) is a background command from "hidden" estimations. You might post the original command there and dump the extra bits [e(cmd) e(predict)].

              C. Similarly, I think e(b) and e(V) could confuse people, as these are background results that do not link to the reported results in any obvious way. I think they could be deleted.

              D. For e(ATTs), I'd move r2 to e(r2) and drop rmse since it is in e(rmse) already. It might be cleaner to have just e(ATT), e(se), and so forth. I don't mind having the UB/LB in an ATTs matrix, but this requires one to pull from a matrix rather than access directly [but the same is true for r(table)].

              E. This isn't necessarily on you, but you still might think about how asdoc/estout/etc... are going to work with this. (I think we both can imagine the stream of posts on statalist about this.) This would require some rewriting of the way things are presented now, which I don't find very "Stata-like" even if informative.

              a. The "Successful" line is nice, but un-needed. If it runs, it runs. Error codes appear to work.
              b. Treatment_measured/Treated unit/ControlUnits could be under the results (like notes) (You already have chosen units as a note), which would make the asdoc output cleaner.
              c. The table asdoc's a bit strange. That may be an easy fix.
              d. The list of the control pool could be stored in an e(controllist) and not reported in the results.


              • #8
                I'm not getting the graph in the updated version.


                • #9
                  In some cases (when I change the DV), the optimal donor list is appearing as numbers, even though the list of the pool is in strings. I may have to share some data with you to trace that issue.


                  • #10
                    Following Daniel, I'd give options on frame names, but still have an odd default (_fdid_frame_1).
                    Yeah, I'll go with the "odd name method" for now; the tempname was giving me problems.

                    Like sdid, add the matrix e(series) [including everything you're dropping into cfframe], but have a frame as an option
                    I'll need to run sdid and see what this refers to, but I think I get what you mean.

                    To the results, I'd add the z and prob level. That stuff is useful, and all the bits are in the ado file.
                    Agreed.

                    It's unusual for the id to be a string and not a number.
                    Well, technically, the id is a number. It is the number supplied to xtset. The reason I'm creating value labels for them under the hood, though, is so we can know which donors are what. Otherwise, we'd have a control list of, say, "1,45,90,92", and that's less informative. So, the id is technically a number, right?

                    The ereturn could use some modifications.
                    I agree, I need to figure out how it will return only the things I define instead of what is also reported by cnsreg.

                    Or, maybe post all the results to r(table). Anything you see in the results should be accessible in ereturn or return.
                    I agree, r(table) it is. In the newer version (on my machine), I report the standard error in e(ATTs) (I should really change that name).

                    Similarly, I think e(b) and e(V) could confuse people, as these are background results that do not link to the reported results in any obvious way. I think they could be deleted.
                    Yeah, related to the ereturn remark above. I guess this would be a job for ereturn clear?

                    I think we both can imagine the stream of posts on statalist about this.
                    Unfortunately, I can.

                    Treatment_measured/Treated unit/ControlUnits could be under the results (like notes) (You already have chosen units as a note), which would make the asdoc output cleaner.
                    yeah that's true, it would be cleaner like that. I modeled lots of this off the original synth, so maybe this is a holdout that I could get rid of and put it under the results.

                    The list of the control pool could be stored in an e(controllist) and not reported in the results.
                    I agree, but it kind of makes things easier to spot check. It's already reported under e(selected).

                    I'm not getting the graph in the updated version.
                    The plot is now optional. If the user specifies gr1opts, then the plot is returned. Else, if they specify nothing, no figure is created.

                    In some cases (when I change the DV), the optimal donor list is appearing as numbers, even though the list of the pool is in strings. I may have to share some data with you to trace that issue.
                    Yeah email me the data and code you used. The optimal donor pool shouldn't be numbers. Thanks, by the way, for such detailed remarks.


                    • #11
                      As of now, all of the suggestions above were addressed. All the results are returned, the t, p, and SE stats are there, and all the rest. The help and ado files have also been updated to reflect these changes. Assuming you use the versions of fdid at this link, the following code should run without any errors

                      Code:
                      clear *
                      
                      u "https://github.com/jgreathouse9/jgreathouse9.github.io/raw/master/stata/fdid/hcw.dta"
                      
                      cls
                      fdid gdp, tr(treat) unitnames(state) gr2opts(scheme(sj) name(hcwte, replace)) 
                      
                      cls
                      clear *
                      
                      import delim "https://raw.githubusercontent.com/synth-inference/synthdid/master/data/california_prop99.csv", clear delim(";")
                      
                      egen id = group(state)
                      
                      xtset id year, y
                      
                      fdid packspercapita, tr(treated) unitnames(state) // gr1opts(scheme(sj) name(p99, replace))
                      
                      
                      cls
                      clear *
                      
                      u "https://github.com/jgreathouse9/jgreathouse9.github.io/raw/master/stata/fdid/agbasque.dta", clear
                      
                      fdid gdpcap, tr(treat)
                      It also runs when varabbrev is set to off, since SJ and SSC will care about that. Now, I guess I have to worry about how to get it to net install correctly.


                      • #12
                        Something isn't kosher. The 95% CI do not match up with the SE/p.


                        Code:
                        Forward Difference-in-Differences          T0 R2:    0.892     T0 RMSE:    0.087
                        -------------------------------------------------------------------------------------------
                               linvr |      ATT        t           SE         [95ConfInterval]     p
                        -------------+-----------------------------------------------------------------------------
                                 did |    -0.187     1.861       0.1006      -0.2466     -0.1277    0.063
                        -------------------------------------------------------------------------------------------
                        When you construct the CI, you include /sqrt(`t2') , but you do not when you calculate the t-stat or the probability level.

                        My reading of Li is:

                        ATT +- 1.96*sqrt(omegahat/t2).

                        Looks like line 726 sets omegahat as sqrt(omegahat). But you still need the sqrt(`t2') in there.

                        Confidence interval from ado:
                        Code:
                        745  scalar CILB= scalar(ATT) - ((invnormal(0.975) * scalar(omegahatdid))/sqrt(`t2'))
                        746
                        747  scalar CIUB= scalar(ATT) + ((invnormal(0.975) * scalar(omegahatdid))/sqrt(`t2'))
                        I got mine squared away doing this:
                        Code:
                        753  scalar tstat = abs(scalar(ATT)/(scalar(omegahatdid)/sqrt(`t2')))
                        ...
                        782  di as text %12s abbrev("`treatment'",12) " {c |} " as result %9.3f scalar(ATT) " "%9.3f scalar(tstat) "    " %9.4f scalar(omegahatdid)/sqrt(`t2') "
                        It might be easier to create a scalar se early on, and then use that for all the CI/p/z calculations.
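
                        A minimal sketch of that last suggestion. The numbers are placeholders loosely based on the table above, and t2 = 11 is purely an assumption for illustration (the actual number of post-treatment periods is not shown here). omegahatdid is assumed to already be on the square-root scale, as described for line 726.

                        ```stata
                        * Placeholder inputs; in the ado these would come from the estimation step.
                        scalar ATT         = -0.187
                        scalar omegahatdid = 0.1006
                        local  t2          = 11        // assumed number of post-treatment periods

                        * Define the standard error once, then reuse it everywhere.
                        scalar se    = scalar(omegahatdid)/sqrt(`t2')
                        scalar tstat = abs(scalar(ATT)/scalar(se))
                        scalar pval  = 2*(1 - normal(scalar(tstat)))
                        scalar CILB  = scalar(ATT) - invnormal(0.975)*scalar(se)
                        scalar CIUB  = scalar(ATT) + invnormal(0.975)*scalar(se)

                        display "se = " %8.4f scalar(se) "   t = " %8.3f scalar(tstat) "   p = " %6.4f scalar(pval)
                        display "95% CI: [" %8.4f scalar(CILB) ", " %8.4f scalar(CIUB) "]"
                        ```

                        Because every statistic is derived from the same se, the interval and the p-value cannot disagree about significance.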


                        • #13
                          Also, rather than invnormal(0.975), do you want to make this sample size specific? I doubt it would matter much, since if you have fewer than 30 observations, you are not likely to be using this.

                          But one of the listed advantages of fdid (by Li) is that you can use smaller time samples than with SC.
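
                          For a sense of how much this could matter, a small comparison of the normal critical value with a t-based one. Using a t distribution here is only a heuristic small-sample adjustment, and the degrees of freedom (10 below) are a made-up number, not something taken from Li's derivations.

                          ```stata
                          local df = 10                               // hypothetical degrees of freedom
                          scalar crit_z = invnormal(0.975)            // what the ado currently uses
                          scalar crit_t = invttail(`df', 0.025)       // upper-tail 2.5% point of t(df)
                          display "normal: " %6.4f scalar(crit_z) "   t(`df'): " %6.4f scalar(crit_t)
                          display "CI width ratio (t/normal): " %6.4f scalar(crit_t)/scalar(crit_z)
                          ```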
                          Last edited by George Ford; 14 Jul 2024, 08:50.


                          • #14
                            What would be cool is adding method(fdid) to sdid.


                            • #15
                              Alright folks, the official way to install the package (along with running the empirical examples) is:

                              Code:
                              clear *
                              
                              
                              cls
                              
                              net install fdid, from("https://raw.githubusercontent.com/jgreathouse9/FDIDTutorial/main")
                              
                              
                              
                              
                              clear *
                              
                              u "https://github.com/jgreathouse9/jgreathouse9.github.io/raw/master/stata/fdid/hcw.dta"
                              
                              cls
                              fdid gdp, tr(treat) unitnames(state) gr2opts(scheme(sj) name(hcwte, replace)) 
                              
                              cls
                              clear *
                              
                              import delim "https://raw.githubusercontent.com/synth-inference/synthdid/master/data/california_prop99.csv", clear delim(";")
                              
                              egen id = group(state)
                              
                              xtset id year, y
                              
                              fdid packspercapita, tr(treated) unitnames(state) // gr1opts(scheme(sj) name(p99, replace))
                              
                              
                              cls
                              clear *
                              
                              u "https://github.com/jgreathouse9/jgreathouse9.github.io/raw/master/stata/fdid/agbasque.dta", clear
                              
                              fdid gdpcap, tr(treat)
                              I had to switch it to my other repo, since net install did not wish to play nice with my github io site. Anyways, here it is!
