  • Interaction with rdrobust

    Hi! I am trying to add an interaction to rdrobust and I get the following error: "factor-variable and time-series operators not allowed"

    My code is:

    rdrobust runs_again margin_victory, covs(i.party##i.year) ///
    p(1) kernel(triangular) bwselect(mserd) all

    What am I doing wrong here? Please help.

  • #2
    The only thing I know about -rdrobust- is that it is not an official Stata program. Many user-written programs do not support factor variable notation. I'll assume that is the case here.

    So the workaround will be to generate your own indicator variables for the party#year interaction and use those. This might be a place where the archaic -xi- command comes in handy:

    Code:
    xi i.party*i.year
    rdrobust runs_again margin_victory, covs(_I*) p(1) kernel(triangular) bwselect(mserd) all
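
    If -xi- is not wanted, a rough alternative is to build one indicator per observed party-year cell by hand. This is only a sketch: party_year and the py_ stub are names made up for illustration, and dropping py_1 as the base category is an assumption about how to avoid collinearity with the constant.

    ```stata
    * Sketch only: party_year and py_* are hypothetical names
    egen party_year = group(party year)      // one id per observed party-year cell
    tabulate party_year, generate(py_)       // creates py_1, py_2, ... (one dummy per cell)
    drop py_1                                // assumed base category, omitted
    rdrobust runs_again margin_victory, covs(py_*) ///
        p(1) kernel(triangular) bwselect(mserd) all
    ```

    The cell indicators should cover the same party-year cells as the main effects plus interactions that -xi- generates, so the two codings are expected to give the same fit. That has not been checked against -rdrobust- here.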

    • #3
      Thanks, Clyde! That worked!

      • #4
        Dear Clyde or fellow Stata users,

        A follow-up to this topic: is there any way to speed up the computation for rdrobust? My sample has around 80k observations and around 900 indicator variables ("dummy variables"). If I run the RDD "manually" (reghdfe with treatment, running variable, and interaction term) including the indicator variables, Stata takes around 1 second. If I run rdrobust with the indicator variables, it takes more than a day (it is actually still computing as of now). Unfortunately, I cannot drop the indicator variables due to the research design, and I need to do this with rdrobust. Any suggestions for how this could be improved? Computation time seems to increase exponentially with the number of indicator variables (with a small number of them it finishes pretty fast). I am using Stata/SE 16.1 on a relatively good Lenovo ThinkPad laptop.

        Best regards,
        Pascal
        Last edited by Pascal Meier; 04 Aug 2022, 15:44.

        • #5
          -rdrobust- is a user-written command, and I am not familiar with it. If its help file doesn't contain any information that might bear on your question, you might try contacting the authors of the command. (I don't think they follow this forum.)

          That said, why do you have to use -rdrobust-? If you can emulate its function with -reghdfe- and that runs quickly, why not just go with that?

          • #6
            Dear Clyde,

            thanks for your fast response. The following approach seems to improve computation time considerably: calculate the optimal bandwidth using RDBWSELECT (without indicator variables) and then set the bandwidth "manually" in RDROBUST. Given that all other models are estimated using RDROBUST, it would be strange to change the estimation method (or at least the command) midway.

            So my problem is almost solved. The last stumbling block is how to insert the estimated bandwidth into RDROBUST "automatically". RDBWSELECT stores the bandwidths in e(h_mserd) and e(b_mserd). However, the following code does not work (" <istmt>: 3499 h_mserd not found"):

            Code:
            rdbwselect Y X, c(0)
            rdrobust Y X, c(0) covs(indvar*) h(e(h_mserd)) b(e(b_mserd)) all

            Is there any way to take the calculated bandwidths stored in e() and insert them into the second command without stating the actual bandwidth directly (e.g., 10)? I do not want to adjust the bandwidth "manually" every time the sample changes slightly, for instance. There should be an easy way to do this; I am probably missing something.

            Again, many thanks for your help.
            Last edited by Pascal Meier; 05 Aug 2022, 09:00.

            • #7
              As I am unfamiliar with -rdrobust-, the best I can offer you are some guesses that may well prove useless.

              If I were in your situation I would study the Syntax section of -help rdrobust- and see what -rdrobust- expects to find in the -h()- and -b()- options. If it is looking for varname or varlist, then you won't be able to pluck these numbers directly out of e(), you will have to create variables with those values. If, however, it expects to find a real or a numlist, then it should work with the references to e(h_mserd) and e(b_mserd) you gave, but you could see if you have better luck with `e(h_mserd)' and `e(b_mserd)' instead.

              The other question you always have to ponder with a community-contributed program is whether it actually works as described. After you ran -rdbwselect-, did you actually verify that e(h_mserd) and e(b_mserd) are there? Try -display e(h_mserd), e(b_mserd)-. If they aren't there then perhaps you are not using -rdbwselect- correctly, or there is a problem with your data and it was unable to run to completion, or the program has a bug.

              These are just general approaches to troubleshooting. If I knew something about -rdrobust- I might be able to offer something more concrete and specific. As a last resort, if you cannot figure it out and nobody following the thread has more effective suggestions, try contacting the authors of the program.

              • #8
                Wait stop. 900 indicators? For what? For what reason?


                Do we mean literally 900 indicator variables that take up a column, or one variable with 900 levels?
                Last edited by Jared Greathouse; 05 Aug 2022, 17:28.

                • #9
                  Thank you very much for your helpful suggestions as always, highly appreciated! The issue was solved by using " `e(h_mserd)' " instead of "e(h_mserd)".
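
                  For anyone landing here later, a minimal sketch of the working version, using the placeholder names Y, X and indvar* from the code earlier in this thread. The stored result names e(h_mserd) and e(b_mserd) are taken from this thread; running -ereturn list- after -rdbwselect- will confirm whether they exist in the installed version.

                  ```stata
                  * Sketch only: Y, X and indvar* are placeholders from the thread
                  rdbwselect Y X, c(0)
                  display e(h_mserd), e(b_mserd)     // check that both bandwidths were stored
                  local h = e(h_mserd)               // copy them before another e-class command overwrites e()
                  local b = e(b_mserd)
                  rdrobust Y X, c(0) covs(indvar*) h(`h') b(`b') all
                  ```

                  The left and right single quotes make Stata substitute the stored number into the command line before -rdrobust- parses its options. Without them, the literal text e(h_mserd) is passed through, which is presumably what triggered the "h_mserd not found" error. Writing h(`e(h_mserd)') directly works the same way; the locals just make the values easier to inspect and reuse.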

                  @Jared: I use 900 indicator variables ("dummy variables"), each in a separate column, because RDROBUST cannot handle factor-variable notation such as i.indicatorvariable within the covs() option. I use them because I am trying to stay as close as possible to an existing study (which uses an RDD with fixed effects).
                  Last edited by Pascal Meier; 07 Aug 2022, 05:20.

                  • #10
                    I have a similar question: does anyone know how to include fixed effects in rdrobust while keeping the computation time manageable? The solution given by Pascal Meier is not technically correct: when using rdbwselect to compute the optimal bandwidths, one MUST include the fixed effect indicators in covs(), because the optimal bandwidth changes depending on the covariates. I also have high-dimensional fixed effects, and rdrobust takes ages to run (in fact, with around 900 indicator variables, it runs for days on end).

                    Another solution given on the forum is to replicate rdrobust by using reghdfe and then using WLS, but again this is not technically correct, because rdrobust computes bias-adjusted estimates, which cannot be "converted" over to reghdfe.

                    • #11
                      I've had this open for a few days and I finally have time to reply. First, Pascal's code implies that he has a sharp RD design. Chris: Is yours also a sharp design? If so, why are you trying to include fixed effects when they're not needed for consistency? You either have a clean SRD or you don't. One argument is that they improve precision. That could be valid, but the problem is that I don't think the asymptotic properties of the estimators produced by rdrobust have been worked out when large numbers of FEs are added. Do you have a panel data set, or is it a cross section with groups?

                      If you say more about your problem I may have suggestions. For example, you might replace the fixed effects with the Mundlak correlated random effects approach. That won't be identical to FE in the RD case, but if the goal is simply to improve precision, it might work well. And, it's justified even with lots of groups over which you would define the fixed effects.
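
                      One possible reading of the Mundlak idea, as a sketch only: replace the group indicators with group means of the running variable and of the treatment indicator. The names group_id, D, Xbar and Dbar are hypothetical, the cutoff is assumed to be 0, and the choice of which averages to include is an assumption, not something established in this thread.

                      ```stata
                      * Sketch only: group_id, D, Xbar and Dbar are hypothetical names; cutoff assumed at 0
                      generate byte D = (X >= 0) if !missing(X)   // sharp-design treatment indicator
                      bysort group_id: egen Xbar = mean(X)        // group mean of the running variable
                      bysort group_id: egen Dbar = mean(D)        // group share of treated observations
                      rdrobust Y X, c(0) covs(Xbar Dbar) all
                      ```

                      This passes two covariates to -rdrobust- instead of hundreds of indicators, so it should run much faster, but it is not numerically equivalent to including the fixed effects.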

                      As a general rule, outside of a linear model and very specific nonlinear models (an exponential mean estimated by Poisson quasi-MLE), one must be very careful in simply adding large numbers of FEs to an estimation method. If you have lots of observations per FE, it should be fine. But even then, with a large number of FEs, one has to be careful.
