
  • RDD with multivariate local-linear regression

    Greetings to everyone,

    For my research, I am conducting an RD analysis where, in addition to the running variable, I include an exogenous covariate in the regression to study its interaction with the treatment.
    In a parametric approach the application is pretty straightforward: I include the covariate in the regression and then apply the standard RDD procedure, roughly as sketched below.
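
    For concreteness, a minimal sketch of the specification I have in mind (all variable names are illustrative: y is the outcome, run the running variable centred at the cut-off, z the exogenous covariate):

    Code:
    * Sketch of a parametric RD specification with a covariate interacted
    * with treatment; all variable names are illustrative.
    gen treat = (run >= 0) if !missing(run)
    reg y i.treat##c.run i.treat##c.z, r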

    I would like to extend my research to a non-parametric approach, using local-linear regression to test the validity and robustness of my results. The rdrobust package is very intuitive but, as far as I understand, the covs() option only serves to refine the estimate of the treatment effect; it does not report estimates for the covariates themselves.
    My question, then, is: is there a way to use rdrobust for multivariate local-linear regression?

    Alternatively, I am starting to investigate the possibility of resorting to multivariate locally weighted regression, following the theory in Ruppert and Wand (1994). However, given my inexperience with the program, I would have a hard time implementing it in Stata, and I am struggling to find similar examples to start from. A rough sketch of what I have in mind is below.
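
    To illustrate, a minimal sketch of a local-linear fit at a single evaluation point with triangular kernel weights, in the spirit of Ruppert and Wand (1994); the names and the bandwidth are illustrative, not an actual implementation of their estimator:

    Code:
    * Local-linear fit of y on run at the point x0 = 0 with a triangular
    * kernel of bandwidth h; z enters as an additional covariate.
    * All names and the bandwidth value are illustrative.
    local x0 = 0
    local h  = 10
    gen rc = run - `x0'                  // centre the running variable
    gen w  = max(0, 1 - abs(rc/`h'))     // triangular kernel weights
    reg y c.rc c.z [aw=w] if w > 0, r
    * _b[_cons] estimates the conditional mean of y at run = x0;
    * at an RD cut-off, each side would be fitted separately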

    Thanks

  • #2
    As far as I know, -rdrobust- neither reports nor records estimates for covariates. To a large extent, we may replicate the -rdrobust- estimates with traditional commands: -reg- for sharp RD and -ivregress- for fuzzy RD. Below is an example using a dataset provided with the -rdrobust- package, under a sharp RD setting. By default, -rdrobust- applies local-linear estimation with a triangular kernel on an MSE-optimal bandwidth, which can be largely replicated by -reg-, where you will see estimates for the covariates.

    Code:
    . * Load example data
    .         use rdrobust_senate.dta, clear
    
    . 
    . * Generate treatment
    .         gen win = margin > 0
    
    . 
    . * Sharp RD with -rdrobust-: local linear and triangular kernel by default
    .         rdrobust vote margin, covs(class)
    
    Covariate-adjusted sharp RD estimates using local polynomial regression.
    
          Cutoff c = 0 | Left of c  Right of c            Number of obs =       1297
    -------------------+----------------------            BW type       =      mserd
         Number of obs |       595         702            Kernel        = Triangular
    Eff. Number of obs |       361         323            VCE method    =         NN
        Order est. (p) |         1           1
        Order bias (q) |         2           2
           BW est. (h) |    17.847      17.847
           BW bias (b) |    28.257      28.257
             rho (h/b) |     0.632       0.632
    
    Outcome: vote. Running variable: margin.
    --------------------------------------------------------------------------------
                Method |   Coef.    Std. Err.    z     P>|z|    [95% Conf. Interval]
    -------------------+------------------------------------------------------------
          Conventional |  7.4345     1.4436   5.1500   0.000    4.60516      10.2639
                Robust |     -          -     4.3841   0.000     4.1764      10.9297
    --------------------------------------------------------------------------------
    Covariate-adjusted estimates. Additional covariates included: 1
    
    .         local lb = -e(h_l)      //left bound of bandwidth
    
    .         local rb = e(h_r)       //right bound of bandwidth
    
    . 
    . * Sharp RD with -reg-: local linear and triangular kernel
    . * wgt: triangular weights 
    .         gen wgt = 1 - margin/`rb' if inrange(margin, 0, `rb')
    (1,044 missing values generated)
    
    .         replace wgt = 1 - margin/`lb' if inrange(margin, `lb', 0)
    (378 real changes made)
    
    .         replace wgt = 0 if mi(wgt)
    (666 real changes made)
    
    .         reg vote win margin c.margin#c.win class [aw=wgt], r
    (sum of wgt is 382.6937557454221)
    
    Linear regression                               Number of obs     =        684
                                                    F(4, 679)         =      50.91
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.2234
                                                    Root MSE          =     9.4278
    
    --------------------------------------------------------------------------------
                   |               Robust
              vote |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
               win |   7.434538   1.443337     5.15   0.000     4.600597    10.26848
            margin |   .1718975   .1407034     1.22   0.222    -.1043685    .4481636
                   |
    c.margin#c.win |   .0672004   .1813369     0.37   0.711    -.2888481    .4232488
                   |
             class |  -1.335644   .4705338    -2.84   0.005     -2.25952   -.4117682
             _cons |   47.94116   1.364942    35.12   0.000     45.26114    50.62117
    --------------------------------------------------------------------------------
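
    For the fuzzy case mentioned above, a sketch along the same lines (takeup is a hypothetical treatment-receipt indicator, not a variable in this dataset, so no output is shown):

    Code:
    * Fuzzy RD sketch: -takeup- is a hypothetical indicator of actual
    * treatment receipt; crossing the cutoff (win) is the excluded
    * instrument, with the same triangular weights as above.
    ivregress 2sls vote margin c.margin#c.win class (takeup = win) [aw=wgt], r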



    • #3
      I have tried this solution right away, and the coefficients of the two regressions match perfectly. I get opposite signs, but that is because the treated group is below the cut-off in my case.
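
      For anyone replicating this when treatment is assigned below the cut-off, a one-line sketch (the weights and interaction are then built exactly as in #2):

      Code:
      * Sketch: with treatment assigned below the cut-off, flip the
      * indicator so the coefficient keeps its usual sign.
      gen win = margin < 0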

      Thanks a lot for the code. I think it will be beneficial for other inexperienced researchers since it does not seem to be an openly explored scenario.



      • #4
        Originally posted by Fei Wang View Post
        If I want to make the model explicit and, in particular, define the local-linear estimator, would this notation be correct?

        \[ \min\limits_{b}\sum_{i=1}^{n}\{vote_i-b_0-b_1^{T}win_i-b_2^{T}margin_i-b_3^{T}win_i\cdot margin_i-b_4^{T}class_i\}^{2}K_i \]

        where \(K_i\) represents the triangular kernel weight for observation \(i\), computed with the MSE-optimal bandwidth
        Last edited by Duccio Milani; 05 Dec 2021, 09:28. Reason: typo in the formula



        • #5
          Originally posted by Duccio Milani View Post

          If I want to make the model explicit and, in particular, define the local-linear estimator, would this notation be correct?

          \[ \min\limits_{b}\sum_{i=1}^{n}\{vote_i-b_0-b_1^{T}win_i-b_2^{T}margin_i-b_3^{T}win_i\cdot margin_i-b_4^{T}class_i\}^{2}K_i \]

          where \(K_i\) represents the triangular kernel weight for observation \(i\), computed with the MSE-optimal bandwidth
          Yes, it's correct: it's essentially a weighted least-squares estimation. But you don't need the transpose superscripts on b1, b2, b3, and b4, as all of your regressors are scalars rather than vectors.
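
          In matrix form (a sketch of why it is weighted least squares): stacking the rows \(x_i = (1,\; win_i,\; margin_i,\; win_i\cdot margin_i,\; class_i)\) into \(X\) and letting \(W = \mathrm{diag}(K_1,\dots,K_n)\), the minimizer is

          \[ \hat{b} = (X^{T}WX)^{-1}X^{T}W\,vote \]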
          Last edited by Fei Wang; 05 Dec 2021, 13:48.



          • #6
            That makes sense; I hadn't thought about it. Thanks!



            • #7
              Originally posted by Fei Wang View Post

              Yes, it's correct: it's essentially a weighted least-squares estimation. But you don't need the transpose superscripts on b1, b2, b3, and b4, as all of your regressors are scalars rather than vectors.

              A further doubt arose when I looked again at the theory behind local polynomial regression.
              Many theoretical papers using local polynomial estimation for RDD do not make explicit a "pooled" regression function spanning treated and untreated units (among others: Porter, 2003; Lee & Lemieux, 2010; Cattaneo & Titiunik, 2021). Instead, separate regressions are estimated to obtain the conditional expectation of y to the right and to the left of the cut-off, and the treatment effect is computed as the difference between the parameters. So, to find the parameters of the treatment and the other covariates, does the "pooled" regression you suggested mirror that procedure?

              Thank you!



              • #8
                Originally posted by Duccio Milani View Post
                separate regressions are estimated to obtain the conditional expectation of y to the right and to the left of the cut-off, and the treatment effect is computed as the difference between the parameters.
                Yes. The pooled regression in #2 is equivalent to the quoted procedure: the c.margin#c.win interaction lets both the intercept and the slope differ across the cut-off, so the coefficient on win equals the difference between the right- and left-side intercepts at the cut-off (with class restricted to a common coefficient on both sides).
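
                A quick way to see this, continuing from the #2 example (a sketch; class is dropped because the pooled model restricts its coefficient to be equal on both sides):

                Code:
                * Fit each side of the cut-off separately with the same
                * triangular weights; the difference in intercepts reproduces
                * _b[win] from -reg vote win margin c.margin#c.win [aw=wgt], r-.
                reg vote margin [aw=wgt] if win, r
                local a_r = _b[_cons]
                reg vote margin [aw=wgt] if !win, r
                local a_l = _b[_cons]
                display `a_r' - `a_l'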



                • #9
                  I have a somewhat similar question related to the original post, but it has to do with fuzzy RD: I have been unable to reconcile the results of fuzzy RD with covariates and -ivregress 2sls-. Here is where I have posted my question in more detail. Any help would be fantastic.
