
  • RDD with multivariate local-linear regression

    Greetings to everyone,

    For my research, I am conducting an RD analysis where, in addition to the running variable, I include an exogenous covariate in the regression to study its interaction with the treatment.
    In a parametric approach the application is pretty straightforward: I include the covariate in the regression and then apply the standard RDD procedure, roughly as sketched below.
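
    For concreteness, a minimal sketch of the specification I have in mind (all variable names are illustrative: y is the outcome, run the running variable centred at the cut-off, z the exogenous covariate):

    Code:
    * Sketch of a parametric RD specification with a covariate interacted
    * with treatment; all variable names are illustrative.
    gen treat = (run >= 0) if !missing(run)
    reg y i.treat##c.run i.treat##c.z, r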

    I would like to extend my research to a non-parametric approach, using local-linear regression to test the validity and robustness of my results. The rdrobust package is very intuitive but, as far as I understand, the covs() option only serves to refine the estimate of the treatment effect; it does not report estimates for the covariates themselves.
    My question, then, is: is there a way to use rdrobust for multivariate local-linear regression?

    Alternatively, I am starting to investigate the possibility of resorting to multivariate locally weighted regression, following the theory in Ruppert and Wand (1994). However, given my inexperience with the program, I would have a hard time implementing it in Stata, and I am struggling to find similar examples to start from. A rough sketch of what I have in mind is below.
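
    To illustrate, a minimal sketch of a local-linear fit at a single evaluation point with triangular kernel weights, in the spirit of Ruppert and Wand (1994); the names and the bandwidth are illustrative, not an actual implementation of their estimator:

    Code:
    * Local-linear fit of y on run at the point x0 = 0 with a triangular
    * kernel of bandwidth h; z enters as an additional covariate.
    * All names and the bandwidth value are illustrative.
    local x0 = 0
    local h  = 10
    gen rc = run - `x0'                  // centre the running variable
    gen w  = max(0, 1 - abs(rc/`h'))     // triangular kernel weights
    reg y c.rc c.z [aw=w] if w > 0, r
    * _b[_cons] estimates the conditional mean of y at run = x0;
    * at an RD cut-off, each side would be fitted separately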

    Thanks

  • #2
    As far as I know, -rdrobust- neither reports nor records estimates for covariates. To a large extent, we may replicate the -rdrobust- estimates with traditional commands: -reg- for sharp RD and -ivregress- for fuzzy RD. Below is an example using a dataset provided with the -rdrobust- package, under a sharp RD setting. By default, -rdrobust- applies local-linear estimation with a triangular kernel on an MSE-optimal bandwidth, which can be largely replicated by -reg-, where you will see estimates for the covariates.

    Code:
    . * Load example data
    .         use rdrobust_senate.dta, clear
    
    . 
    . * Generate treatment
    .         gen win = margin > 0
    
    . 
    . * Sharp RD with -rdrobust-: local linear and triangular kernel by default
    .         rdrobust vote margin, covs(class)
    
    Covariate-adjusted sharp RD estimates using local polynomial regression.
    
          Cutoff c = 0 | Left of c  Right of c            Number of obs =       1297
    -------------------+----------------------            BW type       =      mserd
         Number of obs |       595         702            Kernel        = Triangular
    Eff. Number of obs |       361         323            VCE method    =         NN
        Order est. (p) |         1           1
        Order bias (q) |         2           2
           BW est. (h) |    17.847      17.847
           BW bias (b) |    28.257      28.257
             rho (h/b) |     0.632       0.632
    
    Outcome: vote. Running variable: margin.
    --------------------------------------------------------------------------------
                Method |   Coef.    Std. Err.    z     P>|z|    [95% Conf. Interval]
    -------------------+------------------------------------------------------------
          Conventional |  7.4345     1.4436   5.1500   0.000    4.60516      10.2639
                Robust |     -          -     4.3841   0.000     4.1764      10.9297
    --------------------------------------------------------------------------------
    Covariate-adjusted estimates. Additional covariates included: 1
    
    .         local lb = -e(h_l)      //left bound of bandwidth
    
    .         local rb = e(h_r)       //right bound of bandwidth
    
    . 
    . * Sharp RD with -reg-: local linear and triangular kernel
    . * wgt: triangular weights 
    .         gen wgt = 1 - margin/`rb' if inrange(margin, 0, `rb')
    (1,044 missing values generated)
    
    .         replace wgt = 1 - margin/`lb' if inrange(margin, `lb', 0)
    (378 real changes made)
    
    .         replace wgt = 0 if mi(wgt)
    (666 real changes made)
    
    .         reg vote win margin c.margin#c.win class [aw=wgt], r
    (sum of wgt is 382.6937557454221)
    
    Linear regression                               Number of obs     =        684
                                                    F(4, 679)         =      50.91
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.2234
                                                    Root MSE          =     9.4278
    
    --------------------------------------------------------------------------------
                   |               Robust
              vote |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
               win |   7.434538   1.443337     5.15   0.000     4.600597    10.26848
            margin |   .1718975   .1407034     1.22   0.222    -.1043685    .4481636
                   |
    c.margin#c.win |   .0672004   .1813369     0.37   0.711    -.2888481    .4232488
                   |
             class |  -1.335644   .4705338    -2.84   0.005     -2.25952   -.4117682
             _cons |   47.94116   1.364942    35.12   0.000     45.26114    50.62117
    --------------------------------------------------------------------------------
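
    For the fuzzy case mentioned above, a sketch along the same lines (takeup is a hypothetical treatment-receipt indicator, not a variable in this dataset, so no output is shown):

    Code:
    * Fuzzy RD sketch: -takeup- is a hypothetical indicator of actual
    * treatment receipt; crossing the cutoff (win) is the excluded
    * instrument, with the same triangular weights as above.
    ivregress 2sls vote margin c.margin#c.win class (takeup = win) [aw=wgt], r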



    • #3
      I have tried this solution right away, and the coefficients of the two regressions match perfectly. I get opposite signs, but that is because the treated group is below the cut-off in my case.
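
      For anyone replicating this when treatment is assigned below the cut-off, a one-line sketch (the weights and interaction are then built exactly as in #2):

      Code:
      * Sketch: with treatment assigned below the cut-off, flip the
      * indicator so the coefficient keeps its usual sign.
      gen win = margin < 0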

      Thanks a lot for the code. I think it will be beneficial for other inexperienced researchers since it does not seem to be an openly explored scenario.



      • #4
        Originally posted by Fei Wang View Post
        If I want to make the model explicit and, in particular, define the local-linear estimator, would this notation be correct?

        \[ \min\limits_{b}\sum_{i=1}^{n}\{vote_i-b_0-b_1^{T}win_i-b_2^{T}margin_i-b_3^{T}win_i\cdot margin_i-b_4^{T}class_i\}^{2}K_i \]

        where \(K_i\) represents the triangular kernel weight for observation \(i\), computed with the MSE-optimal bandwidth
        Last edited by Duccio Milani; 05 Dec 2021, 09:28. Reason: typo in the formula



        • #5
          Originally posted by Duccio Milani View Post

          If I want to make the model explicit and, in particular, define the local-linear estimator, would this notation be correct?

          \[ \min\limits_{b}\sum_{i=1}^{n}\{vote_i-b_0-b_1^{T}win_i-b_2^{T}margin_i-b_3^{T}win_i\cdot margin_i-b_4^{T}class_i\}^{2}K_i \]

          where \(K_i\) represents the triangular kernel weight for observation \(i\), computed with the MSE-optimal bandwidth
          Yes, it's correct: it's essentially a weighted least-squares estimation. But you don't need the transpose superscripts on b1, b2, b3, and b4, as all of your regressors are scalars rather than vectors.
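
          In matrix form (a sketch of why it is weighted least squares): stacking the rows \(x_i = (1,\; win_i,\; margin_i,\; win_i\cdot margin_i,\; class_i)\) into \(X\) and letting \(W = \mathrm{diag}(K_1,\dots,K_n)\), the minimizer is

          \[ \hat{b} = (X^{T}WX)^{-1}X^{T}W\,vote \]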
          Last edited by Fei Wang; 05 Dec 2021, 13:48.



          • #6
            That makes sense; I hadn't thought about it. Thanks!



            • #7
              Originally posted by Fei Wang View Post

              Yes, it's correct: it's essentially a weighted least-squares estimation. But you don't need the transpose superscripts on b1, b2, b3, and b4, as all of your regressors are scalars rather than vectors.

              A further doubt arose when I looked again at the theory behind local polynomial regression.
              Many theoretical papers using local polynomial estimation for RDD do not make explicit a "pooled" regression function spanning treated and untreated units (among others: Porter, 2003; Lee & Lemieux, 2010; Cattaneo & Titiunik, 2021). Instead, separate regressions are estimated to obtain the conditional expectation of y to the right and to the left of the cut-off, and the treatment effect is computed as the difference between the parameters. So, to find the parameters of the treatment and the other covariates, does the "pooled" regression you suggested mirror that procedure?

              Thank you!



              • #8
                Originally posted by Duccio Milani View Post
                separate regressions are estimated to obtain the conditional expectation of y to the right and to the left of the cut-off, and the treatment effect is computed as the difference between the parameters.
                Yes. The pooled regression in #2 is equivalent to the quoted procedure: the c.margin#c.win interaction lets both the intercept and the slope differ across the cut-off, so the coefficient on win equals the difference between the right- and left-side intercepts at the cut-off (with class restricted to a common coefficient on both sides).
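
                A quick way to see this, continuing from the #2 example (a sketch; class is dropped because the pooled model restricts its coefficient to be equal on both sides):

                Code:
                * Fit each side of the cut-off separately with the same
                * triangular weights; the difference in intercepts reproduces
                * _b[win] from -reg vote win margin c.margin#c.win [aw=wgt], r-.
                reg vote margin [aw=wgt] if win, r
                local a_r = _b[_cons]
                reg vote margin [aw=wgt] if !win, r
                local a_l = _b[_cons]
                display `a_r' - `a_l'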



                • #9
                  I have a somewhat similar question related to the original post, but it has to do with fuzzy RD: I have been unable to reconcile the results of fuzzy RD with covariates and -ivregress 2sls-. Here is where I have posted my question in more detail. Any help would be fantastic.
