Only keep observations used in rdrobust

Cat Santos

#1

23 Oct 2024, 11:35

Dear All,

I ran an rdrobust regression and would like to create a dummy variable that equals 1 if the observation was used in the previous analysis. I believe it should be something similar to:

Code:

gen byte used_s=e(sample)

However, after running the following:

Code:

rdrobust dv1, c(0) all kernel(tri) covs(c1 c2 c3) weights(w)
gen byte used_s=e(sample)

I noticed that the number of observations used in the analysis, as reported by rdrobust ("Number of obs"), differs from the number of observations with used_s==1.

Could anyone suggest a solution to this problem?

Thank you!

Cat

Carlo Lazzaro


24 Oct 2024, 02:13

Cat:
what you're experiencing may be something like the following:

Code:

. use "C:\Program Files\Stata18\ado\base\a\auto.dta"
(1978 automobile data)

. regress price mpg if foreign==0

      Source |       SS           df       MS      Number of obs   =        52
-------------+----------------------------------   F(1, 50)        =     17.05
       Model |   124392956         1   124392956   Prob > F        =    0.0001
    Residual |   364801844        50  7296036.89   R-squared       =    0.2543
-------------+----------------------------------   Adj R-squared   =    0.2394
       Total |   489194801        51  9592054.92   Root MSE        =    2701.1

------------------------------------------------------------------------------
       price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         mpg |  -329.2551   79.74034    -4.13   0.000    -489.4183   -169.0919
       _cons |   12600.54   1624.773     7.76   0.000     9337.085    15863.99
------------------------------------------------------------------------------

. gen byte used_s=e(sample)

. regress price mpg if used_s==0

      Source |       SS           df       MS      Number of obs   =        22
-------------+----------------------------------   F(1, 20)        =     13.25
       Model |  57534941.7         1  57534941.7   Prob > F        =    0.0016
    Residual |  86828271.1        20  4341413.55   R-squared       =    0.3985
-------------+----------------------------------   Adj R-squared   =    0.3685
       Total |   144363213        21   6874438.7   Root MSE        =    2083.6

------------------------------------------------------------------------------
       price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         mpg |  -250.3668   68.77435    -3.64   0.002    -393.8276    -106.906
       _cons |   12586.95   1760.689     7.15   0.000     8914.217    16259.68
------------------------------------------------------------------------------

. regress price mpg if used_s==1

      Source |       SS           df       MS      Number of obs   =        52
-------------+----------------------------------   F(1, 50)        =     17.05
       Model |   124392956         1   124392956   Prob > F        =    0.0001
    Residual |   364801844        50  7296036.89   R-squared       =    0.2543
-------------+----------------------------------   Adj R-squared   =    0.2394
       Total |   489194801        51  9592054.92   Root MSE        =    2701.1

------------------------------------------------------------------------------
       price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         mpg |  -329.2551   79.74034    -4.13   0.000    -489.4183   -169.0919
       _cons |   12600.54   1624.773     7.76   0.000     9337.085    15863.99
------------------------------------------------------------------------------

.

Kind regards,
Carlo
(StataNow 18.5)


Cat Santos


24 Oct 2024, 03:31

Hi Carlo,

Thank you very much for your reply. Here is my code:

Code:

. rdrobust dv1, c(0) all kernel(tri) covs(c1 c2 c3) weights(w)
Multicollinearity issue detected in covs. Redundant covariates were removed.
Mass points detected in the running variable.

Covariate-adjusted Sharp RD estimates using local polynomial regression.

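     Cutoff c = 1944 |  Left of c  Right of c            Number of obs =       1263
---------------------+-----------------------            BW type       =      mserd
       Number of obs |         64        1199            Kernel        = Triangular
  Eff. Number of obs |         34          58            VCE method    =         NN
      Order est. (p) |          1           1
      Order bias (q) |          2           2
         BW est. (h) |      4.385       4.385
         BW bias (b) |      5.930       5.930
           rho (h/b) |      0.739       0.739
          Unique obs |         11          54

Outcome: dv1. Running variable: year.
-------------------------------------------------------------------------------
            Method |    Coef.  Std. Err.        z   P>|z|  [95% Conf. Interval]
-------------------+-----------------------------------------------------------
      Conventional |  -.13807     .05677  -2.4318   0.015   -.249342   -.026789
    Bias-corrected |  -.30086     .05677  -5.2992   0.000   -.412138   -.189584
            Robust |  -.30086      .1136  -2.6484   0.008   -.523518   -.078203
-------------------------------------------------------------------------------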

Covariate-adjusted estimates. Additional covariates included: 9
Estimates adjusted for mass points in the running variable.

. gen byte used_s=e(sample)

. tab used_s

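     used_s |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     19,458       90.08       90.08
          1 |      2,143        9.92      100.00
------------+-----------------------------------
      Total |     21,601      100.00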

I suppose what I'm not understanding is why the two numbers (1263 and 2,143) are not the same. I thought that used_s==1 would flag only the observations used in the previous analysis, and that those were the observations counted in "Number of obs = 1263".

Thank you so much!

Best,
Cat

PS: My goal is to use the observations from the rdrobust regression to calculate the density of respondents' ages in the sample actually used in the model, rather than the age density of the total sample. This would help show how much each age group drives the analysis.

Last edited by Cat Santos; 24 Oct 2024, 03:34.
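A minimal sketch of the age-density comparison described in the PS, assuming the respondents' age is stored in a variable called age (a hypothetical name) and that used_s was generated right after rdrobust as in the code above. It overlays a kernel density of age in the flagged subsample on the density in the full sample; any other 0/1 sample indicator can be substituted for used_s, and the survey weight w could be added as an aweight if the weighted distribution is what matters.

```stata
* age is a hypothetical variable name; used_s is the 0/1 indicator created
* with gen byte used_s = e(sample) right after rdrobust.
* (/// line continuation works in do-files, not at the interactive prompt)
twoway (kdensity age if used_s == 1) ///
       (kdensity age),               ///
       legend(order(1 "used_s == 1" 2 "Full sample"))

* Distribution of age within the flagged sample only
tabulate age if used_s == 1
```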


Dung Le

#4

24 Oct 2024, 07:37

Hi Cat,

I am not sure what you are doing. Normally, the rdrobust command (from SSC; see help rdrobust) needs both an outcome and a running variable. Your code includes only dv1, so I am not sure whether this is the outcome or the running variable.
Assuming that your code in #3 is correct, you can check the number of observations as follows. Since you are using RD and the bandwidth used for estimation in #3 is 4.385, you can see that the effective number of observations to the left and right of the cutoff is 34 and 58, respectively. So the total number of observations used in your RD regression is only 92, not 1263 or 2,143. You can double-check by running this code after rdrobust: sum `e(outcomevar)' if dv1>=-e(h_l) & dv1<=e(h_r) (here I assume dv1 is your running variable). Hope this helps.
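A minimal sketch of the same check, written against the results rdrobust stores in e() (e(c), e(h_l), e(h_r), e(runningvar), e(N_h_l), e(N_h_r)) instead of hard-coding dv1 and a cutoff of 0, since the output in #3 reports year as the running variable and 1944 as the cutoff. The full rdrobust call below is reconstructed from that output and is an assumption, as are the variable names used_s and in_bw; the counts are meant to be compared with the "Eff. Number of obs" line, not guaranteed to match it in every edge case.

```stata
* Assumed full call, reconstructed from the output in #3
* (outcome dv1, running variable year, cutoff 1944)
rdrobust dv1 year, c(1944) all kernel(tri) covs(c1 c2 c3) weights(w)

* Observations rdrobust marked as its estimation sample
capture drop used_s
capture drop in_bw
gen byte used_s = e(sample)

* Copy the stored cutoff, bandwidths and running variable into locals
local c   = e(c)
local h_l = e(h_l)
local h_r = e(h_r)
local x   "`e(runningvar)'"

* 1 if in e(sample) and c - h_l <= x <= c + h_r (inrange() includes both
* endpoints; with the triangular kernel a point exactly at c +/- h gets
* zero kernel weight, so it would be counted here but not by rdrobust)
gen byte in_bw = used_s & inrange(`x', `c' - `h_l', `c' + `h_r')

* Compare with the "Eff. Number of obs" reported by rdrobust
count if in_bw & `x' <  `c'
display "rdrobust, left of c:  " e(N_h_l)
count if in_bw & `x' >= `c'
display "rdrobust, right of c: " e(N_h_r)
```

If the two pairs of counts line up, in_bw (rather than used_s) is the indicator to use for the age-density calculation in Cat's PS.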