  • Only keep observations used in rdrobust

    Dear All,

    I ran an rdrobust regression and would like to create a dummy variable that switches to 1 if the observation was used in the previous analysis. I believe it should be something similar to:
    Code:
    gen byte used_s=e(sample)
    However, after trying that option above, running:

    Code:
    rdrobust dv1, c(0) all kernel(tri) covs(c1 c2 c3) weights(w)
    gen byte used_s=e(sample)
    I noticed that the number of observations used in the analysis, as reported by rdrobust ("Number of obs"), differs from the number of observations with used_s=1.

    Could anyone please suggest a solution to this problem?

    Thank you!

    Cat

  • #2
    Cat:
    what you're experiencing may be something like:
    Code:
    . use "C:\Program Files\Stata18\ado\base\a\auto.dta"
    (1978 automobile data)
    
    . regress price mpg if foreign==0
    
          Source |       SS           df       MS      Number of obs   =        52
    -------------+----------------------------------   F(1, 50)        =     17.05
           Model |   124392956         1   124392956   Prob > F        =    0.0001
        Residual |   364801844        50  7296036.89   R-squared       =    0.2543
    -------------+----------------------------------   Adj R-squared   =    0.2394
           Total |   489194801        51  9592054.92   Root MSE        =    2701.1
    
    ------------------------------------------------------------------------------
           price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             mpg |  -329.2551   79.74034    -4.13   0.000    -489.4183   -169.0919
           _cons |   12600.54   1624.773     7.76   0.000     9337.085    15863.99
    ------------------------------------------------------------------------------
    
    . gen byte used_s=e(sample)
    
    . regress price mpg if used_s==0
    
          Source |       SS           df       MS      Number of obs   =        22
    -------------+----------------------------------   F(1, 20)        =     13.25
           Model |  57534941.7         1  57534941.7   Prob > F        =    0.0016
        Residual |  86828271.1        20  4341413.55   R-squared       =    0.3985
    -------------+----------------------------------   Adj R-squared   =    0.3685
           Total |   144363213        21   6874438.7   Root MSE        =    2083.6
    
    ------------------------------------------------------------------------------
           price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             mpg |  -250.3668   68.77435    -3.64   0.002    -393.8276    -106.906
           _cons |   12586.95   1760.689     7.15   0.000     8914.217    16259.68
    ------------------------------------------------------------------------------
    
    . regress price mpg if used_s==1
    
          Source |       SS           df       MS      Number of obs   =        52
    -------------+----------------------------------   F(1, 50)        =     17.05
           Model |   124392956         1   124392956   Prob > F        =    0.0001
        Residual |   364801844        50  7296036.89   R-squared       =    0.2543
    -------------+----------------------------------   Adj R-squared   =    0.2394
           Total |   489194801        51  9592054.92   Root MSE        =    2701.1
    
    ------------------------------------------------------------------------------
           price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             mpg |  -329.2551   79.74034    -4.13   0.000    -489.4183   -169.0919
           _cons |   12600.54   1624.773     7.76   0.000     9337.085    15863.99
    ------------------------------------------------------------------------------
    
    .
    Kind regards,
    Carlo
    (StataNow 18.5)
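
A quick way to see whether a command marks e(sample) the way regress does is to compare the flagged count with e(N). This is a minimal sketch on the auto data shipped with Stata (loaded here with sysuse rather than a full path); whether rdrobust passes the same check is exactly what is in question in this thread.

```stata
* Load the example data used in #2
sysuse auto, clear

* Fit on domestic cars only
regress price mpg if foreign == 0

* e(sample) is 1 for rows used by the last estimation command, 0 otherwise
gen byte used_s = e(sample)

* The flagged count should equal the reported "Number of obs" (52 in #2)
count if used_s == 1
assert r(N) == e(N)
```

If the assert fails after some other estimation command, the flag and the reported count are measuring different sets of rows for that command.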


    • #3
      Hi Carlo,

      Thank you very much for your reply. Here is my code:

      Code:
      . rdrobust dv1, c(0) all kernel(tri) covs(c1 c2 c3) weights(w)
      Multicollinearity issue detected in covs. Redundant covariates were removed.
      Mass points detected in the running variable.
      
      Covariate-adjusted Sharp RD estimates using local polynomial regression.

            Cutoff c = 1944 | Left of c  Right of c            Number of obs =       1263
      ----------------------+----------------------            BW type       =      mserd
              Number of obs |        64        1199            Kernel        = Triangular
         Eff. Number of obs |        34          58            VCE method    =         NN
             Order est. (p) |         1           1
             Order bias (q) |         2           2
                BW est. (h) |     4.385       4.385
                BW bias (b) |     5.930       5.930
                  rho (h/b) |     0.739       0.739
                 Unique obs |        11          54

      Outcome: dv1. Running variable: year.
      --------------------------------------------------------------------------------
                  Method |   Coef.    Std. Err.    z     P>|z|    [95% Conf. Interval]
      -------------------+------------------------------------------------------------
            Conventional | -.13807     .05677   -2.4318  0.015   -.249342     -.026789
          Bias-corrected | -.30086     .05677   -5.2992  0.000   -.412138     -.189584
                  Robust | -.30086      .1136   -2.6484  0.008   -.523518     -.078203
      --------------------------------------------------------------------------------
      Covariate-adjusted estimates. Additional covariates included: 9
      Estimates adjusted for mass points in the running variable.

      . gen byte used_s=e(sample)

      . tab used_s

           used_s |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |     19,458       90.08       90.08
                1 |      2,143        9.92      100.00
      ------------+-----------------------------------
            Total |     21,601      100.00
      I suppose what I'm not understanding is why the two numbers in blue (1,263 for "Number of obs" and 2,143 for used_s = 1) are not the same. I expected used_s to equal 1 only for the observations used in the previous analysis, and I thought those were the observations counted in "Number of obs = 1263".

      Thank you so much!

      Best,
      Cat

      PS: My goal is to use the observations from the rdrobust regression to calculate the density of respondents' ages in the sample actually used in the model, rather than calculating the age density of the total sample. This would help to understand the weight of each age group driving the analysis.
      Last edited by Cat Santos; 24 Oct 2024, 03:34.


      • #4
        Hi Cat,

        I am not sure what you are doing. Normally, rdrobust (from SSC) takes both an outcome and a running variable (see help rdrobust). Your code lists only dv1, so I am not sure whether it is the outcome or the running variable.
        Assuming that your code in #3 is correct, you can check the number of observations as follows. The bandwidth used for estimation in #3 is 4.385, and the effective number of observations to the left and right of the cutoff is 34 and 58, respectively. So the RD regression uses only 92 observations in total, not 1263 or 2,143. You can double-check by running this code after rdrobust: sum `e(outcomevar)' if dv1>=-e(h_l) & dv1<=e(h_r) (I assume here that dv1 is your running variable and that the cutoff is 0, as in c(0)). Hope this helps.
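
The bandwidth check in #4 can also be turned into a flag variable, which is what the age-density question in #3 needs. The sketch below rests on several assumptions: the output in #3 reports "Running variable: year" and "Cutoff c = 1944", so the command is written with those rather than c(0); age is a hypothetical variable name; and e(c), e(h_l), e(h_r), e(N_h_l), and e(N_h_r) are taken to be stored by the installed version of rdrobust, which is worth confirming with ereturn list.

```stata
* Assumed form of the command: outcome dv1, running variable year, cutoff 1944
rdrobust dv1 year, c(1944) all kernel(tri) covs(c1 c2 c3) weights(w)

* Count missing values across every variable that enters the model
egen byte n_miss = rowmiss(dv1 year c1 c2 c3 w)

* Flag complete rows with positive weight strictly inside the bandwidth.
* Strict inequalities are an assumption: the triangular kernel gives zero
* weight to observations exactly at the bandwidth boundary.
gen byte in_bw = n_miss == 0 & w > 0 & year > e(c) - e(h_l) & year < e(c) + e(h_r)

* Compare the flagged count with the effective N that rdrobust reports
count if in_bw
display "flagged: " r(N) "   rdrobust effective N: " e(N_h_l) + e(N_h_r)

* Age distribution within the flagged sample only (age is hypothetical)
histogram age if in_bw, discrete percent
```

If the two counts agree (92 in #3), in_bw is a reasonable stand-in for the sample that drives the estimate. The histogram is unweighted, so it shows which ages are present in the window, not how much kernel or sampling weight each one carries.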
