  • Unusually high number of observations in RDD

    Dear Statalists,

    I am running a regression discontinuity with the rdrobust plugin. My self-created dataset consists of 15 258 observations. The plugin has worked perfectly since I started using it, but now it has started to show very high numbers for N. For some reason it shows that I have 5 671 899 observations. The license I am running doesn't even allow that many observations. I am using Stata 16 on a Mac. Does anyone know what the problem might be? I understand that it might be hard to answer without seeing all my data, but if someone else has experienced a similar problem I would be very thankful for any help.

    The code I am using looks like this:
    Code:
    rdrobust afd_perc EVPNEAR_DIST if  EVPNEAR_DIST <10000 &  EVPNEAR_DIST >-10000, p  (1)

  • #2
    One thought: are you running the latest version? Using ssc install rdrobust installs this version:
    Code:
    . which rdrobust
    /.../Stata/ado/plus/r/rdrobust.ado
    *!version 8.0.2  04-03-2020
    If yours is older, perhaps the problem vanishes in the more recent release.

    Another thought: you reported an anomaly in your output. Perhaps there are other anomalies that you missed. You should copy your command and its output from Stata's Results window and paste it into a post using CODE delimiters as you did for your command above.

    Finally, I note that ssc describe rdrobust gives the authors' email addresses for support. Absent an answer here, you might contact them directly.



    • #3
      Thank you William. I am using the latest version 8.0.2. I'll contact them if no one knows what the issue might be. Here is the output table.


      Code:
      .  rdrobust  afd_perc EVPNEAR_DIST if  EVPNEAR_DIST <10000 &  EVPNEAR_DIST >-10000, p  (1)
      Mass points detected in the running variable.
      
      Sharp RD estimates using local polynomial regression.
      
            Cutoff c = 0 | Left of c  Right of c            Number of obs =    5671236
      -------------------+----------------------            BW type       =      mserd
           Number of obs |       932        1010            Kernel        = Triangular
      Eff. Number of obs |        94          65            VCE method    =         NN
          Order est. (p) |         1           1
          Order bias (q) |         2           2
             BW est. (h) |  2306.695    2306.695
             BW bias (b) |  4354.814    4354.814
               rho (h/b) |     0.530       0.530
              Unique obs |       209         219
      
      Outcome: afd_perc. Running variable: EVPNEAR_DIST.
      --------------------------------------------------------------------------------
                  Method |   Coef.    Std. Err.    z     P>|z|    [95% Conf. Interval]
      -------------------+------------------------------------------------------------
            Conventional |  10.789     2.5061   4.3052   0.000    5.87732       15.701
                  Robust |     -          -     3.8230   0.000    5.61086      17.4162
      --------------------------------------------------------------------------------
      Estimates adjusted for mass points in the running variable.



      • #4
        Aha, he said, pleased that his guess was correct and there was indeed more to be learned from the output of the rdrobust command than just the one symptom reported.

        I note that along with the anomalous number of observations, you also received (in two places) notice that the running variable has mass points - in other words, that there are multiple observations with identical values for EVPNEAR_DIST. Perhaps EVPNEAR_DIST is a discrete variable rather than a continuous variable?

        In your earlier work, where you did not encounter anomalous numbers of observations, did you also have mass points in your running variable, or was it truly continuous? If the latter, then my guess is that the anomalous number of observations is the result of how rdrobust adjusts for the mass points - I hypothesize by using weighting, so that the reported number of observations is actually a weighted count.

        A look at Calonico, Cattaneo, and Titiunik (2014), linked to in the output of help rdrobust, tells us that mass points violate the assumptions of the regression discontinuity methodology.

        REMARK 1—Discrete Running Variable: Assumption 1(a) rules out discrete-valued running variables. In applications where Xi exhibits many mass points near the cutoff, this assumption may still give a good approximation and our results might be used in practice. However, when Xi exhibits few mass points, our results do not apply directly without further assumptions and modifications, and other assumptions and inference approaches may be more appropriate; see, for example, Cattaneo, Frandsen, and Titiunik (2014).
        I didn't try to trace things further. Perhaps a deeper dive into the literature will find a discussion somewhere that covers how rdrobust handles mass points.
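
        To see how concentrated the running variable is, a quick check along the following lines may help. This is only a sketch: the variable names are taken from the command posted in #1, n_at_value is a hypothetical helper name, and the window is assumed to be the same 10 km band. For reference, the Unique obs row in #3 reports 209 + 219 = 428 distinct values against 932 + 1010 = 1942 observations.

        ```stata
        * Sketch: inspect mass points in the running variable inside the estimation window.
        * n_at_value is a hypothetical name chosen for illustration.
        preserve
        keep if EVPNEAR_DIST < 10000 & EVPNEAR_DIST > -10000 & !missing(afd_perc, EVPNEAR_DIST)

        * How many observations share their EVPNEAR_DIST value with other observations?
        duplicates report EVPNEAR_DIST

        * Collapse to one row per distinct value, keeping the frequency of each value
        contract EVPNEAR_DIST, freq(n_at_value)
        count                                   // number of distinct values

        * Show the largest clusters (at most 10 rows)
        gsort -n_at_value
        list in 1/`=min(10, _N)'
        restore
        ```

        A few very large clusters would be consistent with the explanation that observations pile up at specific locations.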



        • #5
          Thank you William! Yes, that is true. I am using electoral data which are coded to the nearest voting district (Gemeinde in German). EVPNEAR_DIST is a continuous variable which captures the distance to a border in meters. The mass points are driven by the concentration of observations at some specific locations, in this case cities. rdrobust isn't made specifically for spatial RDDs, but that it should report a weighted value for the number of observations doesn't seem logical to me, as it reports the actual number of observations on the other side of the table.
          I reran my previous estimations, which use a very similar dataset with a few adjustments in the number of observations. It also detects mass points; however, the number of observations is correct:

          Code:
           .  rdrobust afd_perc EVPNEAR_DIST  if EVPNEAR_DIST <10000 &  EVPNEAR_DIST >-10000, p (1)
          Mass points detected in the running variable.
          
          Sharp RD estimates using local polynomial regression.
          
                Cutoff c = 0 | Left of c  Right of c            Number of obs =       2074
          -------------------+----------------------            BW type       =      mserd
               Number of obs |      1020        1054            Kernel        = Triangular
          Eff. Number of obs |       117         109            VCE method    =         NN
              Order est. (p) |         1           1
              Order bias (q) |         2           2
                 BW est. (h) |  2274.713    2274.713
                 BW bias (b) |  4548.942    4548.942
                   rho (h/b) |     0.500       0.500
                  Unique obs |       200         214
          
          Outcome: afd_perc. Running variable: EVPNEAR_DIST.
          --------------------------------------------------------------------------------
                      Method |   Coef.    Std. Err.    z     P>|z|    [95% Conf. Interval]
          -------------------+------------------------------------------------------------
                Conventional |  9.5236     2.5694   3.7066   0.000    4.48774      14.5595
                      Robust |     -          -     3.3828   0.001    4.22082      15.8492
          --------------------------------------------------------------------------------
          Estimates adjusted for mass points in the running variable.
          
          .



          • #6
            Very strange. If nobody else provides a better explanation or shares some relevant personal experience, I suggest you contact one of the authors at the email addresses given in the output of ssc describe rdrobust. The unexplained inconsistency in the output should be a concern to the developers.



            • #7
              Update: it is a bug in the plugin. The number is taken from another variable which is not included in the model. In this case it was just an ID code, which is irrelevant for the output. I have informed the developer of the plugin about the problem. So anyone else who uses it can simply ignore the number of observations in the header of the output; it is not correct.
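
              One way to cross-check the sample size without relying on the header line is to count the usable observations directly and compare them with the left and right counts. A minimal sketch, assuming the stored results e(N_l) and e(N_r) are available in the installed version (ereturn list shows what is actually stored). In the output in #3, 932 + 1010 = 1942, which is nowhere near the 5671236 printed in the header.

              ```stata
              * Sketch: cross-check the sample size independently of the header line.
              * Assumes e(N_l) and e(N_r) exist in your rdrobust version; check with ereturn list.
              rdrobust afd_perc EVPNEAR_DIST if EVPNEAR_DIST < 10000 & EVPNEAR_DIST > -10000, p(1)
              ereturn list                      // inspect everything rdrobust stored

              * Independent count of usable observations in the same window
              count if EVPNEAR_DIST < 10000 & EVPNEAR_DIST > -10000 & !missing(afd_perc, EVPNEAR_DIST)
              display "count: " r(N) "   left + right: " e(N_l)+e(N_r)
              ```

              If the two numbers printed by display agree, the estimation sample is the expected one and only the header figure is off.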



              • #8
                Hello, I am stuck at "Mass points detected in the running variable." and no result comes out. The command keeps running without producing any output; the longest I have let it run is half a day. My dataset has 4409319 observations. Any suggestions, please? Thank you.



                • #9
                  To those who find this topic while searching for help with rdrobust, note that (as of this writing) the output of
                  Code:
                  search rdrobust
                  includes
                  Code:
                  ...
                  5 packages found (Stata Journal and STB listed first)
                  -----------------------------------------------------
                  
                  st0366_1 from http://www.stata-journal.com/software/sj17-2
                      SJ17-2 st0366_1. Update: Local polynomial... / Update: Local polynomial
                      regression-discontinuity / estimation with robust bias-corrected
                      confidence / intervals and inference procedures / by Sebastian Calonico,
                      University of Miami, / Miami, FL / Matias D. Cattaneo, University of
                  
                  st0366 from http://www.stata-journal.com/software/sj14-4
                      SJ14-4 st0366. Robust data-driven inference... / Robust data-driven
                      inference in the regression- / discontinuity design / by Sebastian
                      Calonico, University of Miami, / Coral Gables, FL / Matias D. Cattaneo,
                      University of Michigan, / Ann Arbor, MI / Rocio Titiunik, University of
                  
                  rdpermute from http://fmwww.bc.edu/RePEc/bocode/r
                      'RDPERMUTE': module to perform a permutation test for the Regression Kink
                      (RK) and Regression Discontinuity (RD) Design / rdpermute implements a
                      permutation test for the Regression Kink / (RK) and Regression
                      Discontinuity (RD) Design for the one / dimensional case of one Outcome
                  
                  rdrobust from http://fmwww.bc.edu/RePEc/bocode/r
                      'RDROBUST': module to provide robust data-driven inference in the
                      regression-discontinuity design / rdrobust implements local polynomial
                      Regression Discontinuity / (RD) point estimators with robust
                      bias-corrected / confidence intervals and inference procedures developed
                  
                  rdrobust from https://raw.githubusercontent.com/rdpackages/rdrobust/master/stata
                      STATA Package: RDROBUST / / Authors: Sebastian Calonico, Department of
                      Health Policy and Management, Columbia University,
                      [email protected] / Matias D. Cattaneo, Operations Research
                      and Financial Engineering, Princeton University, [email protected] /
                  ...
                  of which the final one, maintained on github by the authors, is the most recent.
                  Code:
                  . net install rdrobust, from(https://raw.githubusercontent.com/rdpackages/rdrobust/master/stata)
                  checking rdrobust consistency and verifying not already installed...
                  installing into /Users/lisowskiw/Library/Application Support/Stata/ado/plus/...
                  installation complete.
                  
                  . which rdrobust
                  /Users/lisowskiw/Library/Application Support/Stata/ado/plus/r/rdrobust.ado
                  *!version 8.0.4  2020-08-22
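
                  If an older copy is already installed, for example from SSC, net install may refuse to overwrite it. A hedged sketch of one way to switch to the GitHub version, using the URL quoted above; capture only suppresses the error that ado uninstall raises when no copy is installed.

                  ```stata
                  * Sketch: replace an existing rdrobust installation with the GitHub version.
                  capture ado uninstall rdrobust      // ignore the error if nothing is installed
                  net install rdrobust, from(https://raw.githubusercontent.com/rdpackages/rdrobust/master/stata) replace

                  * Confirm which file and version Stata will now use
                  which rdrobust
                  ```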



                  • #10
                    This definitely seems like a bug. Being a fairly recent user of rdrobust, such a large number of observations.
