Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • McCrary's DCdensity test in Stata, graph

    Dear all,

    I am not very experienced with Stata and this is my first post here. Thanks in advance for your help.

    I found bunching just below financially relevant thresholds in birthweights reported by hospitals and I would like to graphically show the evidence by implementing a McCrary-Test in Stata (McCrary (2008); http://eml.berkeley.edu/~jmccrary/mc..._DCdensity.pdf). I use the user-written command DCdensity available online (http://eml.berkeley.edu/~jmccrary/DCdensity/).
    The baby's birthweight is my running variable and I try to see if there is systematic sorting just below the cut-off values that yield a higher reimbursement to the hospitals (e.g. 1000 gram). My code is therefore:
    Code:
    DCdensity birthweight, breakpoint(1000) generate(Xj Yj r0 fhat se_fhat) b(10) h(100)
    1000 is the point of potential discontinuity, 10 is the bin size and 100 is the bandwidth.
    But I get the entire birthweight distribution and I can't see anything around the threshold... (see first image below)
    If I restrict to a smaller bracket of values, like:
    Code:
    DCdensity birthweight if birthweight > 750 & birthweight < 1250, breakpoint(1000) generate(Xj Yj r0 fhat se_fhat) b(10) h(100)
    I get a weird distribution (see second image below) that shrinks around 1200g instead of growing (birthweight distribution peaks at 3200g..), I think that the command assumes a Normal one but we are in reality in the left-hand tail.

    Do you see a solution for me, is it possible e.g. to "zoom" on the part of the graph that interests me? Or to specify in the command that I only want to see the graph for a part of the distribution?

    Thanks a lot,

    Fred Rey-Bellet

    Click image for larger version

Name:	Entire distribution.png
Views:	1
Size:	15.0 KB
ID:	1346484
    Click image for larger version

Name:	Graph with restricted sample.png
Views:	1
Size:	13.6 KB
ID:	1346485

  • #2
    Welcome to Statalist!

    Thank you for providing an excellent statement of the problem that follows the guidelines in the FAQ you read.

    The problem, in a nutshell, is that it is not possible to instruct the DCdensity command to restrict the plot to a smaller region. And when you try, as you did, to restrict the the DCdensity command to data in the area of interest, the density it fits in that region isn't the same as the density for that region from the full set of data.

    Here's an approach that might work for you, but will be a little challenging to a new user. If I understand DCdensity.ado correctly, it seems to me that it leaves the results in memory. So there's nothing to prevent you from copying the code that produces the graph into the do-file editor and beating on it to do what you want before running it from within the do-file editor window. Below is a start.

    Caveat: I am not very experienced with Stata graphics and have not tested this code. If you encounter problems, let us know. With explanations as thorough as the one you provided above, I expect Statalist will be able to help.

    Code:
        local breakpoint 1000
        local cellmpname Xj
        local cellvalname Yj
        local evalname r0
        local cellsmname fhat
        local cellsmsename se_fhat
        tempvar hi
        quietly gen `hi' = `cellsmname' + 1.96*`cellsmsename'
        tempvar lo
        quietly gen `lo' = `cellsmname' - 1.96*`cellsmsename'
        gr twoway (scatter `cellvalname' `cellmpname' if birthweight > 750 & birthweight < 1250, msymbol(circle_hollow) mcolor(gray))   ///
          (line `cellsmname' `evalname' if `evalname' < `breakpoint', lcolor(black) lwidth(medthick))   ///
          (line `cellsmname' `evalname' if `evalname' > `breakpoint', lcolor(black) lwidth(medthick))   ///
          (line `hi' `evalname' if `evalname' < `breakpoint', lcolor(black) lwidth(vthin))              ///
          (line `lo' `evalname' if `evalname' < `breakpoint', lcolor(black) lwidth(vthin))              ///
          (line `hi' `evalname' if `evalname' > `breakpoint', lcolor(black) lwidth(vthin))              ///
          (line `lo' `evalname' if `evalname' > `breakpoint', lcolor(black) lwidth(vthin)),             ///
          xline(`breakpoint', lcolor(black)) legend(off)

    Comment


    • #3
      Dear William Lisowski,

      Thanks a lot for your answer (and your kind compliments)!

      Unfortunately, your code gives the first graph below, so still the entire 'bell curve' but with less points.
      I have tried many things, e.g. with 'xscale(range(750, 1250)) or like in the code below imposing some arbitrary restrictions on these variables 'cellvalname' and 'cellmpname' that we scatter (I don't know what these variables are by the way):
      Code:
      gr twoway (scatter `cellvalname' `cellmpname' if `cellmpname' < 2000 & `cellvalname' < .000200
      But I always get the entire distribution (second graph below).

      It is very naive but isn't it possible to compute the whole distribution (like the graph #1 in my first post) and then to do a mega-zoom on the tiny part of the graph I am interested in?

      Thanks a lot again,

      Frédéric Rey-Bellet

      Click image for larger version

Name:	Graph with restriction in graph command.png
Views:	1
Size:	11.5 KB
ID:	1346606
      Click image for larger version

Name:	Graph arbitrary restriction on cellmpname and cellvalname.png
Views:	1
Size:	13.0 KB
ID:	1346607

      Comment


      • #4
        (sorry misuse I got my reply posted twice and I don't find how to delete it)
        Last edited by Fred ReyBellet; 24 Jun 2016, 03:21.

        Comment


        • #5
          I never have any luck deleting posts either.

          We've actually accomplished a lot. We've shown that we can copy the graph code out of DCdensity and reproduce the plot. My attempt at limiting the range of the plot worked partially - it limited the range of the scatterplot, but not of the lines. Possibly if we copied the if clause to each of the line subcommands, it would work. But instead let's try something less subtle - we'll drop all the observations that we don't want in the plot. Note that I've removed the if clause that I'd added to the scatter subcommand.

          I should add that I dug further into the syntax of the graph command and its subcommands this morning, specifically options for the axes, and I did not see a way to reduce the range of the axis - only to expand it. So the only way to limit range plotted is to be sure no data are plotted that are outside the range you want. Adding if to each subcommand does that; so does brute force.

          Code:
          local breakpoint 1000
          local cellmpname Xj
          local cellvalname Yj
          local evalname r0
          local cellsmname fhat
          local cellsmsename se_fhat
          drop if `cellmpname' < 750 | `cellmpname' > 1250
          drop if `evalname' < 750 | `evalname' > 1250
          tempvar hi
          quietly gen `hi' = `cellsmname' + 1.96*`cellsmsename'
          tempvar lo
          quietly gen `lo' = `cellsmname' - 1.96*`cellsmsename'
          gr twoway (scatter `cellvalname' `cellmpname', msymbol(circle_hollow) mcolor(gray))             ///
            (line `cellsmname' `evalname' if `evalname' < `breakpoint', lcolor(black) lwidth(medthick))   ///
            (line `cellsmname' `evalname' if `evalname' > `breakpoint', lcolor(black) lwidth(medthick))   ///
            (line `hi' `evalname' if `evalname' < `breakpoint', lcolor(black) lwidth(vthin))              ///
            (line `lo' `evalname' if `evalname' < `breakpoint', lcolor(black) lwidth(vthin))              ///
            (line `hi' `evalname' if `evalname' > `breakpoint', lcolor(black) lwidth(vthin))              ///
            (line `lo' `evalname' if `evalname' > `breakpoint', lcolor(black) lwidth(vthin)),             ///
            xline(`breakpoint', lcolor(black)) legend(off)

          Comment


          • #6
            Dear William Lisowski,

            Thank you so much, it works perfectly fine (as you can see below)! You are a genius. I am very happy with my first experience with Statalist!

            Have a nice day

            Fred.

            Click image for larger version

Name:	Graph working.png
Views:	1
Size:	12.1 KB
ID:	1346654

            Comment


            • #7
              Dear all,

              I'm struggling with the exact same problem as stated by Fred ReyBellet , unfortunately I do not really get what is ment by cellmpname, cellvalname, evalname, cellsmname and cellsmsename.
              It would be really fantastic if someone could help me out and explain. ;-)

              Thanks a lot

              Athur Frost

              Comment


              • #8
                Dear all,

                I am new to Statalist and hope that you can help me with these 2 tricky questions.

                I would like to graphically depict that there is no bunching around accounting related employee thresholds in my dataset. To do this, I also want to implement the McCrary-test via the user-written command DCdensity.

                The number of employees is my running variable and the accounting threshold lies at 50. Therefore, my code is:

                Code:
                DCdensity employees, breakpoint(50) generate(Xj Yj r0 fhat se_fhat)
                Graphical results are displayed in the first image below. In order to zoom in around the threshold, I use the code provided above by William Lisowski. (Many thanks!)

                Code:
                local breakpoint 50
                local cellmpname Xj
                local cellvalname Yj
                local evalname r0
                local cellsmname fhat
                local cellsmsename se_fhat
                drop if `cellmpname' < 0 | `cellmpname' > 500
                drop if `evalname' < 0 | `evalname' > 500
                tempvar hi
                quietly gen `hi' = `cellsmname' + 1.96*`cellsmsename'
                tempvar lo
                quietly gen `lo' = `cellsmname' - 1.96*`cellsmsename'
                gr twoway (scatter `cellvalname' `cellmpname', msymbol(circle_hollow) mcolor(gray)) (line `cellsmname' `evalname' if `evalname' < `breakpoint', lcolor(black) lwidth(medthick)) (line `cellsmname' `evalname' if `evalname' > `breakpoint', lcolor(black) lwidth(medthick)) (line `hi' `evalname' if `evalname' < `breakpoint', lcolor(black) lwidth(vthin)) (line `lo' `evalname' if `evalname' < `breakpoint', lcolor(black) lwidth(vthin)) (line `hi' `evalname' if `evalname' > `breakpoint', lcolor(black) lwidth(vthin)) (line `lo' `evalname' if `evalname' > `breakpoint', lcolor(black) lwidth(vthin)), xline(`breakpoint', lcolor(black)) legend(off)
                Graphical results are displayed in the second image below.

                Now to my questions:
                1. Graph 1: Why are there bins with a negative employee value (no negative employee values in the sample - I checked)? Those bins distort the graph in such way, that it seems there is a discontinuity even though there is none.
                2. Graph 2: Why is the graph partially gone? The same questions holds for the scatters?
                Potential explanations - to be discussed:
                1. There are too feew datapoints left of the threshold, i.e., between 0 and 50, for DCdensity to draw the graph.
                2. r0 seems to be offset vs. Xj by about 300 (employees) - see screenshot.
                I am looking forward to the discussion. Thanks a lot!

                Best,

                Frédéric

                Click image for larger version

Name:	Graph_before.png
Views:	1
Size:	17.3 KB
ID:	1399767



                Click image for larger version

Name:	Graph_after.png
Views:	1
Size:	14.1 KB
ID:	1399768



                Click image for larger version

Name:	data.PNG
Views:	1
Size:	36.5 KB
ID:	1399769

                Comment


                • #9
                  #8 cross-posted https://stackoverflow.com/questions/...cdensity-graph

                  Comment


                  • #10
                    I do notice that the previous examples in this thread deal with a symmetric distribution for the "running variable", whereas yours is extremely asymmetric and bounded below by 0. Perhaps in these circumstances the methodology breaks down, assumptions are violated, or the code simply was not designed well for it. Perhaps a log transformation of the "running variable" would help.

                    Failing that, since DCdensity is user-supplied code, you might try addressing your question #1 regarding the graph produced by DCdensity to the source of the code at the email address given in http://eml.berkeley.edu/~jmccrary/.

                    Comment


                    • #11
                      Dear William,

                      thank you for your answer. I also believe that my data must, to some extent, violate the codes assumptions. However, I did not come up with the idea to contact Justin McCrary directly. Thanks for the suggestion!

                      Best,

                      Frédéric

                      Comment


                      • #12
                        Click image for larger version

Name:	Graph.png
Views:	1
Size:	62.5 KB
ID:	1728259

                        Hi all, I am experiencing the same problem as Frederic did with the user written dcdensity command. As you can see in the above graph, I have restricted the sample between 0.7 to 0.9 values for my running variables, but the graph is showing some observation left and right side of the range. I checked the data set and it doesnot have any observations with running variable value below 0.7 or above 0.9. If anyone has any idea please help!


                        Comment


                        • #13
                          Hello,

                          I am trying to use DCdensity to run a McCrary test. However I don't know how should I determine the bin size and the bandwith. How can I know that? The default options produce a very strange graph.

                          Code:
                          DCdensity yearat15, breakpoint(1944) generate(Xj Yj r0 fhat se_fhat)
                          
                          Using default bin size calculation, bin size = .303849338
                          Using default bandwidth calculation, bandwidth = 20.1630546
                          
                          Discontinuity estimate (log difference in height): .374787564
                                                                             (.057035954)
                          Performing LLR smoothing.
                          304 iterations will be performed 
                          ..............................
                          Click image for larger version

Name:	mccr.jpg
Views:	1
Size:	228.9 KB
ID:	1746994



                          Thank you!

                          Comment

                          Working...
                          X