McCrary's DCdensity test in Stata, graph

Fred ReyBellet

Join Date: Jun 2016

Posts: 6
#1

McCrary's DCdensity test in Stata, graph

23 Jun 2016, 09:55

Dear all,

I am not very experienced with Stata and this is my first post here. Thanks in advance for your help.

I found bunching just below financially relevant thresholds in birthweights reported by hospitals and I would like to graphically show the evidence by implementing a McCrary-Test in Stata (McCrary (2008); http://eml.berkeley.edu/~jmccrary/mc..._DCdensity.pdf). I use the user-written command DCdensity available online (http://eml.berkeley.edu/~jmccrary/DCdensity/).
The baby's birthweight is my running variable and I try to see if there is systematic sorting just below the cut-off values that yield a higher reimbursement to the hospitals (e.g. 1000 gram). My code is therefore:

Code:

DCdensity birthweight, breakpoint(1000) generate(Xj Yj r0 fhat se_fhat) b(10) h(100)

1000 is the point of potential discontinuity, 10 is the bin size and 100 is the bandwidth.
But I get the entire birthweight distribution and I can't see anything around the threshold... (see first image below)
If I restrict to a smaller bracket of values, like:

Code:

DCdensity birthweight if birthweight > 750 & birthweight < 1250, breakpoint(1000) generate(Xj Yj r0 fhat se_fhat) b(10) h(100)

I get a weird distribution (see second image below) that shrinks around 1200g instead of growing (birthweight distribution peaks at 3200g..), I think that the command assumes a Normal one but we are in reality in the left-hand tail.

Do you see a solution for me, is it possible e.g. to "zoom" on the part of the graph that interests me? Or to specify in the command that I only want to see the graph for a part of the distribution?

Thanks a lot,

Fred Rey-Bellet
Tags: None
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

23 Jun 2016, 13:27

Welcome to Statalist!

Thank you for providing an excellent statement of the problem that follows the guidelines in the FAQ you read.

The problem, in a nutshell, is that it is not possible to instruct the DCdensity command to restrict the plot to a smaller region. And when you try, as you did, to restrict the the DCdensity command to data in the area of interest, the density it fits in that region isn't the same as the density for that region from the full set of data.

Here's an approach that might work for you, but will be a little challenging to a new user. If I understand DCdensity.ado correctly, it seems to me that it leaves the results in memory. So there's nothing to prevent you from copying the code that produces the graph into the do-file editor and beating on it to do what you want before running it from within the do-file editor window. Below is a start.

Caveat: I am not very experienced with Stata graphics and have not tested this code. If you encounter problems, let us know. With explanations as thorough as the one you provided above, I expect Statalist will be able to help.

Code:

local breakpoint 1000 local cellmpname Xj local cellvalname Yj local evalname r0 local cellsmname fhat local cellsmsename se_fhat tempvar hi quietly gen `hi' = `cellsmname' + 1.96*`cellsmsename' tempvar lo quietly gen `lo' = `cellsmname' - 1.96*`cellsmsename' gr twoway (scatter `cellvalname' `cellmpname' if birthweight > 750 & birthweight < 1250, msymbol(circle_hollow) mcolor(gray)) /// (line `cellsmname' `evalname' if `evalname' < `breakpoint', lcolor(black) lwidth(medthick)) /// (line `cellsmname' `evalname' if `evalname' > `breakpoint', lcolor(black) lwidth(medthick)) /// (line `hi' `evalname' if `evalname' < `breakpoint', lcolor(black) lwidth(vthin)) /// (line `lo' `evalname' if `evalname' < `breakpoint', lcolor(black) lwidth(vthin)) /// (line `hi' `evalname' if `evalname' > `breakpoint', lcolor(black) lwidth(vthin)) /// (line `lo' `evalname' if `evalname' > `breakpoint', lcolor(black) lwidth(vthin)), /// xline(`breakpoint', lcolor(black)) legend(off)
Comment
Fred ReyBellet

Join Date: Jun 2016

Posts: 6
#3

24 Jun 2016, 03:09

Dear William Lisowski,

Thanks a lot for your answer (and your kind compliments)!

Unfortunately, your code gives the first graph below, so still the entire 'bell curve' but with less points.
I have tried many things, e.g. with 'xscale(range(750, 1250)) or like in the code below imposing some arbitrary restrictions on these variables 'cellvalname' and 'cellmpname' that we scatter (I don't know what these variables are by the way):

Code:

gr twoway (scatter `cellvalname' `cellmpname' if `cellmpname' < 2000 & `cellvalname' < .000200

But I always get the entire distribution (second graph below).

It is very naive but isn't it possible to compute the whole distribution (like the graph #1 in my first post) and then to do a mega-zoom on the tiny part of the graph I am interested in?

Thanks a lot again,

Frédéric Rey-Bellet
Comment
Fred ReyBellet

Join Date: Jun 2016

Posts: 6
#4

24 Jun 2016, 03:11

(sorry misuse I got my reply posted twice and I don't find how to delete it)

Last edited by Fred ReyBellet; 24 Jun 2016, 03:21.
Comment

William Lisowski

Join Date: Dec 2014
Posts: 10150

24 Jun 2016, 09:17

I never have any luck deleting posts either.

We've actually accomplished a lot. We've shown that we can copy the graph code out of DCdensity and reproduce the plot. My attempt at limiting the range of the plot worked partially - it limited the range of the scatterplot, but not of the lines. Possibly if we copied the if clause to each of the line subcommands, it would work. But instead let's try something less subtle - we'll drop all the observations that we don't want in the plot. Note that I've removed the if clause that I'd added to the scatter subcommand.

I should add that I dug further into the syntax of the graph command and its subcommands this morning, specifically options for the axes, and I did not see a way to reduce the range of the axis - only to expand it. So the only way to limit range plotted is to be sure no data are plotted that are outside the range you want. Adding if to each subcommand does that; so does brute force.

Code:

local breakpoint 1000
local cellmpname Xj
local cellvalname Yj
local evalname r0
local cellsmname fhat
local cellsmsename se_fhat
drop if `cellmpname' < 750 | `cellmpname' > 1250
drop if `evalname' < 750 | `evalname' > 1250
tempvar hi
quietly gen `hi' = `cellsmname' + 1.96*`cellsmsename'
tempvar lo
quietly gen `lo' = `cellsmname' - 1.96*`cellsmsename'
gr twoway (scatter `cellvalname' `cellmpname', msymbol(circle_hollow) mcolor(gray))             ///
  (line `cellsmname' `evalname' if `evalname' < `breakpoint', lcolor(black) lwidth(medthick))   ///
  (line `cellsmname' `evalname' if `evalname' > `breakpoint', lcolor(black) lwidth(medthick))   ///
  (line `hi' `evalname' if `evalname' < `breakpoint', lcolor(black) lwidth(vthin))              ///
  (line `lo' `evalname' if `evalname' < `breakpoint', lcolor(black) lwidth(vthin))              ///
  (line `hi' `evalname' if `evalname' > `breakpoint', lcolor(black) lwidth(vthin))              ///
  (line `lo' `evalname' if `evalname' > `breakpoint', lcolor(black) lwidth(vthin)),             ///
  xline(`breakpoint', lcolor(black)) legend(off)

Comment

Fred ReyBellet

Join Date: Jun 2016

Posts: 6
#6

24 Jun 2016, 09:55

Dear William Lisowski,

Thank you so much, it works perfectly fine (as you can see below)! You are a genius. I am very happy with my first experience with Statalist!

Have a nice day

Fred.
Comment
Athur Frost

Join Date: Nov 2016

Posts: 1
#7

09 Nov 2016, 04:48

Dear all,

I'm struggling with the exact same problem as stated by Fred ReyBellet , unfortunately I do not really get what is ment by cellmpname, cellvalname, evalname, cellsmname and cellsmsename.
It would be really fantastic if someone could help me out and explain. ;-)

Thanks a lot

Athur Frost
Comment
Frederic Closset

Join Date: Jun 2017

Posts: 2
#8

29 Jun 2017, 10:11

Dear all,

I am new to Statalist and hope that you can help me with these 2 tricky questions.

I would like to graphically depict that there is no bunching around accounting related employee thresholds in my dataset. To do this, I also want to implement the McCrary-test via the user-written command DCdensity.

The number of employees is my running variable and the accounting threshold lies at 50. Therefore, my code is:

Code:

DCdensity employees, breakpoint(50) generate(Xj Yj r0 fhat se_fhat)

Graphical results are displayed in the first image below. In order to zoom in around the threshold, I use the code provided above by William Lisowski. (Many thanks!)

Code:

local breakpoint 50 local cellmpname Xj local cellvalname Yj local evalname r0 local cellsmname fhat local cellsmsename se_fhat drop if `cellmpname' < 0 | `cellmpname' > 500 drop if `evalname' < 0 | `evalname' > 500 tempvar hi quietly gen `hi' = `cellsmname' + 1.96*`cellsmsename' tempvar lo quietly gen `lo' = `cellsmname' - 1.96*`cellsmsename' gr twoway (scatter `cellvalname' `cellmpname', msymbol(circle_hollow) mcolor(gray)) (line `cellsmname' `evalname' if `evalname' < `breakpoint', lcolor(black) lwidth(medthick)) (line `cellsmname' `evalname' if `evalname' > `breakpoint', lcolor(black) lwidth(medthick)) (line `hi' `evalname' if `evalname' < `breakpoint', lcolor(black) lwidth(vthin)) (line `lo' `evalname' if `evalname' < `breakpoint', lcolor(black) lwidth(vthin)) (line `hi' `evalname' if `evalname' > `breakpoint', lcolor(black) lwidth(vthin)) (line `lo' `evalname' if `evalname' > `breakpoint', lcolor(black) lwidth(vthin)), xline(`breakpoint', lcolor(black)) legend(off)

Graphical results are displayed in the second image below.

Now to my questions:
Graph 1: Why are there bins with a negative employee value (no negative employee values in the sample - I checked)? Those bins distort the graph in such way, that it seems there is a discontinuity even though there is none.

Graph 2: Why is the graph partially gone? The same questions holds for the scatters?

Potential explanations - to be discussed:
There are too feew datapoints left of the threshold, i.e., between 0 and 50, for DCdensity to draw the graph.

r0 seems to be offset vs. Xj by about 300 (employees) - see screenshot.

I am looking forward to the discussion. Thanks a lot!

Best,

Frédéric
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35207
#9

03 Jul 2017, 10:29

#8 cross-posted https://stackoverflow.com/questions/...cdensity-graph
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#10

04 Jul 2017, 15:02

I do notice that the previous examples in this thread deal with a symmetric distribution for the "running variable", whereas yours is extremely asymmetric and bounded below by 0. Perhaps in these circumstances the methodology breaks down, assumptions are violated, or the code simply was not designed well for it. Perhaps a log transformation of the "running variable" would help.

Failing that, since DCdensity is user-supplied code, you might try addressing your question #1 regarding the graph produced by DCdensity to the source of the code at the email address given in http://eml.berkeley.edu/~jmccrary/.
Comment
Frederic Closset

Join Date: Jun 2017

Posts: 2
#11

27 Jul 2017, 03:25

Dear William,

thank you for your answer. I also believe that my data must, to some extent, violate the codes assumptions. However, I did not come up with the idea to contact Justin McCrary directly. Thanks for the suggestion!

Best,

Frédéric
Comment
Ummey Honey

Join Date: Sep 2023

Posts: 1
#12

26 Sep 2023, 11:24

Hi all, I am experiencing the same problem as Frederic did with the user written dcdensity command. As you can see in the above graph, I have restricted the sample between 0.7 to 0.9 values for my running variables, but the graph is showing some observation left and right side of the range. I checked the data set and it doesnot have any observations with running variable value below 0.7 or above 0.9. If anyone has any idea please help!
Comment

Cat Santos

Join Date: Dec 2019
Posts: 62

#13

17 Mar 2024, 11:21

Hello,

I am trying to use DCdensity to run a McCrary test. However I don't know how should I determine the bin size and the bandwith. How can I know that? The default options produce a very strange graph.

Code:

DCdensity yearat15, breakpoint(1944) generate(Xj Yj r0 fhat se_fhat)

Using default bin size calculation, bin size = .303849338
Using default bandwidth calculation, bandwidth = 20.1630546

Discontinuity estimate (log difference in height): .374787564
                                                   (.057035954)
Performing LLR smoothing.
304 iterations will be performed 
..............................

Click image for larger version

Name: mccr.jpg
Views: 1
Size: 228.9 KB
ID: 1746994

Thank you!

Announcement