Matching by a set of Characteristics?

Brian Lavong

Join Date: Jul 2018

Posts: 5
#1

26 Jul 2018, 15:13

Hi all,

I'm conducting some research into the impact a program has on the outcomes of students within a school district. To this end, I have been instructed to compile a dataset for all the schools in said district across a series of years, with characteristics such as enrollment, demographics of teachers and students, and percent low-income students. I was told we would use this dataset to match schools by characteristics, but I've lost contact with my mentor. I've spent the last few days reading about matching and I understand it at the theoretical level, but I have only come across examples of Propensity Score Matching, which I don't think applies to my case currently.

My question, though this may not be the best forum for it, is whether there is a way to preemptively match schools based on a set of characteristics. It is my understanding that a general matching of the schools would suffice for the future analyses. On another note, if this type of matching is not feasible, suggestions for other analyses that can be done with this sort of panel data would be much appreciated.

Many thanks!
Tags: analyses, data, matching
Mike Lacy

Join Date: Apr 2014

Posts: 2416
#2

27 Jul 2018, 02:45

I'd recommend the user-written module -calipmatch-, which is available from SSC (-ssc install calipmatch-).
Brian Lavong

Join Date: Jul 2018

Posts: 5
#3

27 Jul 2018, 16:40

Thank you for the response Mike!

I installed the module and noted that this too requires a binary treatment/control variable. I'm going to try setting them all as controls and see if that works.

Cheers
Brian Lavong

Join Date: Jul 2018

Posts: 5
#4

27 Jul 2018, 17:22

I tried assigning a dummy variable and making them all "controls". As I thought, case observations are required.

If I were to randomly assign "treatment" to, say, 5 or 10% of the schools and then do the matching, would this satisfy my needs if all I want is a resulting set of schools that are similar in pre-determined characteristics such as enrollment and racial breakdown? The way treatments are assigned is known to be quasi-random, rotating by year, resulting in <10% of schools participating per year, with >10 years before repeating. Since treatments are quasi-random, this type of approach seems plausible to me.
Mike Lacy

Join Date: Apr 2014

Posts: 2416
#5

28 Jul 2018, 06:42

Here's a thought: duplicate your file with -expand 2, gen(dupe)-. Then use "dupe == 1 vs. 0" as your treatment. You can drop the self-to-self matches later, or you can prevent them by creating a fake variable that differs a lot within id.
Brian Lavong

Join Date: Jul 2018

Posts: 5
#6

29 Jul 2018, 04:29

Ah, that's a sleeker way of doing it; it never crossed my mind. This should fit the bill quite nicely and addresses the ultimate goal more directly. I'm aware of -drop if- commands, but I can't think of a way to write one that drops self matches. Would you happen to have the code for that?

Mike Lacy

Join Date: Apr 2014

Posts: 2416
#7

29 Jul 2018, 08:40

Code:

// Requires the user-written -calipmatch-: ssc install calipmatch
// Make example data.
clear
set seed 438746
// fake data
set obs 100
gen id = _n
gen float xfloat = runiform()
gen int xint = ceil(runiform() * 3)
// end example data
//
expand 2, gen(dupe)
// In this example, I chose to force an exact match on xint, and a caliper match
// of +/- 0.3 on xfloat.  The maxmatches of 3 is also an illustrative choice.
calipmatch, gen(matchgroup) casevar(dupe) maxmatches(3) calipermatch(xfloat) ///
   caliperwidth(0.3) exactmatch(xint)
// Mark the self-self duplicates.
duplicates tag matchgroup id, gen(selfself)
// Visualize the duplicates if you like
sort matchgroup id
list if (selfself ==1)
// Get rid of the self-self matches
drop if (selfself ==1) & (dupe == 1)
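
To see what the tag-and-drop step does without installing anything, here is a minimal sketch on hand-made data. The ids and matchgroup values below are invented for illustration and are not actual -calipmatch- output. The sketch also assumes that unmatched observations carry a missing matchgroup; since -duplicates- treats missing like any other value, an id whose two copies are both unmatched gets tagged as well, and its dupe == 1 copy is dropped along with the true self-self matches.

```stata
// Hypothetical toy data: id 1 matched to itself and to id 2,
// id 3 matched to id 4, and both copies of id 5 left unmatched.
clear
input id dupe matchgroup
1 1 1
1 0 1
2 0 1
3 1 2
4 0 2
5 0 .
5 1 .
end
// Tag observations that share both matchgroup and id.
duplicates tag matchgroup id, gen(selfself)
// Drop the duplicated ("case") copy of each tagged pair.
drop if (selfself == 1) & (dupe == 1)
// Count how many observations remain in each match group.
bysort matchgroup: gen groupsize = _N if !missing(matchgroup)
list, sepby(matchgroup)
```

After the drop, each remaining match group is just a set of schools that are similar on the matching variables; the original (dupe == 0) copy of a self-matched id stays in its group.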


Brian Lavong

Join Date: Jul 2018

Posts: 5
#8

30 Jul 2018, 08:51

Thank you for the code, Mike! I've been implementing it with my data set and experimenting with it, and I now have something neat to put on my poster.