yearly matching with Mahapick

Markus Friemler

#1


23 Jul 2018, 06:08

Dear all,

I'm currently trying to implement a matching mechanism that selects a potential wife or husband from the same sample for an individual who is observed to be married but has no partner information (partner ID).

In detail, I want to find a pool of potential partners using the Mahalanobis distance measure and then assign one of these potential partners at random.
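
To make the second step concrete, here is a minimal sketch of the random assignment, assuming the pool of candidates is already available in long format with one row per candidate. The variable cand_id and all values are made up for illustration, and only built-in Stata commands are used.

```stata
* Toy pool: one row per (individual, candidate) pair.
* cand_id and the values below are hypothetical.
clear
input long id long cand_id
22 101
22 102
22 103
25 201
25 202
end

* Fix the seed so the random draw is reproducible
set seed 20180723

* One uniform draw per candidate row
gen double u = runiform()

* Within each individual, keep the candidate with the smallest draw
bysort id (u): keep if _n == 1

rename cand_id partner_id
drop u
list id partner_id
```

Each candidate in an individual's pool has the same chance of being kept, because the uniform draws are independent of the candidate order.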

Am I right that mahapick is a good way to implement this approach?

My dataset consists of about 15,000 individuals, for whom I want to simulate/impute the complete family situation between 1980 and 2000.

My panel dataset looks like this:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long id int year float married double age float education byte region long partner_id
12 1984 1 44 2 2 11
12 1985 1 45 2 2 11
12 1986 1 46 2 2 11
22 1984 0 28 6 5 0
22 1985 1 29 6 5 .
22 1986 . 30 6 5 .
25 1984 1 24 6 . .
25 1985 . 25 6 . .
25 1986 . 26 6 . .
end

I want to build the distance measure on age, education and region.
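
For reference, this is the usual textbook definition of the distance I have in mind. The notation is mine and not taken from the mahapick documentation: x_i is the covariate vector of individual i and S is the sample covariance matrix of the three covariates.

```latex
\[
  d(i,j) = \sqrt{(x_i - x_j)^{\top} S^{-1} (x_i - x_j)},
  \qquad
  x_i = (\mathit{age}_i,\ \mathit{education}_i,\ \mathit{region}_i)^{\top}
\]
```

Whether a command reports d or its square does not change the ranking of candidates, since squaring is monotone for non-negative values.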

Since I'm running a dynamic microsimulation model, partnering has to be done on a yearly basis. For example, if I simulate that a person marries in year t, then I need to find a partner for this married individual in t.

So I only need to do the partnering for individuals who are married, but do not have an assigned partner in the specific year.

As far as I'm aware, mahapick does not accept an "if" qualifier, right?

So how can I tell Stata to run the matching command only for individuals who are married and do not have an assigned partner in year `y'?

Since there is no "if" qualifier, I unfortunately cannot run something like:

Code:

forval y = 1980/2000 {
    gen match_id = .
    mahapick age education region if year == `y' & married == 1 & partner_id == ., ///
        idvar(id) treated(married) pickids(match_id) nummatches(5)
}

And my last question is: What would be the "treated" variable here? Is it married, because we only want to match married individuals (with missing partner_id)?

Any help would be appreciated. Thank you very much in advance.
Niklas Stark

#2

24 Jul 2018, 08:02

I'm facing very similar problems at the moment and would therefore be very interested in your solution or in suggestions from other Statalist users.

David Ray McCoy


19 Sep 2023, 11:26

In case anyone still runs into this, as I did while looking for something else: I found a workaround for the same problem while using psmatch2 (similar, I assume), with exact matching on year and on an earnings circumstance, then distance matching within each subset. The difference is that I did it for several treatment groups (agencies), so there are extra steps you won't need. The steps and the code can be adapted:

Iterate through the values of the exact-match variables. In each iteration, use preserve, subset the entire data set using the exact-match criteria, perform Mahalanobis matching within that subset, and save the results by appending them to a data set on disk. I used the iteration values to build a variable that tracks which exact-match group each match/matched observation belongs to. Then restore the data set and move on to the next set of exact-match criteria.

Mahalanobis matching works well in this situation because, unlike propensity score matching, it does not depend on parametric estimates from a logistic regression. With the latter, I had to get a single propensity score for all groups, then use the imprecise within-exact-match scores on the subsets, and it did not perform well. By contrast, Mahalanobis matching works reasonably well with the smaller subsamples.

Code:

foreach num of num 1 2 3 {
    use "$temp_dir\matches", clear
    * Subset to agency or potential matches in control
    keep if agency_num == `num' | treatment == 0
    * This will hold the indicator for matched treatment in sample
    gen matched = .
    replace matched = 0 if treatment == 1
    * Within each agency, iterate through each year
    foreach y of num 2006/2022 {
        * Within each agency-year iterate through two values of econ_factor
        foreach dip of num 0 1 {
                * Must be able to return to this point within agency (each year)
                preserve
                * Within temp data set, subset to desired treated and potential matches
                * agency_num only exists for treated at this stage.
                keep if (treatment == 0 & startyear == `y' & econ_factor== `dip') | (agency_num == `num' & startyear == `y' & econ_factor== `dip')
                if `num' == 1 {
                    capture psmatch2 treatment if drop_me != 1, mahalanobis(X1 X2 Xn)
                }
                if `num' == 2 {
                    capture psmatch2 treatment if drop_me != 1, mahalanobis(X1 X2 Xn)
                }
                if `num' == 3 {
                    capture psmatch2 treatment if drop_me != 1, mahalanobis(X1 X2 Xn)
                }
                * match marks a selected control case for match
                gen match = .
                * _weight is 1 or greater for selected control cases
                replace match = 1 if _weight > 0 & _weight != . & treatment == 0
                * treated cases that are matched have _nn == 1
                replace matched = 1 if _nn == 1
                * keep only treated cases in "matched"
                replace matched = . if treatment == 0                
                * Subset to matched treatment and control in temp data
                drop if match == . & matched == .
                * This can identify unique batches of matches if needed later.
                gen match_sample = `num'`y'`dip'
                tab match_sample
                sum match matched
                * Identify agency to which the control matches are matched.
                gen agency_match_num = `num'
                * save to a unique temp file
                save "$temp_dir\temp`num'`y'`dip'.dta", replace
                sleep 1000
                * load the final destination of matching samples
                use "$temp_dir\match_observations_v5.dta", clear
                * add each new sample to the final destination
                append using "$temp_dir\temp`num'`y'`dip'.dta"
                save "$temp_dir\match_observations_v5.dta", replace
                di "********************************************"
                di " saved agency `num'; year `y'; dip = `dip'"
                di "********************************************"
                * delete the unnecessary temp file
                erase "$temp_dir\temp`num'`y'`dip'.dta"
                * return from subsetted data to base data for next iteration.
                restore
        }
    }
}
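
Adapted to the yearly partner question in #1, the same preserve/subset/append skeleton might look roughly like the sketch below. It assumes the variable names from the -dataex- example in #1, repeated here as toy data so the block runs on its own. The variable seeker and the tempfile picks are hypothetical helpers, and the matching call itself is left as a comment because its exact syntax should be checked in the mahapick help file.

```stata
* Toy data copied from the -dataex- example in #1
clear
input long id int year float married double age float education byte region long partner_id
12 1984 1 44 2 2 11
12 1985 1 45 2 2 11
12 1986 1 46 2 2 11
22 1984 0 28 6 5 0
22 1985 1 29 6 5 .
22 1986 . 30 6 5 .
25 1984 1 24 6 . .
25 1985 . 25 6 . .
25 1986 . 26 6 . .
end

* picks collects the results of all years (hypothetical name)
tempfile picks
local first = 1

forvalues y = 1980/2000 {
    * Must be able to return to the full panel after each year
    preserve
    keep if year == `y'

    * seeker (hypothetical): married, but no partner recorded in this year
    gen byte seeker = (married == 1 & missing(partner_id))
    quietly count if seeker

    if r(N) > 0 {
        * Matching step goes here, for example mahapick with the options
        * used in #1 and treated(seeker). This is an assumption, not a
        * tested call: see -help mahapick- for the actual syntax.

        * Keep this year's seekers plus whatever the matching step added
        keep if seeker
        if `first' == 0 {
            append using `picks'
        }
        save `picks', replace
        local first = 0
    }

    * Return to the full panel for the next year
    restore
}

* Load the combined results, if any year produced seekers
if `first' == 0 {
    use `picks', clear
}
```

Because the subsetting happens with keep inside preserve/restore, the matching command never needs an "if" qualifier of its own. With the toy data the result contains two rows: id 25 in 1984 and id 22 in 1985.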
