
  • yearly matching with Mahapick

    Dear all,

    I'm currently trying to implement a matching mechanism that selects a potential wife or husband from the same sample for an individual who is observed to be married but has no partner information (partner ID).

    In detail, I want to find a pool of potential partners using the Mahalanobis distance measure and then randomly assign one of these potential partners.
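
    The random-assignment step could be sketched as follows. This is only an illustration under assumptions: the pool is assumed to be in long format already (one row per individual and candidate), and the names candidate_id and u, as well as the toy IDs, are hypothetical. It also does not prevent the same candidate from being assigned to two individuals.

    ```stata
    * Toy pool: one row per (individual, candidate) pair; all values are made up
    clear
    input long id long candidate_id
    22 101
    22 102
    22 103
    25 201
    25 202
    end

    * Fix the seed so the draw is reproducible
    set seed 12345

    * Draw a uniform random number per candidate and keep the smallest within each id
    gen double u = runiform()
    bysort id (u): keep if _n == 1
    drop u

    * The surviving candidate becomes the assigned partner
    rename candidate_id partner_id
    list
    ```

    Sorting on a random number within id gives every candidate in the pool the same chance of being picked.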

    Am I right that Mahapick is a good way to implement this approach?

    My dataset consists of about 15,000 individuals, for whom I want to simulate/impute their complete family situation between 1980 and 2000.

    My panel dataset looks like this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long id int year float married double age float education byte region long partner_id
    12 1984 1 44 2 2 11
    12 1985 1 45 2 2 11
    12 1986 1 46 2 2 11
    22 1984 0 28 6 5   0
    22 1985 1 29 6 5   .
    22 1986 . 30 6 5   .
    25 1984 1 24 6 . .
    25 1985 . 25 6 . .
    25 1986 . 26 6 . .
    end
    I want to build the distance measure on age, education and region.

    Since I'm running a dynamic microsimulation model, partnering has to be done on a yearly basis. For example, if I simulate that a person marries in year t, then I need to find a partner for this married individual in t.

    So I only need to do the partnering for individuals who are married, but do not have an assigned partner in the specific year.
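
    To make the target group concrete, here is a minimal sketch that flags exactly these observations in the example data above, using only built-in commands. The variable name needs_partner is a hypothetical helper and not part of the original data.

    ```stata
    clear
    input long id int year float married double age float education byte region long partner_id
    12 1984 1 44 2 2 11
    12 1985 1 45 2 2 11
    12 1986 1 46 2 2 11
    22 1984 0 28 6 5 0
    22 1985 1 29 6 5 .
    22 1986 . 30 6 5 .
    25 1984 1 24 6 . .
    25 1985 . 25 6 . .
    25 1986 . 26 6 . .
    end

    * needs_partner (hypothetical name): married in that year, but no partner ID recorded
    gen byte needs_partner = married == 1 & missing(partner_id)

    * Count the eligible individuals year by year
    forvalues y = 1984/1986 {
        count if needs_partner & year == `y'
    }
    list id year if needs_partner
    ```

    A missing value in married evaluates to false in married == 1, so person-years with unknown marital status are not flagged.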

    As far as I'm aware, there is no "if" qualifier for Mahapick, right?

    So how can I tell Stata to run the matching command only for individuals who are married and do not have an assigned partner in year `y'?

    Since there is no "if" qualifier, I unfortunately cannot run something like:

    Code:
    forval y= 1980/2000{
    
    gen match_id = .
    
    mahapick age education region, idvar(id) treated(married) pickids(match_id) nummatches(5) if year == `y' & married == 1 & partner_id == .
            }
    And my last question is: What would be the "treated" variable here? Is it married, because we only want to match married individuals (with missing partner_id)?

    Any help would be appreciated. Thank you very much in advance.

  • #2
    I'm facing pretty similar problems at the moment and would therefore be very interested in your solutions or in suggestions from other Statalist users.



    • #3
      In case anyone still runs into this, as I did while looking for something else: I found a workaround for the same problem while using psmatch2 (similar, I assume), with exact matching on year and on an earnings circumstance, followed by distance matching within the subset. The difference is that I did it for several treatment groups (agencies), so there is an extra step you won't need. These steps and the code can be adapted:

      Iterate through the values of the exact-match variables. In each iteration, use preserve, subset the entire data set using the exact-match criteria, perform Mahalanobis matching within that subset, and save the results by appending them to a data set on disk. I used the iteration values to build a variable tracking which exact-match group each match/matched observation belongs to. Then restore the data set and move on to the next set of exact-match criteria.
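
      Stripped of the matching call itself, the preserve / subset / append / restore pattern described above might look like the sketch below, applied to the example data from the first post. This is only an illustration under assumptions: the tempfile name results and the variable batch_year are hypothetical, and the matching command would go where the comment indicates.

      ```stata
      * Start with an empty results file so every iteration can simply append to it
      tempfile results
      clear
      save `results', emptyok

      input long id int year float married long partner_id
      12 1984 1 11
      12 1985 1 11
      22 1984 0 0
      22 1985 1 .
      22 1986 . .
      25 1984 1 .
      25 1985 . .
      end

      forvalues y = 1984/1986 {
          * Remember the full data so we can come back to it
          preserve
          * Subset to the exact-match group for this iteration
          keep if year == `y' & married == 1 & missing(partner_id)
          * (the matching command for this subset would run here)
          * Track which batch each observation came from
          gen int batch_year = `y'
          * Add earlier batches and write the accumulated results back to disk
          append using `results'
          save `results', replace
          * Return to the full data for the next iteration
          restore
      }

      * Load the accumulated results
      use `results', clear
      list
      ```

      Saving an empty file up front avoids special-casing the first iteration, and a tempfile is removed automatically when the do-file ends.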

      Mahalanobis matching works well in this situation because it does not depend on parametric estimates from a logistic regression, as propensity score matching does. With the latter, I had to estimate a single propensity score for all groups and then use those imprecise scores within the exact-match subsets. It did not perform well. Mahalanobis matching, by contrast, works reasonably well with the smaller subsamples.
      Code:
      foreach num of num 1 2 3 {
          use "$temp_dir\matches", clear
          * Subset to agency or potential matches in control
          keep if agency_num == `num' | treatment == 0
          * This will hold the indicator for matched treatment in sample
          gen matched = .
          replace matched = 0 if treatment == 1
          * Within each agency, iterate through each year
          foreach y of num 2006/2022 {
              * Within each agency-year iterate through two values of econ_factor
                  foreach dip of num 0 1{
                      * Must be able to return to this point within agency (each year)
                      preserve
                      * Within temp data set, subset to desired treated and potential matches
                      * agency_num only exists for treated at this stage.
                      keep if (treatment == 0 & startyear == `y' & econ_factor== `dip') | (agency_num == `num' & startyear == `y' & econ_factor== `dip')
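                      * A closing brace must be on its own line in Stata.
                      * The call is identical for each agency here; adapt the covariates per agency if needed.
                      if `num' == 1 {
                          capture psmatch2 treatment if drop_me != 1, mahalanobis(X1 X2 Xn)
                      }
                      if `num' == 2 {
                          capture psmatch2 treatment if drop_me != 1, mahalanobis(X1 X2 Xn)
                      }
                      if `num' == 3 {
                          capture psmatch2 treatment if drop_me != 1, mahalanobis(X1 X2 Xn)
                      }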
                      * match marks a selected control case for match
                      gen match = .
                      * _weight is 1 or greater for selected control cases
                      replace match = 1 if _weight > 0 & _weight != . & treatment == 0
                      * treated cases that are matched have _nn == 1
                      replace matched = 1 if _nn == 1
                      * keep only treated cases in "matched"
                      replace matched = . if treatment == 0                
                      * Subset to matched treatment and control in temp data
                      drop if match == . & matched == .
                      * This can identify unique batches of matches if needed later.
                      gen match_sample = `num'`y'`dip'
                      tab match_sample
                      sum match matched
                      * Identify agency to which the control matches are matched.
                      gen agency_match_num = `num'
                      * save to a unique temp file
                      save "$temp_dir\temp`num'`y'`dip'.dta", replace
                      sleep 1000
                      * load the final destination of matching samples
                      use "$temp_dir\match_observations_v5.dta", clear
                      * add each new sample to the final destination
                      append using "$temp_dir\temp`num'`y'`dip'.dta"
                      save "$temp_dir\match_observations_v5.dta", replace
                      di "********************************************"
                      di " saved agency `num'; year `y'; dip = `dip'"
                      di "********************************************"
                      * delete the unnecessary temp file
                      erase "$temp_dir\temp`num'`y'`dip'.dta"
                      * return from the subsetted data to the base data for the next iteration.
                      restore
          }
          }
          }
