  • Optimal Matching vs Propensity Score Matching

    Hello everyone! I am new to Stata and I am trying to run sample matching. Recently a friend of mine told me about the optimal matching command in Stata.
    The code is: optmatch2 new_rank_dum3 size, min(1) max(2) gen(matching)

    The above code works perfectly fine! However, I am a bit confused about the difference between optimal matching and propensity score matching (PSM).
    I know that PSM is the more widely used technique. I searched the internet to see what the difference is, but I wasn't able to find good information on optimal matching.

    Does anyone know the difference or have any good references that I can review? Thank you so much!

  • #2
    Hi Paul,

    My understanding is that propensity score matching encompasses several techniques, including nearest neighbour or "pair" matching (the most common and therefore usually just referred to as PSM), with or without replacement, Mahalanobis, radius, caliper, full, and optimal matching.

    Optimal matching is a halfway point between full and pair matching, which prioritise closeness (distance) in the spread of covariates and of treatment groups, respectively. See
    https://www.tandfonline.com/doi/abs/10.1198/106186006X137047
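    For intuition on what "optimal" means in pair matching, here is a minimal Python sketch. It assumes matching on a single score and compares greedy nearest-neighbour pairing with a brute-force search for the assignment that minimises the total distance. The function names and the numbers are made up for illustration, and this is not a description of what optmatch2 does internally.

```python
from itertools import permutations

def greedy_pairs(treated, controls):
    """Match each treated score, in order, to its nearest unused control."""
    unused = list(range(len(controls)))
    pairs = []
    for i, t in enumerate(treated):
        j = min(unused, key=lambda k: abs(t - controls[k]))
        unused.remove(j)
        pairs.append((i, j))
    return pairs

def optimal_pairs(treated, controls):
    """Brute-force the assignment with the smallest total distance.

    Only feasible for tiny examples; real implementations use
    network-flow or assignment algorithms instead.
    """
    best = min(
        permutations(range(len(controls)), len(treated)),
        key=lambda perm: sum(abs(t - controls[j]) for t, j in zip(treated, perm)),
    )
    return list(enumerate(best))

def total_distance(pairs, treated, controls):
    return sum(abs(treated[i] - controls[j]) for i, j in pairs)

# Hypothetical scores (for example, propensity scores).
treated = [0.40, 0.50]
controls = [0.45, 0.10]

greedy = greedy_pairs(treated, controls)
optimal = optimal_pairs(treated, controls)
print("greedy :", greedy, round(total_distance(greedy, treated, controls), 2))
print("optimal:", optimal, round(total_distance(optimal, treated, controls), 2))
```

    In this toy data the greedy pass gives the first treated unit its closest control and leaves the second with a poor match, while the optimal assignment accepts a slightly worse first pair to get a smaller total distance.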
    This recent review has helped my understanding:
    https://www.journals.uchicago.edu/doi/10.1086/711393
    Kind regards,
    Hannah
    Last edited by Hannah Beilby; 17 Jun 2024, 23:29.

    • #3
      Paul,
      I am not familiar with the user-contributed program optmatch2, and the posted documentation for its algorithms is not very explicit.
      Propensity score matching (PSM) is used to establish weights on covariates to better match controls to treatments, specifically through a first-stage logit or probit model. For example, say you want to investigate the efficacy of a medical treatment. Simply comparing the outcomes of those who received the treatment to the general population is not appropriate, because not everyone in the general population has the same medical needs as those who received the treatment. Perhaps you have 10 variables that describe the health of those who received the treatment (age, sex, BMI, race, etc.). Ideally, you would compare the outcomes of those who actually received the treatment to a sample of otherwise identical people.

      PSM enters these 10 variables into a logit or probit model that predicts the likelihood of someone receiving the treatment, then uses the estimated coefficients to assign a single propensity score to every person, treated or not. PSM then matches each treated person to the untreated person (or several untreated people) with the most similar propensity score for comparison.

      PSM is a good tool when there are many potential variables that could lead to being included in a treatment, as it can handle a lot of variables in a simple logit model, or when some of the characteristics are more important than others (perhaps age is much more important than sex), as it weighs each covariate based on its logit coefficient. However, if the first-stage logit model does a poor job of predicting the treatment, then the outcome will resemble a random matching. Therefore, it is essential to evaluate the sample balance between the treatment and control groups. I use standardized differences and variance ratios (this is a good read: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2943670/).

      PSM also has issues with exact matching. Perhaps you want to ensure that every male who received the treatment is compared to a male who did not. By default, PSM cannot handle this, so subroutines or other data manipulation must be used to accomplish exact matching.

      See 'psmatch2' or 'teffects psmatch' for common Stata PSM algorithms.
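      To make the balance check concrete, here is a minimal Python sketch of the two diagnostics mentioned above, the standardized difference and the variance ratio, for one covariate. The function names and the age values are hypothetical, and the pooled standard deviation used here (square root of the average of the two group variances) is one common convention that is assumed rather than taken from any particular Stata command.

```python
from math import sqrt
from statistics import mean, variance

def standardized_difference(treated, control):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd

def variance_ratio(treated, control):
    """Sample variance of the treated group over that of the controls."""
    return variance(treated) / variance(control)

# Hypothetical ages in a matched sample.
age_treated = [50, 55, 60, 65]
age_control = [46, 54, 57, 63]

std_diff = standardized_difference(age_treated, age_control)
var_ratio = variance_ratio(age_treated, age_control)
print(f"standardized difference: {std_diff:.3f}")
print(f"variance ratio:          {var_ratio:.3f}")
```

      A standardized difference near 0 and a variance ratio near 1 suggest the groups are balanced on that covariate. The same check would be repeated for every matching variable, before and after matching.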

      As an alternative, nearest neighbor matching (NNM) may be more appropriate when there are a small number of covariates that are all theoretically relevant. NNM uses a smaller set of matching criteria and suffers from the curse of dimensionality, as each additional covariate used to match decreases the weight of the other covariates. Controls can be matched to treatments in a variety of ways, such as by Mahalanobis distance or Euclidean distance. Just like PSM, you must still ensure that the treatment and control groups are balanced, i.e., that they are composed of sufficiently similar entities on the criteria that matter most. Exact matching on certain criteria is typically much more straightforward with NNM than with PSM.

      See 'teffects nnmatch' in Stata.
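      As a rough illustration of NNM with an exact-match restriction, the Python sketch below matches each treated unit to the nearest control of the same sex. To keep it short it uses a scaled Euclidean distance (each covariate divided by its standard deviation), which is an assumed simplification: a Mahalanobis distance would additionally account for correlation between covariates. All names and data are hypothetical, and controls are reused freely (matching with replacement).

```python
from math import sqrt
from statistics import stdev

def scaled_distance(a, b, scales):
    """Euclidean distance after dividing each covariate by its scale."""
    return sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scales)))

def nn_match(treated, controls, exact_key="sex"):
    """Nearest control per treated unit, restricted to equal exact_key."""
    everyone = treated + controls
    n_cov = len(everyone[0]["x"])
    scales = [stdev([u["x"][k] for u in everyone]) for k in range(n_cov)]
    matches = {}
    for t in treated:
        pool = [c for c in controls if c[exact_key] == t[exact_key]]
        if not pool:
            continue  # no exact match available; unit stays unmatched
        best = min(pool, key=lambda c: scaled_distance(t["x"], c["x"], scales))
        matches[t["id"]] = best["id"]
    return matches

# Hypothetical units; x = (age, BMI).
treated = [
    {"id": "T1", "sex": "M", "x": (50, 27.0)},
    {"id": "T2", "sex": "F", "x": (62, 31.0)},
]
controls = [
    {"id": "C1", "sex": "F", "x": (50, 27.0)},
    {"id": "C2", "sex": "M", "x": (53, 26.0)},
    {"id": "C3", "sex": "M", "x": (70, 35.0)},
    {"id": "C4", "sex": "F", "x": (60, 30.0)},
]

matches = nn_match(treated, controls)
print(matches)
```

      Note that C1 has exactly the same age and BMI as T1 but is never considered for it, because the exact-match restriction on sex is applied before any distance is computed.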

      With both PSM and NNM, there are options for how many control entities to match to each treatment entity, as well as whether the control entities can be used for more than one treatment entity (replacement). Also, caliper restrictions can be used to exclude from the analysis any treatment entities that do not have sufficiently similar controls.
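      The three options just described (number of controls per treatment, replacement, caliper) can be sketched in Python as follows. This is a simple greedy pass over a one-dimensional score, with hypothetical names and data. It is only meant to show how the options interact, not how psmatch2 or teffects implement them.

```python
def match_on_score(treated, controls, k=1, replace=False, caliper=None):
    """Greedy matching on a score.

    treated, controls: dicts mapping unit id -> score.
    Returns a dict mapping treated id -> list of matched control ids.
    """
    available = dict(controls)
    matches = {}
    for t_id, t_score in treated.items():
        # Rank the still-available controls by distance to this treated unit.
        ranked = sorted(available, key=lambda c: abs(available[c] - t_score))
        if caliper is not None:
            ranked = [c for c in ranked if abs(available[c] - t_score) <= caliper]
        chosen = ranked[:k]
        if not chosen:
            continue  # treated unit dropped: no control within the caliper
        matches[t_id] = chosen
        if not replace:
            for c in chosen:
                del available[c]  # each control can be used only once
    return matches

# Hypothetical scores.
treated = {"T1": 0.30, "T2": 0.32, "T3": 0.90}
controls = {"C1": 0.31, "C2": 0.40, "C3": 0.34}

with_replacement = match_on_score(treated, controls, replace=True, caliper=0.05)
without_replacement = match_on_score(treated, controls, replace=False, caliper=0.05)
print(with_replacement)
print(without_replacement)
```

      With replacement, T1 and T2 both take C1. Without replacement, T2 has to settle for C3. In both cases T3 is excluded because no control lies within the caliper.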

      Whereas both of the above approaches aim to make the two samples similar overall, sometimes a more fine-grained approach is used, where each treatment entity is matched to control entities based on percentage or absolute bounds on each covariate. For example, each control needs to be within 2 years of age of the treatment, or within 5% of its body mass index. This type of matching may yield different numbers of controls for each treatment, so there are some decisions to be made on how to manage this. I have not encountered a canned Stata program for this type of matching; I typically export the data to Excel for the matching and then bring the matches back into Stata for analysis. This process can become very lengthy when there are many treatment entities.
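      The bounds-based matching described above is easy to script rather than do by hand. Here is a minimal Python sketch under the assumption that both bounds must hold at once (within 2 years of age and within 5% of the treated unit's BMI). The field names, thresholds as parameters, and data are hypothetical.

```python
def within_bounds(t, c, max_age_gap=2, max_bmi_pct=0.05):
    """True if control c is inside both bounds around treated unit t."""
    return (
        abs(c["age"] - t["age"]) <= max_age_gap
        and abs(c["bmi"] - t["bmi"]) <= max_bmi_pct * t["bmi"]
    )

def bounded_matches(treated, controls):
    """Map each treated id to every control id that satisfies the bounds."""
    return {
        t["id"]: [c["id"] for c in controls if within_bounds(t, c)]
        for t in treated
    }

# Hypothetical units.
treated = [
    {"id": "T1", "age": 50, "bmi": 28.0},
    {"id": "T2", "age": 70, "bmi": 22.0},
]
controls = [
    {"id": "C1", "age": 51, "bmi": 28.5},
    {"id": "C2", "age": 49, "bmi": 31.0},
    {"id": "C3", "age": 52, "bmi": 27.0},
    {"id": "C4", "age": 60, "bmi": 22.0},
]

matches = bounded_matches(treated, controls)
print(matches)
```

      Here T1 ends up with two eligible controls and T2 with none, which is the uneven outcome mentioned above. What to do next (keep all eligible controls, pick the closest, drop unmatched treatments) remains a separate decision.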

      Also, there is coarsened exact matching (CEM), which is kinda like a combination of NNM and the more fine-grained approach just mentioned. It essentially creates bins for each of the covariates (ages 30-35 in one bin and 35-40 in another, etc.), then matches controls to treatments that fall in the same bin on every criterion.

      See 'cem' in Stata.
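      The binning idea behind CEM can be sketched in a few lines of Python. The cutpoints, field names, and data below are hypothetical, and the convention that a value equal to a cutpoint goes into the upper bin (so age 35 lands in 35-40) is an assumption. The 'cem' command chooses and handles bins in its own way.

```python
from bisect import bisect_right
from collections import defaultdict

def signature(unit, cutpoints):
    """Tuple of bin indices, one per coarsened covariate."""
    return tuple(bisect_right(cuts, unit[var]) for var, cuts in cutpoints.items())

def coarsened_exact_match(units, cutpoints):
    """Group units by bin signature; keep strata with both groups present."""
    strata = defaultdict(lambda: {"treated": [], "control": []})
    for u in units:
        group = "treated" if u["treated"] else "control"
        strata[signature(u, cutpoints)][group].append(u["id"])
    return {s: g for s, g in strata.items() if g["treated"] and g["control"]}

# Hypothetical cutpoints: age bins 30-35 and 35-40, BMI bins 25-30.
cutpoints = {"age": [30, 35, 40], "bmi": [25, 30]}
units = [
    {"id": "T1", "treated": True, "age": 32, "bmi": 27.0},
    {"id": "C1", "treated": False, "age": 34, "bmi": 26.0},
    {"id": "C2", "treated": False, "age": 36, "bmi": 26.0},
    {"id": "T2", "treated": True, "age": 38, "bmi": 31.0},
]

matched = coarsened_exact_match(units, cutpoints)
print(matched)
```

      T1 and C1 share a stratum and are kept. C2 and T2 each sit alone in their strata and are pruned, which shows how finer bins trade sample size for closer matches.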

      • #4
        Ha, I just realized this was created 4 years ago...
