Why does the psmatch2 using 2 nearest neighbors not give a 1:2 result?

Mary-Helen Soyland

Join Date: Jun 2022

Posts: 2
#1

30 Jun 2022, 06:26

Hi,

This is my first post to Statalist, so please bear with me.
I am working with an observational dataset (approx. 5000 observations) and want to estimate the effect of the treatment tpa_alle on the outcome mrs3mnd_xlnt (binary categorical). I used the psmatch2 command for propensity score matching based on treatment status and 10 other variables. I have done 1:1 matching, but I would also like to do 1:2 matching to see whether it gives similar results.
This is where my problem arises.
For the matching with 2 nearest neighbors I used the following code:

Code:

psmatch2 tpa_alle c.patientage gender c.nihssinnkomst i.mrspre logreg_af logreg_prebtrx logreg_priormi tidltia livesalone i.hospital_volume_3_groups if wus_kos_uos==1 & tidlhjerneslag==0, n(2) caliper(0.2) odds logit

When I check how many cases are in each group after matching, it looks to me like it didn't give me a 1:2 match.
I tried

Code:

tab tpa_alle if _weight!=.

which shows 804 cases in the untreated group and 526 in the treated group.
Is this because some controls (untreated cases) are used more than once? Are there other explanations, or other ways of displaying the number of cases in the matched groups?

I also used the following code to identify the matched pairs, and tried to use it to look at the number of cases in each group:

Code:

gen pair1 = _id if _treated==0
replace pair1 = _n1 if _treated==1
gen pair2 = _id if _treated==0
replace pair2 = _n2 if _treated==1
bysort pair1: egen paircount1 = count(pair1)
bysort pair2: egen paircount2 = count(pair2)
egen byte paircount = anycount(paircount1 paircount2), values(2)

Followed by:

Code:

tab tpa_alle if paircount!=0

This, however, returned 389 treated and 687 untreated, which left me even more confused: the number of cases in the two groups differs depending on the approach, although I thought the two approaches were equivalent. Does anyone have any insight into which approach is correct, if either?

Thankful for your suggestions!
Cheers
Mary-Helen

Fei Wang

Join Date: Oct 2021
Posts: 726

30 Jun 2022, 09:22

I'll show an example to explain this point. The example investigates the effect of mother smoking during pregnancy (mbsmoke = 1 for smoking mothers, = 0 for non-smoking mothers). For each smoking mother, I'll find 2 nearest non-smoking mothers.

Code:

webuse cattaneo2, clear
psmatch2 mbsmoke mmarried c.mage##c.mage fbaby medu, n(2) logit odds caliper(0.2)

Then I tabulate the distribution of _weight by mbsmoke.

Code:

tab _weight mbsmoke

 psmatch2: |
 weight of |
   matched |  1 if mother smoked
  controls | nonsmoker     smoker |     Total
-----------+----------------------+----------
        .5 |       223          0 |       223
         1 |        90        864 |       954
       1.5 |        77          0 |        77
         2 |        57          0 |        57
       2.5 |        21          0 |        21
         3 |        15          0 |        15
       3.5 |        19          0 |        19
         4 |         8          0 |         8
       4.5 |         8          0 |         8
         5 |         6          0 |         6
         6 |         2          0 |         2
       6.5 |         8          0 |         8
         7 |         6          0 |         6
       7.5 |         4          0 |         4
         8 |         2          0 |         2
       9.5 |         2          0 |         2
-----------+----------------------+----------
     Total |       548        864 |     1,412

You can see that the weight of each smoking mother is 1 (864 smoking mothers in total), while the weight differs for the non-smoking mothers because a non-smoker may be used multiple times to match different smokers. If a non-smoker is used once for matching, then its weight is normalized to 0.5; if a non-smoker is used three times, then its weight is 0.5*3 = 1.5.
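
The weight rule above can be turned into a count of how often each control was used. This is only a sketch: it assumes psmatch2 has just been run with n(2) as in the example, so that each use of a control adds 0.5 to _weight, and n_used is a hypothetical variable name that is not created by psmatch2.

Code:

* n_used is a hypothetical helper: number of times a non-smoker served as a match
* (assumes each use contributes 0.5 to _weight, as with n(2) above)
gen n_used = 2*_weight if mbsmoke==0 & !missing(_weight)
tab n_used

* number of distinct non-smokers that were used at least once
count if mbsmoke==0 & !missing(_weight)

The last count is the number of distinct controls (the column total for non-smokers in the table above), which is why a simple tabulation of matched observations need not show a 1:2 ratio when controls are reused.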

So if you'd like to confirm the 1:2 ratio, the code can be

Code:

egen n_control = total(_weight*(mbsmoke==0)*2)

sum n_control

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
   n_control |      4,642        1728           0       1728       1728

The 1,728 control matches against 864 smoking mothers give 1728:864 = 2:1.
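
The same check can be sketched for the data in post #1, assuming the psmatch2 call shown there has just been run so that _weight, _treated, _support and _nn are still in memory. As far as I know, _nn holds the number of matched neighbors for each treated observation after nearest-neighbor matching; please verify this against the psmatch2 help file for your version.

Code:

* treated observations that received a match
count if tpa_alle==1 & !missing(_weight)

* distinct untreated observations used as matches (fewer than 2 per treated if controls are reused)
count if tpa_alle==0 & !missing(_weight)

* total number of control matches, counting reuse: 2 * sum of control weights
quietly summarize _weight if tpa_alle==0
display 2*r(sum)

* how many neighbors each matched treated observation actually received
tab _nn if _treated==1 & _support==1

If the displayed total is twice the first count, the matching is 1:2 in the weighted sense even though the second count is smaller.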

Last edited by Fei Wang; 30 Jun 2022, 09:24.
